Monday, 22 December 2008

Python Madness

My night at 3 AM: hacking Python code to learn the language. That's the life! Here's what I've done so far:

1. Decimal to binary conversion:

import sys

def bd(x):
    n = []
    if x < 0:
        return "Positive integer required"
    elif x == 0:
        return [0]
    else:
        while x > 0:
            n.insert(0, x % 2)
            x = x / 2
        return n

if __name__ == "__main__":
    try:
        number = int(raw_input("Number: "))
        print bd(number)
    except ValueError:
        sys.stderr.write("Integer required\n")
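
A quick sanity check, assuming the bd() above is in the same file: Python 2.6 ships a built-in bin() that should produce the same digits as a string.

# Hypothetical quick check against the built-in bin() (available from Python 2.6 on);
# bin(10) returns the string '0b1010'.
print bd(10)          # [1, 0, 1, 0]
print bin(10)[2:]     # '1010', the same digits as a string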


2. Basic truth tables:

def logicalAnd():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, valueOne and valueTwo)

def logicalOr():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, valueOne or valueTwo)

def logicalConditional():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, not valueOne or valueTwo)

def logicalBiconditional():
    for valueOne in range(2):
        for valueTwo in range(2):
            print "%d %d %d" % (valueOne, valueTwo, valueOne == valueTwo)

if __name__ == "__main__":
    op = raw_input("Connective: ")
    if op == "and":
        logicalAnd()
    elif op == "or":
        logicalOr()
    elif op == "conditional":
        logicalConditional()
    elif op == "biconditional":
        logicalBiconditional()
    else:
        print "Connective not known"


3. ASCII table. First column is the ASCII value, second column is the local interpretation, third column is the escaped (repr-style) form, fourth column is the hexadecimal value:

for element in xrange(256):
    print "%s \t %s \t %s \t %s" % (element, chr(element),
        str(tuple(chr(element))).strip("()'',"),
        chr(element).encode("hex"))


4. A perhaps overcomplicated binary to decimal program:

def reverseRange(input):
    n = []
    for i in range(len(input)-1, -1, -1):
        n.append(i)
    return n

def singleValues(input):
    m = []
    for i in input:
        m.append(i)
    return m

if __name__ == "__main__":
    input = raw_input("Number: ")
    rR = reverseRange(input)
    sV = singleValues(input)
    dN = 0
    for i in range(len(sV)):
        dN += int(sV[i]) * 2**int(rR[i])
    print dN
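
And for the opposite direction, int() with an explicit base does the whole job in one line, which makes a handy check against the program above (a quick sanity check, not a replacement for the exercise):

# int() with an explicit base parses a binary string directly.
print int("101101", 2)    # 45, which should match dN for the input "101101"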


This code works in Python 2.6.1.

Sunday, 21 December 2008

Ubuntu On Samsung NC10

Last month I bought Samsung's NC10 netbook and I'm astonished at how cool it is. It's really handy if you travel a lot (I do!) and have a lot of code to write (I do!). Unfortunately, there are some issues which have to be solved first.

1. Touchpad problem: It totally sucks when you're typing and your fat, overdimensioned nerdy hand (or what I like to call it: The Hand Of Code) brushes even slightly over the touchpad. Therefore you need to disable the touchpad while you type. Fortunately, the gods of Ubuntu created a program called 'syndaemon' which does exactly this. It's useful to add it to your autostart via System > Preferences > Sessions.

2. Excessive load cycles: they slowly kill your hard drive, so better follow these instructions to set the correct values. It seems that there are still issues even after you've changed the options.


For further information you should check out:

Ubuntu on the Samsung NC10
Linux on the Samsung NC10
The Ubuntu NC10 Community Documentation

Friday, 19 December 2008

Natural Language Processing Online Applications

Here is a list of interesting, free-to-use, interactive NLP-related online applications:

1. XLE Web Interface allows you to parse sentences in German, English, Norwegian, Welsh, Malagasy and Arabic. You'll get a very detailed parse tree and the functional structure of the sentence; for "This is madness!" you'd get:

[Screenshot: XLE parse tree and f-structure for "This is madness!"]


2. Wortschatz Leipzig is a German application that crawls the web for a word and returns a detailed analysis of its frequency, collocations and semantic relations. The word graphs are the most interesting part, e.g. the graph for "Humbug" (German for "rubbish"):

[Screenshot: Wortschatz Leipzig word graph for "Humbug"]


3. WordNet is a large lexical database of English; e.g. "house" would show the following interpretations:

[Screenshot: WordNet entries for "house"]


4. Answerbus is a search engine like Google or Yahoo but with semantics! You can ask natural questions like "Who killed JFK?" and will (perhaps) get the answer "Oswald killed JFK". Perhaps... because the system actually sucks and you can easily outmaneuver it. Another search engine is START, which sucks too.

5. Wordfall is an awesome linguistic game! It's like Tetris but instead of blocks you have to match words to their constituents. Look:

[Screenshot: Wordfall]


6. Wortwarte is a German site about neologisms in the media. They are collected and sorted.

7. A cool German chatbot called ELBOT. It would definitely pass my Turing Test.

8. Think of a thing and 20Q will read your mind by asking 20 questions.

9. Machine Translation is one of the prime disciplines of NLP. Everyone knows Babelfish: it's not only the translator in the Hitchhiker's Guide but also an online translator like Google Translate.

10. TextCat is a language guesser based on an n-gram Perl script. Another, better one is the XRCE language guesser.

Wednesday, 3 December 2008

N-Gram M-Adness

One of the basic concepts of Natural Language Processing is the n-gram model: a sequence is split into contiguous subsequences of length n, e.g.

(1) Now the lord once decided to set off for the mountain where the man lives

For n = 1 (unigram) the sentence is split into:

(n = 1) [ [Now] [the] [lord] [once] [decided] [to] [set] [off] [for] [the] [mountain] [where] [the] [man] [lives] ]

For n = 2 (bigram) the sentence is split into:

(n = 2) [ [Now the] [the lord] [lord once] [once decided] [decided to] [to set] [set off] [off for] [for the] [the mountain] [mountain where] [where the] [the man] [man lives] ]

For n = 3 (trigram) the sentence is split into:

(n = 3) [ [Now the lord] [the lord once] [lord once decided] [once decided to] [decided to set] [to set off] [set off for] [off for the] [for the mountain] [the mountain where] [mountain where the] [where the man] [the man lives] ]

Okay, you get the idea. For a sequence of words w1, ..., wk, a bigram model conditions wk on wk-1, a trigram model conditions wk on wk-2 and wk-1, and an n-gram model in general conditions wk on wk-n+1, ..., wk-1.

Here's the relevant Python code for making n-grams:

def makeNGrams(inpStr, n):
    token = inpStr.split()
    nGram = []
    for i in range(len(token)):
        if i+n > len(token):
            break
        nGram.append(token[i:n+i])
    return nGram


Or a bit more condensed:

def makeNGrams(inpStr, n):
    inpStr = inpStr.split()
    return [inpStr[i:n+i] for i in range(len(inpStr)) if len(inpStr)>=i+n]
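
A quick usage example for either version; the expected output is written out by hand:

print makeNGrams("Now the lord once decided", 2)
# [['Now', 'the'], ['the', 'lord'], ['lord', 'once'], ['once', 'decided']]

print makeNGrams("Now the lord once decided", 3)
# [['Now', 'the', 'lord'], ['the', 'lord', 'once'], ['lord', 'once', 'decided']]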


Why do you need this?

1. Machine Learning uses n-gram models to learn and induce rules from strings.
2. Probabilistic models use n-grams for spell checking and correcting misspelled words.
3. Compression of data.
4. Optical character recognition (OCR), Machine Translation (MT) and Intelligent Character Recognition (ICR) use n-grams to compute the probability of a word sequence or generally a pattern sequence.
5. Identify the language of a text (demo here)
6. Identify the species given a DNA sample.

For example, you can compute the probability of a sequence with the chain rule, multiplying the conditional probabilities P(wk|w1, ..., wk-1); but if one of these probabilities is zero, the whole product is zero too. This is a huge problem, since such long histories are hardly ever seen in corpora, even if you take the whole internet, e.g. "The world, as we know it, will be changed by the pollution of the environment". Therefore we condition only on the direct predecessor, i.e. use an n-gram model, and can estimate the probability. Another application for n-grams can be found in part-of-speech tagging and the probabilistic disambiguation of tags, e.g. the probability of "book/NN the/DT flight/NN" versus the probability of "book/VB the/DT flight/NN".
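
To make this concrete, here is a minimal sketch of bigram estimation by relative frequencies (maximum likelihood estimates) from raw counts; the tiny corpus and the function name are made up for illustration:

from collections import defaultdict

def bigramProbabilities(corpus):
    # Count bigrams and their histories, then estimate
    # P(wk | wk-1) = count(wk-1, wk) / count(wk-1)   (maximum likelihood)
    bigramCounts, historyCounts = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        tokens = sentence.split()
        for first, second in zip(tokens, tokens[1:]):
            bigramCounts[(first, second)] += 1
            historyCounts[first] += 1
    return dict(((first, second), float(count) / historyCounts[first])
                for (first, second), count in bigramCounts.items())

corpus = ["I will eat fish", "I will eat rice", "I will sleep"]
probs = bigramProbabilities(corpus)
print probs[("will", "eat")]   # 2/3: "eat" follows "will" two times out of three
print probs[("eat", "fish")]   # 1/2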

I wrote a very simple program to predict the next word given a sequence of words in a corpus, e.g. input: "I will eat", output: "fish". You can find it here.

Another program concerning n-grams, which I wrote, is available here. It extracts proper nouns, e.g. "New York City", from English texts.

Friday, 14 November 2008

Art of the State

Currently I'm very busy learning the formal foundations of linguistics (Set Theory, Relations), programming (Python) and computational linguistics (Maximum Likelihood Estimation, Context-Free Grammars).

I got an e-mail. Really. From a reader. By the way, it's my first; that's why I'm quite enthusiastic. He suggested that I introduce my readers (do I have any?) to his blog Neuropolitics. Well, well, I read the first few sentences and decided to mention it in this post. Form your own opinion.

Perhaps I'm going to write about the foundations of Natural Language Processing or upload some Python code which could be used to play with strings. Mostly harmless, not really meaningful code. I don't think that I'll have time for anything else anyway.

Saturday, 11 October 2008

Time's running, running out

Unfortunately, I don't have time to blog for the time being. I study natural language processing, which is very intense. So give me a week or two and I'll come back.

Sunday, 5 October 2008

Points of Interest 05. October, 2008

1. Please dear god make this an imperative:



2. Why don't apes use language although they could? Because they don't have a psychological infrastructure of shared intentionality. Bolles' Review of Tomasello's Power Point Prose: Part 1 & Part 2

3. I always wondered which type of camouflage the US Army uses, since it looks like bad pixels from ancient computer days. Here's why they use digital camouflage: Can You See Me Now?

4. Dying of capsaicin? Well, I eat Habaneros every day - no joke: Which organisms can feel pain? & Chili, capsaicin and cancer

5. Time for linguistic lolcat:



6. After several years of detective work, philologists at the University of Stavanger in Norway have assembled a unique collection of texts online. Now they're about to start the most comprehensive analysis of Middle English ever: New life for Middle English: Norwegian detective work gives new knowledge of the English language.

7. Syntactic persistence is the tendency for speakers to produce sentences using similar grammatical patterns and rules of language as those they have used before. Although the way this occurs is not well understood, previous research has indicated that this effect may involve a specific aspect of memory function. Memory is made up of two components: declarative and procedural. Declarative memory is used in remembering events and facts. Procedural memory helps us to remember how to perform tasks, such as playing the piano or riding a bike. A recent study suggests that the common phrase, "it's so easy, it's like riding a bike" should perhaps be replaced with "it's so easy, it's like forming a sentence.": Un-total recall: Amnesics remember grammar, but not meaning of new sentences

9. Cool new robots from Japan with cool abilities: Photos: Robots at CEATEC 2008

10. It's the thalamus that actually matters for sentence processing: Thalamus? Yes. Basal ganglia? Nope.

11. Beautiful statue: "Transi de René de Chalon," Ligier Richier, 1547

12. Broca's area shows a "sentence complexity" effect. It responds more during the comprehension of object relative (OR) constructions than easier to process subject relative (SR) constructions: Broca's area, sentence comprehension, and working memory

13. Carbon nanotechnology in a 17th century Damascus sword

14. A Bob Dylan song encoded in XML: Encoding Dylan

15. Why choose the lesser evil?



16. Interesting, really: How to beat of a cold

17. Taking the new out of neurons

18. Robo-starfish learns about itself and adapts to injuries

19: 2008 IgNobels

Monday, 22 September 2008

Points of Interest 22. September, 2008

1. Listeners can only keep up with the rapid rate of speech (5 syllables/second) because they anticipate the upcoming syllables of a word. A new study conducted by scientists at the University of Rochester and Georgia Tech showed that this is true not only for the phonology but also for the semantics of words: Scientists watch as listener's brain predicts speaker's words & Neural correlates of partial lexical activation

2. At age 3–4, the overwhelming majority of children behave selfishly, whereas most children at age 7–8 prefer resource allocations that remove advantageous or disadvantageous inequality: Egalitarianism in young children

3. The evolution of speech. Speech recognition part in macaques found: Monkey Brains Hint at Evolutionary Root of Language Processing

4. World's largest semantic map revealed. First steps toward the Semantic Web? Computers figuring out what words mean

5. The right word is in our jaw: Speaking Without Sound & Breakthrough in understanding of speech offers hope to the deaf

6. Stuttering causes bilingualism: Does bilingualism cause stuttering?

7. Neuroaesthetics? Beauty & the Brain and Beauty and the Brain

8. Save humanity. But first I want more funds for computational linguistics: Funding the Mitigation of Extinction Risks and How can we reduce the risk of human extinction?

9. Humans - The best race there is and ever was on earth? Stop kidding me, Lystrosaurus dominated more: Technologies to Watch Out For: Self-Copying

10. The geometric bucket, a systematic view: A simple toy, and what it says about how we learn to mentally rotate objects

11. Oh my arse: The Evolution of Assholes

12. The seven gates to humanity: What I've Learned About Human Origins

13. I like the picture of possible paths for human evolution: Mark Stoneking’s Four Models Of Human Origins

14. About rhymes in Japanese Hip Hop and what they reveal about the language: I'll experiment like a scientist/ You wanna rhyme, you gotta sign my list

15. “Thinking about Not-Thinking”: Neural Correlates of Conceptual Processing during Zen Meditation

16. Suicidal Individuals: Evaluation, Therapies, and Ethics – Part 1 & Part 2

Wednesday, 17 September 2008

Points of Interest 17. September, 2008

I've got a bit picky about the Points of Interest I choose these days, so less is more. A problem which occurred while writing this and which is bothering me: What's the difference between:

a) It is not
b) It isn't
c) It's not

I think the first is the most emphasised because there is no contraction at all. The second emphasises the subject - due to the contraction of "is not" the stress shifts to "It". The third emphasises the negation because the stress lies on "not". Language Hat had a post about this in 2005.

1. [...] findings suggest that New Caledonian crows can solve complex physical problems by reasoning both causally and analogically about causal relations: Do New Caledonian crows solve physical problems through causal reasoning?

Alex Taylor explains the experiment:



2. Pro Transhumanism. It's not a matter of philosophy - It's a matter of time: Transhumanism as Universal

3. About the temperature of excluding metaphors: Social exclusion literally feels cold

4. Pulvermuller's vs. Wernicke-Lichtheim's functional anatomy of language: Pulvermuller = Wernicke-Lichtheim

Monday, 15 September 2008

Points of Interest 15. September, 2008

1. An interesting article about the neurobiology of a hallucination based on Ffytche, D. (2008). The hodology of hallucinations. Cortex 44: 1067-1083. DOI: 10.1016/j.cortex.2008.04.005:

"In the EEG experiments, the activity recorded from two of the electrodes was found to become sychronous whilst the subjects were hallucinating. [...] Ffytche hypothesizes that the changes in connectivity could be due to changes in the firing mode of the thalamo-cortical connections [...] Overall, Fytche's findings suggest that hallucination cannot be explained by a topological or hodological explanation alone, but instead by a combination of the two. [...]"


2. Gestalt meets linguistic relativism: What Bolles have learned about language.

3. "[...] some so-far anonymous computational linguist caused United Airlines to lose more than a billion dollars of its market capitalization, over the course of about 12 minutes last Monday: Economic linguistics

4. Who carried out 9/11? Views Differ...

5. From E-Paper to Semantic Web. What kind of technologies could we expect in 2018? Nature asks: What will happen in the next 10 years?

Thursday, 4 September 2008

One word != one number

Earlier this year a study was conducted by researchers from the University of Melbourne and University College London, namely Brian Butterworth, Robert Reeve, Fiona Reynolds and Delyth Lloyd. Children of two indigenous communities were tested for their numeracy skills, one from the Tanami Desert and the other from Groote Eylandt. Another group consisted of indigenous preschool children from Melbourne. Here's a map of the locations:

[Map: locations of the Tanami Desert, Groote Eylandt and Melbourne groups]

The results showed clearly that the children of the indigenous communities, who have no words or even gestures for numbers, have numeracy skills equal to those of native English-speaking children. So numeracy is not based on culture or language but is probably an innate faculty.


Publications:
  • Butterworth, B., Reeve, R. (Forthcoming). Verbal counting and spatial strategies in numerical tasks: Evidence from indigenous Australia. Philosophical Psychology
  • Butterworth, B., Reeve, R., Reynolds, F., Lloyd, D. (Forthcoming). Numerical thought with and without words: Evidence from indigenous Australian children. Proceedings of the National Academy of Sciences of the USA

Friday, 15 August 2008

Points of Interest 13. August, 2008

キタ━━━(゜∀゜)━━━!!!!! どうも、おひさしぶりです! (Hi, long time no see!)


1. The genetic component of European ethnic groups: you can see a clear distinction between individuals with northern and southern European ancestry. Interestingly, these genetic boundaries often mark linguistic boundaries too: Genetic, Geographic, And Linguistic Structure Of European Populations



2. "Jugemu" is a rakugo, verbal entertainment, which everybody in Japan knows: Jugemu Jugemu Go-Kō-no-Surikire Kaijari-suigyo no Suigyō-matsu Unrai-matsu Fūrai-matsu Kū-Neru Tokoro ni Sumu Tokoro Yaburakōji no Burakōji Paipo Paipo Paipo no Shūringan Shūringan no Gūrindai Gūrindai no Ponpokopii no Ponpokonā no Chōkyūmei no Chōsuke

3. Are nouns and verbs represented differently in the brain? A typical view is that they are, with nouns relying more on temporal cortices and verbs on frontal regions: Representation of nouns and verbs in the brain

4. Participation in most sports requires agility, impeccable timing and the planning and execution of complex movements, so that actions such as catching a ball or throwing it into a hoop can be performed. Performing well at sports also requires anticipating and accurately predicting the movements of others: The baller's brain (and his pinky)

5. Did the introduction of cooking cause a relaxation of selective constraints on diet-related genes? Advent Of Cooking & The Big Cognitive Leap In Human Evolution

6. Totally awesome. I think bionics is the future technology: Robot controlled by "brain" in a culture dish

7. The top thirty 顔文字 (kaomoji/emoticons) used in Japan: Top thirty Japanese emoticons

8. Rap version of LHC from CERN - soon they'll blow up the world: Large Hadron Rap

9. Post about the Japanese pronoun nanji: I want to talk about you

10. The eye tells the brain when to plasticize

11. Mélange

12. England's rock art

13. Dinosaur Supertree

14. Collectie Ver Huell

15. Simulated Linguistic Evolution In The Laboratory

16. Testosterone and aggression, or what Frank's Red Hot Sauce has to do with handgun violence

17. Tone deafness and bad singing may not go hand in hand

18. Do you choke under pressure? Depends on what you're trying to accomplish

19. If you want to persuade a woman, look straight at her

20. 'Beer goggles' are real - it's official

21. US boasts of laser weapon's 'plausible deniability'

22. Has porn become mainstream? Not really

23. Did the Baldwin Effect Give Us Language?

24. On Human Nature

Sunday, 10 August 2008

Resistance is futile

Soon I'll be one of them, i.e. accepted for computational linguistics at Heidelberg - hooray!


Wednesday, 30 July 2008

Points of Interest 30. July, 2008

1. It has long been debated whether dinosaurs were part of the ‘Terrestrial Revolution’ that occurred some 100 million years ago during the Cretaceous when birds, mammals, flowering plants, insects and reptiles all underwent a rapid expansion: Dinosaur Supertree

2. Take one part gorgeous ornamental typography and one part diabolical imagery. Combine slowly over a low heat with incidental visual curiosities. Add caprice to taste. Serve haphazardly over a bed of 19th century lithographic stones. For best effect, consume before retiring: Collectie Ver Huell

3. If you judge the progress of humanity by Homer Simpson, Paris Hilton, and Girls Gone Wild videos, you might conclude that our evolution has stalled—or even shifted into reverse. Not so, scientists say. Humans are evolving faster than ever before, picking up new genetic traits and talents that may help us survive a turbulent future: Where Is Human Evolution Heading?

4. Today's Daily Telegraph contains a fascinating extract from Norman Doidge's new book The Brain That Changes Itself, about a woman who feels that she is constantly falling because she has lost her sense of balance as a result of damage to the vestibular system: Perpetually falling woman learns to balance with her tongue

5. Language is a product of culture. Or is it? Which came first — language or culture? That’s like asking if the chicken or the egg came first. But cultural behavior has been documented in animals who do not have language systems, like gorillas who have intricate systems of processing plants: Can There Be A Synthesis Between Cultural And Biological Evolution?

6. Could we have evolved speech without evolving morality, or morality without evolving speech? A Biological Revolution

Thursday, 24 July 2008

Issa

Kobayashi Issa (小林一茶), or just Issa (June 15, 1763 - January 5, 1828) - by the way, also the Arabic name for Jesus - is one of my favourite poets. He is one of the four great Japanese haiku poets, besides Bashô, Buson and Shiki. I like him because I sense his melancholy. His mother died when he was three; his grandmother, who raised him, when he was fourteen. He wandered through Japan. Got back, got a wife. All of his children died soon after birth, and finally his wife died too. He wrote one of his most famous haiku at the time his first daughter died:
The world of dew --
A world of dew it is indeed,
And yet, and yet . . .
One of my favourite poems of Issa is:
夜神楽や焚火の中へちる紅葉

yokagura ya takibi no naka e chiru momiji
In English:
Shinto dance at night--
red leaves fall
into the bonfires
It has even more power in German:
Ein Tempeltanz nachts:
Es stiebt ins Feuer hinein
Das rote Herbstlaub.

Wednesday, 23 July 2008

LaTeX



If you don't know this picture, you shouldn't study. Okay, that's a hyperbole. Nevertheless, LaTeX (/ˈleɪtɛk/ or /ˈlɑːtɛk/) is an important tool for writing scientific essays. Microsoft Word or OpenOffice is not sufficient for a doctoral thesis or any other academic document with more than fifty pages. Unfortunately, in Germany many students and even post-graduates don't know LaTeX unless they study computer science, mathematics or physics. In other countries it is the tool for scientists, philosophers, mathematicians and engineers to write a scientific essay. It is the most convenient tool for displaying mathematical formulas, as you'll see below.

This high-level markup language allows you to produce ready-to-print, printer- and monitor-independent documents with easy numbering, cross-referencing, tables and figures, page layout and bibliographies. It takes you one week at the most to learn the language and saves you years of blood and tears.

Wikipedia shows a comprehensive example of a LaTeX document; here is the raw source:
\documentclass[12pt]{article}
\title{\LaTeX}
\date{}
\begin{document}
\maketitle
\LaTeX{} is a document preparation system for the \TeX{}
typesetting program. It offers programmable desktop publishing
features and extensive facilities for automating most aspects of
typesetting and desktop publishing, including numbering and
cross-referencing, tables and figures, page layout, bibliographies,
and much more. \LaTeX{} was originally written in 1984 by Leslie
Lamport and has become the dominant method for using \TeX; few
people write in plain \TeX{} anymore. The current version is
\LaTeXe.
\newline
% This is a comment, it is not shown in the final output.
% The following shows a little of the typesetting power of LaTeX
\begin{eqnarray}
E &=& mc^2 \\
m &=& \frac{m_0}{\sqrt{1-\frac{v^2}{c^2}}}
\end{eqnarray}
\end{document}
This is converted into the following document:



If you want to learn LaTeX, you can find an excellent introduction at Wikibooks here. LaTeX is included in most Linux distributions, but you can use proTeXt for Windows as well. If you don't use Linux, you can get LaTeX here.

Friday, 18 July 2008

Points of Interest 18. July, 2008

Some changes are going down: I won't write in German anymore, since I have almost no German readers and German linguists aren't really interested in state-of-the-art theories.

1. A picture of an abnormally folded amyloid fibril reconstructed in 3D: Detailed 3D image of Alzheimer's pathology

2. Third part of the Iconography series on Experimental Theology: Notes on the Theology of Icons, Part 3: Time and Space

3. Ubiquity of same-sex couplings in nature.

Tuesday, 15 July 2008

Language Diversity Made by God

At least that's what's written in Genesis 11:1-9. First there was one language spoken by everyone, which unified the people:
And the whole earth was of one language, and of one speech. [...] And the Lord said, Behold, the people is one, and they have all one language [...].
So they decided to build a tower and a city, which were afterwards called Babel (Hebrew bâlal - to overflow, mix, confound), because God came down from the heavens to "[...] confound their language, that they may not understand one another's speech [...]", which scattered them across the earth so that the tower was never built. Henceforth there were many languages across the earth.


Saturday, 12 July 2008

Points of Interest 14. July, 2008

1. "You Otaku are already dead." That's the new title of a book which discusses the rise and fall of the Otaku aristorcracy of Japan: Speaker for the Dead

2. Food for token, please: Can animals comprehend the power of symbols?

3. One, two, three - a monkey is what I want to be: Counting monkeys tick off yet another 'human' ability

4. Highly debated Goldin-Meadow paper about 'SOV' charade order: When using gestures, rules of grammar remain the same despite speakers' language

5. Same issue. Wired article: Roots of Language Run Deeper Than Speech

6. To hell with linguistic phylogenetics: A Look at Linguistic Evolution

7. Beat the hell out of those Generativists: Language Adapted to Us

8. Hug those lovely Generativists: Questions For A Theory

9. Take it into your own hands: Keeping Hands Where You Can See Them Alters Perception, Study Finds

10. I'll be a chimp again: Will Our Future Brains Be Smaller?

11. Children Are Naturally Prone To Be Empathic And Moral

12. Notes on the Theology of Icons, Part 1: Stylization

13. Notes on the Theology of Icons, Part 2: Light

14. I Can't Understand Your Accent, So Keep Talking

15. Shakespeare makes us alive: The Shakespeared brain

Tuesday, 8 July 2008

Am I wrong about Pinker's extended locative?

Pinker writes in his book The Stuff of Thought about the locative rule:
If a verb can appear in a content-locative construction then it can also appear in a container-locative construction, and vice versa. (Pinker 2007: 35-6)
as in:

(1) [to spray] water on the roses.
(2) [to spray] the roses with water. (Pinker 2007: 35)

In contrast to:

(3) to fill the glass with water.
(4) *to fill water into the glass. (Pinker 2007: 50)

To account for this exception, he refines the locative rule:
[...] by specifying the change of a container, it is compatible with a construction that is about state-change, and thereby allows us to say [(3)]. But because it says nothing about a cause or manner of motion of the contents, it isn't compatible with a construction that is all about motion, and thereby doesn't allow us to say [(4)].
His conclusion at the end of the chapter:
This uncovered a number of basic features of our thought processes: [...] that a frame for thinking about a change of location in real space can be metaphorically extended to conceptualize a change of state as motion in state-space; and that when the mind conceives of an entity as being somewhere or going somewhere, it tends to melt it down to a holistic blob.
As far as I understand him, he ascribes this property to the mind.

So, the first thing that struck me was that (3) and (4) sound correct to non-native speakers, and I assume they would sound correct to many native speakers as well, depending on the region they're from. Actually you can say both sentences in German, which is, by the way, closely related to English:

(3(DE)) Das Glas mit Wasser füllen.
(4(DE)) Wasser in das Glas füllen.

So this must be specific to English, right? Since he ascribes this to the mind, it must mean that it only works for the English mind, which, and here comes the logical conclusion, is different from the German mind. Hence English speakers have a different cognition, i.e. they perceive reality in a different way, and therefore (3) is possible while (4) is not? This would be an argument for linguistic determinism. I know this is nonsense.


_

Pinker, S. The Stuff of Thought. 2007.

Thursday, 3 July 2008

Points of Interest 03. July, 2008

Everyone's talking about the upcoming Christiansen and Chater paper and how Universal Grammar, especially the Poverty of Stimulus argument, will be destroyed. I'm curious.


1. NLP (Natural Language Processing) goes Mainstream: Powerset bought by Microsoft

2. David Beaver's critique on the New Scientist's article Charades reveals a universal sentence structure: Charades does not reveal a universal sentence structure

3. Diffusion Spectrum Imaging Used to Map the Structural Core of Human Cerebral Cortex

Wednesday, 2 July 2008

Labov vs. Chomsky - The Ultimate Smackdown


* two linguists walked into an x-bar and got SMACKED DOWN!
* "generate THIS motherf***er!"
* "ain't nothin' minimalist about this SMACKDOWN, mofo!"
* "how about some government-binding in your FACE!"

***Breaking News: an angry mob of Bloomfieldians is challenging the winner

Thanks to Michael and Language Log.

Tuesday, 24 June 2008

The Sapir-Whorf Hypothesis

"Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt" ("The limits of my language mean the limits of my world"), wrote Wittgenstein in his Tractatus (Wittgenstein 1921), saying that one can only think as far as one's language carries one. But is thinking itself influenced by language? Is there such a thing as non-verbal thought, i.e. thoughts that I cannot express in my language? Kant's two students Hamann and Herder already pondered this determinism, and Humboldt wrote in 1836 that language is the organ of thought. (Humboldt 1836)

A good 120 years later, an amateur linguist took up this problem and coined what was posthumously named after him: the (Sapir-)Whorf hypothesis. This hypothesis says that an individual language (e.g. German) influences the thinking of an individual, and thereby implies that it is not possible to describe reality in an objective way through language, since every speech community has its own conception of reality.
Actually, the history of this hypothesis begins with the ethnologist Franz Boas, who studied the languages of the indigenous peoples of America, in particular the Hopi, and found a relation between grammar and geographical setting. From this he concluded that language is to be understood as a reflection of culture.


A well-known Hopi artist

Sapir, who was a student of Boas and wrote his master's thesis on Herder's Treatise on the Origin of Language, argued as an anthropologist and linguist that language and thought stand, more or less, in a reciprocal relationship to each other and that language determines thought, which is why this view is also called linguistic determinism. (Sapir 1929, 1958)


Edward Sapir

Whorf, actually a chemist by training, taught himself linguistics and was a student of Sapir. He himself never came into contact with the original Hopi. His idea was that reality is carved into concepts by linguistic systems according to certain valuations. Because these valuations are settled by convention among the speakers of a speech community, every language reflects a unique world view, i.e. the physical signs are valued differently in different speech communities and so a different picture of reality emerges, which is why this view is also called linguistic relativism. (Whorf 1940)


Benjamin Lee Whorf

His comparison of English with the language of the Hopi on the basis of grammatical structure is well known. English tends to analyse reality as objects in space. Present and future are seen as locations and time as the path between them, which is why phrases like three apples and three days are treated as grammatically equivalent. Our language is full of metaphors for grasping the abstract notion of time; you can, for example, waste time just as you can waste money. Hopi, by comparison, is more process-oriented. What Whorf did not take into account, since he never did fieldwork himself, is that Hopi verb conjugation also distinguishes two tenses: manifested and unmanifested. Manifested covers everything concretely perceivable and physically existent, in the present and the past. Unmanifested covers everything non-physical and non-perceivable.

Underlying this idea is the claim that the cultural differences between speakers whose languages belong to a common language family, for example German and Dutch (Indo-European > West Germanic), are not as pronounced as the cultural differences between speakers of different language families, for example German and Chinese (Sino-Tibetan). This conclusion is, of course, complete nonsense. Languages of one family are naturally geographically closer together, so cultural exchange is more likely there than between, say, Germany and China. Moreover, there are languages like Turkish and Japanese, which perhaps both belong to the Macro-Altaic family and whose cultures are thoroughly different.

A further problem is that of translation. According to the strong version of the hypothesis, it would not be possible to translate the content of a sentence into another language. (Chandler 1995) Indeed, the poet Pablo Neruda complained that his poems lose something in translation even though the sense is preserved. That a poem and a trade contract should be judged differently in translation, however, is probably due to the fact that poems are written according to unique artistic and language-specific criteria, such as word length and musical quality (metre). In the universalist tradition, Karl Popper says that whatever can be expressed in one language can be translated into another with relative effort. (Popper 1970) A concrete example: the Pintupi people have a word for the hole left behind by a goanna (a particular Australian monitor lizard) when it breaks through to the surface after its hibernation: in the Pintupi language, katarta. So the same concept can be described with many words, and nuances can be carried over through more or less elaborate paraphrasing. A side problem is that of synonyms. Since every word is, so to speak, unique, under the strong reading of the hypothesis there are no fully congruent synonyms, because form and meaning are inseparably linked. (Fish 1980)

Strong Whorfianism is hard to defend and has hardly any adherents. It is better seen as the most extreme point on a scale, created for theoretical orientation. As a knockout criterion Pinker cites the case of Ildefonso, an immigrant who grew up entirely without language and was nevertheless intelligent and equipped with mathematical abilities. Later he was even able to learn sign language. How could he have managed this without thought? (Schaller 1991)
Weak Whorfianism, by contrast, makes some concessions and is far harder to refute. Instead of claiming that language determines thought, it claims that language merely influences thought and thereby the perception of reality. Language is not regarded as a detached system (Saussure: langue) but is placed concretely in its socio-cultural context (Saussure: parole). The varieties and the speech habits of the speakers are declared to be the influencing element. Thus Sapir says that societies are different worlds, not one world with different labels attached. (Sapir 1929, 1958)
The claim that the Inuit have a multitude of unique words for snow also rests on this hypothesis; it was debunked by Pullum and will not be taken up here, since enough has been written about it already. (Pullum 1994) The colour tests, in which indigenous people were asked to categorise different colours, achieved some fame. Focal colours (e.g. red) for which they had words could be identified more accurately. They were then taught 16 words of another language to name the colours, 8 words for focal colours and 8 for non-focal colours, and lo and behold, the focal colours were again matched more accurately with the corresponding foreign words. Pinker's take on this is that we learn colours the way we perceive them, and not the other way round.


The colour spectrum

What we do know is that the brain stores associations between semantic concepts and phonetic representations, with the initial sounds being more important than the final sounds. Relations between different semantic concepts that are only indirectly related to each other are stored as well and can be retrieved more easily through phonetic similarity. The language of our thinking, however, is probably not the natural language in which we also converse verbally. It is assumed to be a metalinguistic precursor of it that operates on concepts, so-called Mentalese, which is itself controversial and would need further explanation.


Phrenology as a precursor of the theory of the modularity of mind

Greenberg, who spent a long time searching for and studying the universal elements in language, confirms that the share of fundamental elements in human behaviour across the most diverse languages is greater than the idiosyncratic differences that the theory of linguistic relativism predicts.

Another example against linguistic relativism is Fodor's argument that there are simple modules for visual perception which cannot be influenced by language.


The well-known Müller-Lyer illusion

Although I know that the lines are of equal length, I do not perceive them as equally long. The linguistic information does not modify the corresponding perception module, ergo language has no influence on my perception, and so linguistic relativism is refuted. However, we do not know how modules for higher neural processes work, so aspects of linguistic relativism may still be true and the influence of language may be subtler than previously thought. (Fodor 1984) The question to ask is therefore: which aspects of language influence thought in a systematic way, and how strong is this influence?


_


Wittgenstein, L. Logisch-philosophische Abhandlung (Tractatus Logico-Philosophicus). 1921.

Humboldt, W. von. Über die Verschiedenheit des menschlichen Sprachbaus und seinen Einfluss auf die geistige Entwicklung des Menschengeschlechts. 1836.

Sapir, E. Language: An introduction to the study of speech. 1929.

Whorf, B. L. Science and Linguistics. In: Technology Review 42 (6): 229-31, 247-8. 1940.

Chandler, D. The Act of Writing. 1995.

Popper, K. Normal Science and its Dangers. In: Criticism and the Growth of Knowledge. 1970

Fish, S. Is There a Text in This Class? The Authority of Interpretive Communities. 1980

Pullum, G. K. The Great Eskimo Vocabulary Hoax and Other Irreverent Essays on the Study of Language. 1991.

Pinker, S. The Language Instinct. 1994

Wednesday, 18 June 2008

A Kanji - A Day (18.06.08)

Halftime!


31. 耳 means "an ear", "an edge" or "a selvage" and is read:

On-Yomi:

a) ジ (ji), e.g. 耳鼻科 (じびか) / an ear, nose and throat hospital

Kun-Yomi:

b) みみ (mimi), e.g. 耳が早い (みみがはやい) / have keen ears


32. 七 means "seven" and is read:

On-Yomi:

a) シチ (shichi), e.g. 七月 (しちがつ) / July

Kun-Yomi:

b) なな (nana), e.g. 親の七光 (おやのななひかり) / the influence of parents
c) ななつ (nanatsu), e.g. 七つ (ななつ) / seven
d) なの (nano), e.g. 七日 (なのか) / seven days, the seventh day


33. 車 means "a car" or "a vehicle" and is read:

On-Yomi:

a) シャ (sha), e.g. ぴかぴかの新車 (ぴかぴかのしんしゃ) / a shiny new car

Kun-Yomi:

b) くるま (kuruma), e.g. 車椅子 (くるまいす) / a wheelchair


34. 手 means "a hand", "a worker", "a means", "a device", "a way", "a trick", "an idea", "a kind", "trouble", "labor" or "care" and is read:

On-Yomi:

a) シュ (shu), e.g. 握手 (あくしゅ) / shaking hands, a handshake

Kun-Yomi:

b) て (te), e.g. もっといい手が有る (もっといいてがある) / There is a better way.
c) た (ta), e.g. 手綱 (たづな) / reins


35. 十 means "ten" and is read:

On-Yomi:

a) ジュウ (juu), e.g. 十月 (じゅうがつ) / October
b) ジッ (ji-), e.g. 十戒 (じっかい) / the ten commandments

Kun-Yomi:

c) とお (too), e.g. 十日 (とおか) / ten days, the tenth day
d) と (to), e.g. 十人十色 (じゅうにんといろ) / So many men, so many minds.


36. 出 means "go out", "attend", "enter", "participate", "come out", "appear", "rise", "start", "leave", "take out", "get out", "show", "produce", "serve", "pay", "put out", "an appearance" or "one's origin" and is read:

On-Yomi:

a) シュツ (shutsu), e.g. 出資 (しゅっし) / investment
b) スイ (sui), e.g. 出納係 (すいとうがかり) / a cashier, a teller

Kun-Yomi:

c) で (de), e.g. 出口 (でぐち) / an exit, a way out
d) でる (deru), e.g. 買い物に出かける (かいものにでかける) / go out for shopping
e) だす (dasu), e.g. お茶を出す (おちゃをだす) / serve green tea


37. 女 means "woman", "the female", "a lady", "one's mistress" or "one's woman" and is read:

On-Yomi:

a) ジョ (jo), e.g. 女性 (じょせい) / the female, a woman
b) ニョ (nyo), e.g. 女官 (にょかん) / a court lady
c) ニョウ (nyou), e.g. 女房 (にょうぼう) / one's wife

Kun-Yomi:

d) おんな (onna), e.g. 女物 (おんなもの) / for ladies, ladies'
e) め (me), e.g. 女神 (めがみ) / a goddess


38. 小 means "small", "little" or "tiny" and is read:

On-Yomi:

a) ショウ (shou), e.g. 小学校 (しょうがっこう) / a primary (elementary) school

Kun-Yomi:

b) ちいさい (chiisai), e.g. 小さなコアラ (ちいさなこあら) / a small koala
c) こ (ko), e.g. 小石 (こいし) / a pebble, a small stone, gravel
d) お (o), e.g. 小川 (おがわ) / a stream, a creek, a brook, a rivulet
e) さ (sa), e.g. 小夜曲 (さよきょく) / a serenade


39. 上 means "the upper part", "a higher place", "go up", "rise", "lift up", "get in", "get out", "finish" or "over" and is read:

On-Yomi:

a) ジョウ (jou), e.g. 上達する (じょうたつする) / become skillful, get better, make progress, improve, advance

Kun-Yomi:

b) うえ (ue), e.g. 机の上に (つくえのうえに) / on the desk
c) うわ (uwa), e.g. 上着 (うわぎ) / outerwear, a coat, a jacket
d) かみ (kami), e.g. 川上 (かわかみ) / the upper part of a river, upstream
e) あげる (ageru), e.g. 棚に上げる (たなにあげる) / put ~ on the shelf
f) あがる (agaru), e.g. 階段を上がる (かいだんをあがる) / go up the stairs
g) あがり (agari), e.g. 病み上がり (やみあがり) / convalescence
h) のぼる (noboru), e.g. 噂に上る (うわさにのぼる) / be talked about


40. 森 means "woods", "forest" or "a grove" and is read:

On-Yomi:

a) シン (shin), e.g. 森林 (しんりん) / woods, a forest

Kun-Yomi:

b) もり (mori), e.g. 鎮守の森 (ちんじゅのもり) / the grove of the village shrine


Speech examples can be found at the Japanese Kanji Dictionary.

Stroke order can be found at the Kanji Land.


First Class (40/80 signs): 一右雨円王音下火花貝学気休玉金九空月犬見五口校左三山四子糸字耳七車手十出女小上森人水正生青石赤先千川早草足村大男竹中虫町天田土二日入年白八百文本名木目夕立力林六

Points of Interest 18. June, 2008

Long time no see!


1. Who cares about society? Let's have gay sex! It depends on your own choice and your genetics: Society's Attitudes Have Little Impact On Choice Of Sexual Partner

2. I'll become rich and you'll die of cancer because I know everything about you: Seeing ourselves / Seeing others - built in errors

3. There are only 4 elements: Earth, Wind, Fire, Water and not 100: Only a theory!

4. My neolithic stone projectile is going to be so big that it'll blast away the whole universe: Culture does, in fact, optimize

5. I can write this shit since my brain's so big. Actually it has something to do with gradualism and integration: Building a New Brain from Old Parts

6. At least this post will be a success to our species: Humor Shown To Be Fundamental To Our Success As A Species

7. Oh well, I got lice that's why I'm talking rubbish: Can parasites influence the language we speak?

Sunday, 15 June 2008

Kanake with a Migration Background

It is not only in our food culture that we imitate our big brother across the pond; no, by now we even have our own "African-American phenomenon". You frequently read "Deutscher mit Migrationshintergrund" ("German with a migration background", DMM), above all when juveniles and criminal law are concerned, and exactly this context gives the term a negative connotation. It would occur to nobody to apply the term to the quiet Chinese biology student. What is actually meant by this phrase, which has turned pejorative, is criminal Kanake (a German slur), Mulatte or Ausländer ("foreigner"), but of course you are not allowed to write it like that. Everyone knows what is meant anyway; does that make it any better?



Yes, at least that is what Germany's media makers think: a new label equals new content, and so they are not stingy with creative neologisms. But beware: what the scribblers have not yet been told is that euphemisms are not detached from their predecessors and are eventually caught up by their past. This principle was called the euphemism treadmill by Pinker and can be illustrated nicely with the DMM phenomenon. The term may well have a certain neutrality at first, but, not least through its context, it turns negative, in line with Gresham's law, and takes over the dominant connotations of its predecessor.
This process is by no means new and was already observed by Orwell in 1933. If we go back in history, we see a general trend of renaming whatever has become unwelcome. Kanake, for instance, originally denotes a person of the Kanak people of New Caledonia. The term came to Germany via Hanseatic seafaring, where sailors actually used it as an honorific, i.e. positively, for their Polynesian comrades, and it turned negative through its phonetic similarity to "Hannak". Hannak was originally the name of a Czech ethnic group and was used in Berlin as a synonym for malice and meanness.
So much for "learning from history". Perhaps in five years we can agree on Prozentualdeutscher ("percentage German") and the game starts all over again.

Wednesday, 11 June 2008

Wednesday, 4 June 2008

Points of Interest 04. June, 2008

1. The evolution of language. It's not nature versus nurture but nature and nurture in my opinion: The Workings of Co-Evolution

2. The evolution of music. Check out the article on Nature if you can: The Evolution of Music

3. All hail our chimpanzee leaders: Do chimpanzees have a theory of mind? 30 years later

4. Fiction versus reality, the difference in an MRI: Meeting George Bush versus Meeting Cinderella

5. Buehler deserves some respect, bitches: Karl Bühler/Buhler/Buehler on the Evolution of Language

6. I'd nominate this headline for the most freaky one I've ever seen and beware the article itself is even stranger: Zombie caterpillars controlled by voodoo wasps

7. Oh well, too good to be true: Good News In Our DNA: Defects You Can Fix With Vitamins And Minerals

8. Cannabis makes you even more stupid (you have to be stupid to do drugs): Long-term Cannabis Users May Have Structural Brain Abnormalities

9. Micro-robots Dance On Something Smaller Than A Pin's Head

Saturday, 31 May 2008

Semantic Web

Today an interesting article about the Semantic Web was posted on New Scientist; unfortunately, it's not available to me yet. Everybody's talking about Web 2.0 and nobody knows what it's all about. The real thing that's going on at the moment is the so-called Semantic Web project.
Tim Berners-Lee, so to speak the founder of the World Wide Web and head of the W3C, suggested a new dimension of the Internet (please note that the Internet and the WWW are not the same thing, but the terms are often used interchangeably, and I do so here too). He expressed it in the following words:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A 'Semantic Web', which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize. (Wikipedia)

This is, by the way, one topic of my (hopefully) new course of study, computational linguistics. The idea is to categorise and connect ideas in a way that a machine can understand, so that it can interact with humans in a (semi-)natural way, e.g. via meta-information:

Non-semantic web (Web 1.0 and Web 2.0):

<item>Cat</item>

Semantic web (part of Web 3.0):

<animal Kingdom="Animalia" Phylum="Chordata" Class="Mammalia" Order="Carnivora" Family="Felidae" Genus="Felis">Cat</animal> (Wikipedia)
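
To make the difference concrete, here is a minimal sketch of what a machine could do with such markup. It only parses the attributes of the Wikipedia example quoted above; the element and attribute names come from that example, everything else is illustrative:

import xml.etree.ElementTree as ET

# Parse the semantic markup quoted above and answer a question about
# the taxonomic class of "Cat". The attribute names (Kingdom, Class, ...)
# come from the Wikipedia example; the rest is made up for illustration.
markup = ('<animal Kingdom="Animalia" Phylum="Chordata" Class="Mammalia" '
          'Order="Carnivora" Family="Felidae" Genus="Felis">Cat</animal>')

animal = ET.fromstring(markup)
print "%s belongs to the class %s" % (animal.text, animal.get("Class"))
# -> Cat belongs to the class Mammalia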

Such machine-readable markup should create a serendipity effect, as many users already experience when they go 'wikihopping'. A very advanced usage would be that a user could ask the web:

Person: "Oh, I have a lovely cat at home and could you please tell me the class of my cat?"
Semantic WWW: "Of course sir/madam. You're from Europe, Republic of Ireland so the most likely class would be Mammalia. May I show you some images for verification or do you want to know something more?"

or:

Person: "When did the Berlin wall go down?"
Semantic WWW: "In 1989. Do you want some videos related to the event?"

Because search engines are dumb and can't understand meaning, you'll often receive inaccurate answers:

1. BBC ON THIS DAY | 9 | 1989: The night the Wall came down
2. When did the Berlin Wall go down? - Blurtit
3. Why did the berlin wall come down? - Yahoo! Answers (Google)


The idea of metadata is, in fact, not new. Via HTML, especially Dublin Core, you can provide metadata (via <meta> tags), but almost nobody uses it: it is extra work, search engines no longer use metadata to index websites, and many browsers don't care at all. This will also be one of the main problems for a global Semantic Web: many coders/users just don't care about meta-information. So you'll have to make them care, either by prescription or by making meta tags worthwhile to use.

Thursday, 29 May 2008

A Kanji - A Day (29.05.08)

I'm a bit excited since the fourth edition of Dungeons and Dragons will be released very soon (06.06.08). Nevertheless, I have to continue learning Japanese.


26. 山 means "a mountain" and is read:

On-Yomi:

a) サン (san), e.g. 火山 (かざん) / a volcano

Kun-Yomi:

b) やま (yama), e.g. 山場 (やまば) / the climax, the peak, the crisis


27. 四 means "four" and is read:

On-Yomi:

a) シ (shi), e.g. 四月 (しがつ) / April

Kun-Yomi:

b) よつ (yotsu), e.g. 四つ角 (よつかど) / a crossroads
c) よっつ (yottsu), e.g. 四つ (よっつ) / four
d) よん (yon), e.g. 四輪駆動 (よんりんくどう) / 4WD


28. 子 means "child", "kid", "baby" or "an infant" and is read:

On-Yomi:

a) シ (shi), e.g. 子午線 (しごせん) / the meridian
b) ス (su), e.g. 様子 (ようす) / appearances, a look, the situation, the state of things, circumstances

Kun-Yomi:

c) こ (ko), e.g. 子分 (こぶん) / a follower, a henchman, a following


29. 糸 means "thread", "yarn" or "string" and is read:

On-Yomi:

a) シ (shi), e.g. 一糸乱れず (いっしみだれず) / in precise order, in perfect coordination

Kun-Yomi:

b) いと (ito), e.g. 絹糸 (きぬいと) / silk thread


30. 字 means "a character", "a letter", "writing" or "script" and is read:

On-Yomi:

a) ジ (ji), e.g. 文字 (もじ) / a character, a letter

Kun-Yomi:

b) あざ (aza), e.g. 字 (あざ) / a section, a township


Speech examples can be found at the Japanese Kanji Dictionary.

Stroke order can be found at the Kanji Land.


First Class (30/80 signs): 一右雨円王音下火花貝学気休玉金九空月犬見五口校左三山四子糸字耳七車手十出女小上森人水正生青石赤先千川早草足村大男竹中虫町天田土二日入年白八百文本名木目夕立力林六