Humans are generally believed to have five senses: sight, hearing, taste, smell, and touch. Beyond these senses, humans developed language to share sophisticated information. But how did we communicate before we had language?
This thought caused me to recall a primitive community in which members communicate effectively with gestures: members demonstrate what they see, imitate what they hear, have others taste what they taste, and have others touch what they touch. This kind of simple communication is symmetrical, and the series of gestures can transfer meaning in both directions. Think about the handshake, for example. A handshake sends a friendly signal from A to B, or from B to A. To establish good social relations, this symmetrical message transfer is favorable; without it, maintaining efficient social relationships would be difficult, and social relationships are the most prominent feature of humankind.
This primitive means of communication often requires an object to be sensed, tasted, or sniffed, without which it would be impossible to transfer human experience to others. Listening and talking are much easier than confining communication to the other senses. The human throat makes it possible to imitate others' voices or sounds and, sometimes, to modify them if the speaker desires.
As communication becomes more complicated, our brains play a bigger role in creating more sophisticated messages. Helped by the brain, we can exchange deeper meaning. Here, of course, the brain works in real time. Surrounded by competing animals, our communication must be processed in real time; without a quick response, it would be difficult to avoid attack or to cope with environmental change. In this real-time processing environment, utterances by the throat and interpretation through the organ of Corti are assisted by the language area of the brain, and everything proceeds in parallel. For efficient communication, this real-time processing is very important, especially during debates or conflict.
What if humans could not keep up with real-time communication? What would happen if we could not respond until we had understood the other person's meaning, five minutes later? During the wait, opinions and the environment may change. If three people are talking, ten minutes may pass before you understand what the other two have to say. Discussion could not continue, which highlights the importance of real-time understanding.
Humans successfully developed a safe environment that allows enough time to consider what is being said. Taking into account all acquired knowledge, humans can generalize from details to broader knowledge. With the help of language, this process becomes more efficient. People with sufficient intelligence discovered the laws governing natural phenomena. They devised tools for hunting and created new strategies. They may even have prayed to Heaven for good hunting. When a new law is promulgated in a common language, the knowledge is shared by the group, signaling the birth of a new civilization.
In most cases, sensed signals do not reach the brain directly; rather, they are converted into other types of signals, called "primitives." For voice, the primitive is the formant, a pattern of peak frequencies on the time-frequency plane, or the phoneme, a phonetic symbol of spoken language. For images, the primitive is an edge vector corresponding to edge features in the image.
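To make the edge-vector idea concrete, here is a minimal sketch of how a small image block might be reduced to a single quantized edge-direction code. The function name, the 8-direction quantization, and the central-difference gradient are illustrative assumptions, not the method used by any particular system described in the text.

```python
import math

def block_edge_primitive(block):
    """Estimate the dominant edge direction of a grayscale block.

    Returns one of 8 quantized direction codes (0-7), a crude
    stand-in for the 'edge vector' primitive described in the text.
    """
    h, w = len(block), len(block[0])
    gx_sum = gy_sum = 0.0
    for y in range(h):
        for x in range(w):
            # Central-difference gradients, clamped at the borders.
            left = block[y][max(x - 1, 0)]
            right = block[y][min(x + 1, w - 1)]
            up = block[max(y - 1, 0)][x]
            down = block[min(y + 1, h - 1)][x]
            gx_sum += (right - left) / 2.0
            gy_sum += (down - up) / 2.0
    angle = math.atan2(gy_sum, gx_sum)  # in [-pi, pi]
    return int(round(4 * (angle + math.pi) / math.pi)) % 8

# A vertical step edge: dark left half, bright right half.
block = [[0, 0, 10, 10] for _ in range(4)]
print(block_edge_primitive(block))  # → 4
```

The whole block thus collapses to one small code, which is the essential economy of a primitive representation.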
The human brain can process varied information, of which images consume the most memory. Higher animals seem to rely on image information more than lower-level creatures do. Image sensors are generally large: compared with auditory, touch, olfactory, or taste sensors, the visual organ is huge if we include the visual nerve system, comprising everything from the retina to the hippocampus. The visual processing area occupies the largest share of the brain, more than 40% (John S. Werner, et al., "Illusory Color and the Brain," Scientific American, March 2007). Taken as a whole, humans devote enormous resources to processing images. This means that without vision, this capacity could be used for other purposes, as in the cases of the famous historical scholar Hokiichi Hanawa of Edo-era Japan and the German organist Helmut Walcha. Both were blind and famous for their remarkably strong memories.
Images formed on the retina provide high-resolution visual objects carrying a tremendous amount of information per unit time, compared with other sensory signals. Storing an image therefore requires huge memory. Voice data, for example, can be recorded as voice signals or as a meta expression, which is quite efficient. Image data, on the other hand, may be stored as object metadata, or as primitives in the brain.
This latter case is suggested by research on a unique group of people called savants, who can memorize scenes in detail and recall them after long periods. The pictures such a person draws with a pen take several days to complete. Pen drawings comprise lines and edges, which are more memory-efficient primitive features than pixels. Primitives are also effective for expressing features; an object is often expressed by its boundary lines alone. By appropriately integrating these primitives, we can create a meaningful figure (see column 20: "Weak Classifier and Strong Classifier"; Frank Werblin, Botond Roska, "The Movies in Our Eyes," Scientific American, April 2007). Several primitives are already known. We may regard a primitive as a building block of an image: using these building blocks, we can create any kind of picture, just as any sentence can be created from a character set.
Another example is shown in figure 1, in which fingerprint images were synthesized from simple edge primitives. Here, each 13 x 13 pixel block is expressed by an 8-bit edge vector, which carries only 1/169 of the information of the original pixel block. In an experiment performed by A. Ross, et al., the images synthesized from these primitive vectors are quite similar to the original image, especially in its image features (minutiae) (A. Ross, et al., "From Template to Image: Reconstructing Fingerprints from Minutiae Points," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, No. 4, pp. 544-560, April 2007).
Figure 1: (a) Fingerprint image, (b) edge feature, and (c) synthesized image. (a) is the original image; (b) is the primitive expression for each 13 by 13 pixel block; (c), synthesized from (b), shows the typical image features of the original image (a).
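The 1/169 figure quoted for figure 1 follows from simple accounting. A quick sanity check, assuming 8-bit grayscale pixels (an assumption; the text does not state the pixel depth):

```python
# Back-of-envelope accounting for the 13x13-block example,
# assuming 8-bit grayscale pixels.
BLOCK_PIXELS = 13 * 13           # 169 pixels per block
PIXEL_BITS = 8                   # bits per grayscale pixel
raw_bits = BLOCK_PIXELS * PIXEL_BITS   # 1352 bits for the raw block
primitive_bits = 8                     # one 8-bit edge vector per block
print(raw_bits // primitive_bits)      # → 169
```

Under this assumption the edge-vector representation is 169 times smaller than the raw block, matching the ratio stated in the text.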