Dr. JH, a researcher in image processing, visited Beijing for a conference presentation. As we rode from his hotel to our research center, I told him that a car ahead of us had a license plate beginning with the Chinese character for "army," indicating military use. He immediately said, "Oh, I can see a number 4 in the first Chinese character," which had never occurred to me (Fig. 1).
Fig. 1: A Chinese character font, which means "army"
This is a modern font and includes the numeral pattern "4" inside, as Dr. JH pointed out and as shown in (b). The font in (c) is an old font, which is still used in Japan.
It has been three years since I stayed in Beijing. I suffered knee pain for a while, so I continued exercising on an indoor treadmill. When I resumed jogging outdoors, I was surprised that the scenery seemed to shake as I ran (Fig. 2), which made me dizzy. A few minutes later, my sight cleared. Since then, I have sometimes had the same experience after a long layoff from outdoor running.
Fig. 2: Unstable view
Under normal conditions, the outdoor view looks like (a), but when I resumed outdoor jogging after a long break, the scenery looked like (b) until the image stabilized after a few minutes. This does not happen when the eyes are accustomed to a vibrating view.
As mentioned in a previous column (No. 24, "Show Others What You Have Imagined"), image recognition is not based on pixel-level recognition, but on much more complex units. In Fig. 3, each character has vertical and horizontal edges (strokes), which we sometimes call primitives. Based on the relative position of each stroke, we can recognize the character.
Most people in today's information society unconsciously read characters every day by identifying strokes. They regard a whole character as a primitive, which is instantly recognizable. A Chinese or Japanese person finds it easy to recognize the character in Fig. 3, disregarding small differences. Dr. JH, who was born in the USA, was not familiar with East Asian characters. He was familiar with alphabetic characters and numbers. This is why he picked out the sub-pattern "4" in Fig. 1.
Fig. 3: Some examples of primitives that compose the Chinese character
The primitives in the figure are identical to the strokes. The basic pattern composed of primitives plays an important role in pattern recognition.
Another feature of human visual perception is the speed and robustness of its pattern recognition against image vibration. In a vibrating environment, the visual system tracks a pattern in real time as if there were no vibration. In everyday life, we rarely notice this ability. In the jogging episode above (Prologue 2), I had difficulty keeping up with the image I had perceived just before, because distant patterns vibrate over large distances compared with nearby objects, whose vibration looks fairly small relative to the size of the object. This is the main cause of the difficulty in tracking object patterns, as shown in Fig. 2(a) and Fig. 2(b), which eventually caused my dizziness. This process is believed to be handled in the brain. If the perceived pattern has a shape like a character, perception is much stronger and more stable. The small pattern fluctuations observable in handwritten characters do not affect recognition very much. This stable recognition is the result of pattern classification in the brain.
Fig. 4 shows this powerful classification function. A photo of a bear is at the center, surrounded by several illustrations. If you carefully compare the photo and the illustrations, they differ greatly. But most people, regardless of nationality or age, recognize that these illustrations are bears. It is hard to explain why these drawings are recognized as bears.
Fig. 4: Photo of a bear with illustrations
The real photo of the bear does not resemble the illustrations, but most people see that those illustrations are "bears."
As the bear illustrations show, the human classification function is powerful. Here is another example. In Fig. 5, the font "A" is shown cut by 6 vertical slice lines (a1). Using a one-dimensional display like (a2), if the display pixels corresponding to the part of the font body under each slice line of (a1) are turned on, one line after another, the same font "A" will be observable. Fig. 5 (b) is an example. Anybody can recognize the font "A" from afar. What will happen if the one-dimensional display moves more slowly while keeping the same pixel switching period? You may guess that the font shape will be compressed horizontally, as in (c) or (d). Still, it looks like (b), a normal image. If the experiment is conducted in complete darkness, the pattern "A" is still visible even if the one-dimensional display stands still, although some training is required. In cases (c) and (d), and in this last case, the pattern "A" moves leftward.
Fig. 5: Font displayed using moving one dimensional display
Using a one dimensional display, the character font is synthesized stably regardless of the speed at which it's moved. Pattern (b) is stably observable, not (c) or (d).
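The slicing idea behind Fig. 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 6-column bitmap `GLYPH_A`, the function names `column_slices` and `sweep`, and the `speed` parameter (display travel in pixels per switching period) are all hypothetical and do not come from the original experiment. The sketch models only the light pattern physically painted by the display, which is compressed at lower speeds like (c) or (d); it says nothing about why the viewer still perceives (b).

```python
# Hypothetical 6-column bitmap of the letter "A" ('#' = pixel on).
GLYPH_A = [
    "..##..",
    ".#..#.",
    "#....#",
    "#....#",
    "######",
    "#....#",
    "#....#",
]


def column_slices(glyph):
    """Cut the glyph into vertical slices, one per column, left to right."""
    width = len(glyph[0])
    return ["".join(row[x] for row in glyph) for x in range(width)]


def sweep(slices, speed=1.0):
    """Paint slice k at horizontal position int(k * speed).

    `speed` is an assumed parameter: how many pixels the one-dimensional
    display travels between two consecutive slices.
    """
    height = len(slices[0])
    width = int((len(slices) - 1) * speed) + 1
    canvas = [["."] * width for _ in range(height)]
    for k, column in enumerate(slices):
        x = int(k * speed)
        for y, pixel in enumerate(column):
            if pixel == "#":
                canvas[y][x] = "#"  # overlapping slices simply add up
    return ["".join(row) for row in canvas]


slices = column_slices(GLYPH_A)
print("\n".join(sweep(slices, speed=1.0)))  # full-width glyph
print()
print("\n".join(sweep(slices, speed=0.5)))  # physically compressed glyph
```

At `speed=1.0` the painted pattern equals the original bitmap, and at `speed=0.0` every slice lands on the same column, which corresponds to the case where the display stands still.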
Some of you may have seen the one-dimensional display advertisement that used to be in the tunnel just before the Haneda Airport monorail station in Tokyo. When I saw it for the first time, I was amazed at the accuracy with which the image signal was synchronized with the speed of the car. In actuality, the image synchronization was handled by the human brain. A much newer version of a one-dimensional display can be viewed at the following URL:
Why does this phenomenon happen? Human eyes constantly vibrate. They try to catch known patterns in sight, and then try to trace those patterns to stabilize the perceived image. The pattern established in the brain works as a template for searching for the same pattern in the perceived image. Helped by this mechanism, the perceived image remains stable even if the whole image is vibrating. We can identify the outside scene from a moving car, or keep reading a book while scanning the page. When an unknown, complicated image is given, people will try to find something identifiable. I remember trying, in my boyhood, to find something meaningful in the grain texture of the wood planks on the ceiling.
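As a rough computational analogy to the template idea above, a known pattern can be searched for in each frame of a shaking scene, and the offset found can be used to steady the view. The sketch below uses sum-of-absolute-differences matching, a standard image processing technique chosen here only for illustration; it is not a claim about how the brain works. The names `locate`, `place`, and `TEMPLATE`, and the frame sizes, are all hypothetical.

```python
def locate(frame, template):
    """Return (row, col) where the template best matches the frame.

    The best match is the position with the smallest sum of absolute
    differences (SAD) between the template and the frame window.
    """
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            cost = sum(
                abs(frame[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos


def place(template, height, width, row, col):
    """Build a blank frame with the template pasted at (row, col)."""
    frame = [[0] * width for _ in range(height)]
    for i, line in enumerate(template):
        for j, value in enumerate(line):
            frame[row + i][col + j] = value
    return frame


# A small known pattern, standing in for the template held in memory.
TEMPLATE = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

# A "vibrating" scene: the same pattern sits at a different place in each frame.
positions = [(2, 2), (1, 3), (3, 1), (2, 4)]
frames = [place(TEMPLATE, 7, 8, r, c) for r, c in positions]

found = [locate(frame, TEMPLATE) for frame in frames]
# Shift of each frame relative to the first; undoing it would steady the view.
shifts = [(r - found[0][0], c - found[0][1]) for r, c in found]
print(found)
print(shifts)
```

Once the pattern is located in every frame, the per-frame shift is known, so each frame could be moved back by that amount to hold the pattern still.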
Fig. 6 illustrates a human face made up of facial parts. By changing the combination, the expression looks different, doesn't it?
Fig. 6: Experimental facial pattern built from various facial parts (By Prof. KH)
With the appropriate combination of facial parts, various expressions are synthesized.
There are some common rules for composing those parts, as illustrated in the lower part of the figure.
Fig.7 shows another type of facial appearance that depends on the location and transformation of each facial part. Using the same parts, with similarity based modification, I tried to show a typical Asian and Caucasian face.
Fig. 7: Experimental expression of Asian and Caucasian faces based on the same parts but with different modification rules.
In Fig. 8, we see the artwork of professional artist Prof. KH.
Fig. 8: Rich expressions made from various costume parts (By Prof. KH)
Another difficulty of image synthesis is the labor required to complete it. I am sure that the artwork in Altamira Cave was made by a talented person after much effort. Even now, special skill is required to be a professional painter. Time is also required for the drawing itself. However, we can now use a camera for this purpose, which overcomes one of the great barriers of the past. Even the resolution can be freely selected.
An image provides only ambiguous information compared with text. Even so, it sometimes makes a strong impact. In an image, lines, which are the major component of illustrations, function like characters in text; with an appropriate combination of lines, any illustration is possible. One of the most successful examples is the story comic (gekiga). The key feature of gekiga is the appropriate combination of illustration and text; here the illustration is often deformed and exaggerated. Gekiga is an art form in which a rich story is carried by the text.
In ancient times, written documents were made by expert "scribes." Three thousand years later, anybody can write text. Font style has evolved so that it can be memorized and used by anybody. Description style has also evolved to be used for varied fascinating literary works. Now, most paintings are drawn by experts called "painters."
When I visited the Software and Microelectronics School of Peking University, one of the graduate students introduced his own work. He then wanted my confirmation: "Is it true that many people in Japan enjoy comics or gekiga during a commute on a bus or train?" One professor also told me, "My daughter insists that the most famous hero among animation lovers is 'Konan'; you know Konan, of course." I knew this name, but I was surprised that Japanese animation had such a strong impact on Chinese people, young and old alike.
Animation or gekiga is a potential goal. As a huge number of amateur painters are now emerging, the future environment will be more fertile. Just as everybody uses word processors for writing text, future picture processors may help us describe imaginative and visionary expressions. Many approaches have already been proposed: