
 
orientations of the character markers. Instead of text, images, or 3-D graphics, sounds are played using the sound files in the word list. When the user selects the voice mode, a voice synthesizer reads the recognized word aloud.
3.3 Software Architecture 
Figure 3 shows the software architecture of the prototype system. The software is mainly composed of an image-processing part and a presentation part. The two parts have a server-client relationship and exchange data through socket communication. We used UDP/IP so that multiple clients can transmit and receive control signals with a server over the network. These clients can share information based on the word recognized at the server. This software architecture also allows the system to be applied to telecommunication support. For example, a presenter gives a speech using keywords while the video images are processed at the server; the processed information is then multicast to remote sites, where the audiences see the video with the keywords translated into their mother tongues.
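As a rough sketch of this server-client exchange, the following Python fragment shows a server sending the recognized word to several clients over UDP; the addresses, port numbers, and JSON message format are assumptions made for illustration, not details of the prototype.

    import json
    import socket

    # Hypothetical port and client addresses; the paper does not give the
    # prototype's actual addressing scheme or message format.
    SERVER_PORT = 5005
    CLIENT_ADDRESSES = [("10.0.0.11", 5006), ("10.0.0.12", 5006)]

    def send_recognized_word(word, translations):
        # Server side: send the recognized word and its translations
        # to every registered client as a UDP datagram.
        message = json.dumps({"word": word, "translations": translations})
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            for address in CLIENT_ADDRESSES:
                sock.sendto(message.encode("utf-8"), address)

    def receive_control_signal():
        # Server side: wait (blocking) for a control signal from any client.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(("", SERVER_PORT))
            data, sender = sock.recvfrom(1024)
            return json.loads(data.decode("utf-8")), sender

    if __name__ == "__main__":
        send_recognized_word("ねこ", {"en": "cat", "th": "แมว"})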
 
 
Figure 3: Software architecture. 
 
Character Recognition
 
The way an image is processed depends on the number of characters on the character marker. First, the OCR tries to recognize characters in order to detect words on the character marker. If no character or only one character is found, processing is passed to ARToolkit. If the character matches an entry in the pre-registered marker data, it is used as a control command; otherwise, the message "no word" appears on the display. If the OCR detects words, they are checked against the words in the word list for possible matches.
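The branching described above can be summarized as follows; the Python below is an illustrative sketch, and its function names, arguments, and data structures are assumptions rather than the prototype's actual interfaces.

    def process_frame(ocr_characters, control_markers, word_list):
        # Illustrative decision flow for one captured frame.
        #   ocr_characters  -- characters returned by the OCR middleware
        #   control_markers -- pre-registered marker data mapping a single
        #                      character to a control command
        #   word_list       -- translation dictionary keyed by source word
        if len(ocr_characters) <= 1:
            # Nothing or only one character: in the prototype this case is
            # handed to ARToolkit, which matches the marker against the
            # pre-registered data; a dictionary lookup stands in for it here.
            character = ocr_characters[0] if ocr_characters else None
            if character in control_markers:
                return ("control_command", control_markers[character])
            return ("message", "no word")

        # Two or more characters: check them against the word list.
        text = "".join(ocr_characters)
        matches = [word for word in word_list if word in text]
        if matches:
            return ("recognized_words", matches)
        return ("message", "no word")

    word_list = {"ねこ": {"en": "cat", "th": "แมว"}}
    print(process_frame(list("ねこ"), {"→": "next_page"}, word_list))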
We used ARToolkit for identifying control commands because the number of control commands is currently around 30, so registering the markers is not tedious. ARToolkit was also used to detect the square frames and determine their position and orientation.
We used the OCR middleware tool “Yonde!! KoKo” (A.I. Soft) for Japanese character recognition. OCR is well suited to extracting the characters of specific languages, even though it does not recognize the scripts or special symbols of minor languages. The OCR tool recognizes JIS first-level kanji, hiragana, katakana, the English alphabet, and Roman numerals. These characters do not have to be registered in advance.
Word Detection
 
An original word list was created for the language translation because using a word list is efficient for word detection. We did not use a commercial translator. The translation is done by referring to a compact word list compiled for a specific field of interest to the user. In the prototype system, hiragana, katakana, and kanji were designated as the source characters, and everyday English or Thai words were the destination words.
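Such a word list can be thought of as a small dictionary keyed by the source word; the entries below are invented examples for illustration, not taken from the prototype's list.

    # Illustrative entries only; the actual word list is compiled for the
    # user's field of interest and is much larger.
    WORD_LIST = {
        "ねこ": {"en": "cat", "th": "แมว"},
        "みず": {"en": "water", "th": "น้ำ"},
        "でんしゃ": {"en": "train", "th": "รถไฟ"},
    }

    def translate(source_word, target_language):
        # Look up the recognized word and return its translation, or None.
        entry = WORD_LIST.get(source_word)
        return entry.get(target_language) if entry else None

    print(translate("ねこ", "th"))  # prints the Thai word for "cat"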
The word detection was not perfect, and incorrect results were sometimes obtained. The main reason was that the characters derived from the OCR were not always correct, depending on the lighting conditions and the movement of the character marker. We observed that a significant number of characters were recognized incorrectly. For example, small hiragana characters were often mistaken for their normal-size counterparts.
The word detection was thus conducted using the following correction process. The characters recognized by the OCR were first looked up in the word list. If the group of characters did not match any word in the word list, the characters in the group were substituted one at a time with a candidate character, and the corrected group was checked against the word list again. This process was repeated until a match was found.
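One way to read this correction loop is sketched below in Python; the confusion-candidate table is hypothetical and stands in for whatever candidate characters the actual system uses.

    # Hypothetical confusion table: characters the OCR tends to confuse,
    # e.g. small hiragana read as their normal-size forms.
    CONFUSION_CANDIDATES = {
        "つ": ["っ"],
        "や": ["ゃ"],
        "ゆ": ["ゅ"],
        "よ": ["ょ"],
    }

    def correct_and_match(characters, word_list):
        # Substitute one character at a time with a candidate character and
        # re-check the word list until a match is found or candidates run out.
        word = "".join(characters)
        if word in word_list:
            return word
        for i, ch in enumerate(characters):
            for candidate in CONFUSION_CANDIDATES.get(ch, []):
                corrected = word[:i] + candidate + word[i + 1:]
                if corrected in word_list:
                    return corrected
        return None  # no match after all single-character substitutions

    print(correct_and_match(list("きつて"), {"きって": "stamp"}))  # -> きって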
Reducing the Number of Page Refreshes
 
The contents were presented in a layout based on the positions of the markers held by the user. An HTML file was created to place the text, images, or 3-D models at the marker positions in the real scene. The page is renewed by recreating and reloading this file. However, frequent page refreshes destabilize the presentation and should thus be kept to a minimum.
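A minimal sketch of how such an HTML file might be regenerated from marker positions follows; the element layout, coordinates, and file name are assumptions, not the prototype's actual output.

    def build_page(annotations, path="overlay.html"):
        # Write an HTML file that places each annotation at its marker
        # position; the coordinate handling and element layout are assumed.
        #   annotations -- list of (text, x, y) tuples in screen coordinates
        items = "\n".join(
            '<div style="position:absolute; left:%dpx; top:%dpx;">%s</div>'
            % (x, y, text)
            for text, x, y in annotations
        )
        with open(path, "w", encoding="utf-8") as f:
            f.write("<html><body>\n" + items + "\n</body></html>")

    # Rebuilding and reloading the file on every frame destabilizes the
    # presentation, so the page would be rewritten only when the recognized
    # words change, not whenever the markers merely move.
    build_page([("cat", 120, 80), ("train", 300, 200)])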
The change in position of the virtual objects was 
controlled smoothly without renewing the page by 