orientations of the character markers. Instead of
text, images, or 3-D graphics, sounds are played
using the sound files in the word list. When the user
selects the voice mode, a voice synthesizer reads the
recognized word aloud.
3.3 Software Architecture
Figure 3 shows the software architecture of the
prototype system. The software consists mainly of an
image processing part and a presentation part. The two
parts have a server-client relationship and exchange
data using socket communications. We used UDP/IP so
that multiple clients can exchange control signals
with a server on the network. The clients can share
information based on the word recognized at the
server. This architecture allows the system to be
applied to telecommunication.
For example, a presenter makes a speech using
keywords while the video images are processed at
the server. The processed information is then
multicast to remote sites, where the audience sees the
video with the keywords translated into their native
languages.
Figure 3: Software architecture.
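The server-to-client exchange described above can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the JSON message layout and the function names (encode_word_message, send_to_clients, demo) are assumptions, and the sketch sends one unicast UDP datagram per client on the loopback interface rather than using true IP multicast.

```python
import json
import socket


def encode_word_message(word, translations):
    """Serialize a recognized word and its translations (hypothetical format)."""
    return json.dumps({"word": word, "translations": translations}).encode("utf-8")


def decode_word_message(payload):
    """Inverse of encode_word_message."""
    return json.loads(payload.decode("utf-8"))


def send_to_clients(sock, payload, client_addresses):
    """Send the same datagram to every known client address."""
    for address in client_addresses:
        sock.sendto(payload, address)


def demo():
    """Run one server and two clients on the loopback interface."""
    clients = []
    for _ in range(2):
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        client.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        client.settimeout(2.0)
        clients.append(client)
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        payload = encode_word_message("ねこ", {"en": "cat"})
        send_to_clients(server, payload, [c.getsockname() for c in clients])
        # Each client receives its own copy of the recognized word.
        return [decode_word_message(c.recvfrom(4096)[0]) for c in clients]
    finally:
        server.close()
        for client in clients:
            client.close()
```

Because UDP is connectionless, the server needs no per-client connection state, which is one plausible reason it suits a setup with several presentation clients.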
Character Recognition
The way an image is processed depends on the
number of characters on the character marker. First,
the OCR tries to recognize some characters in order
to detect words in the character marker. If nothing or
only one character is found, the process is passed to
ARToolkit. If the character matches one in
the pre-registered marker data, that character is then
used as a control command. Otherwise, the message
“no word” appears on the display. If the OCR
detects some words, they are checked as possible
matches against the words in the word list.
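The branching described in this paragraph can be summarized in a short sketch. The function name dispatch, its return tuples, and the idea of passing the marker-matching result in as an argument are all assumptions made for illustration; the real system calls the OCR middleware and ARToolkit directly.

```python
def dispatch(ocr_chars, marker_command, word_list):
    """Decide how a character marker is handled.

    ocr_chars: string returned by the OCR step (may be empty).
    marker_command: control command matched against the pre-registered
        marker data, or None if no marker matched (hypothetical stand-in
        for the ARToolkit step).
    word_list: collection of known source words.
    """
    if len(ocr_chars) <= 1:
        # Nothing or a single character: treat it as a possible control marker.
        if marker_command is not None:
            return ("command", marker_command)
        return ("message", "no word")
    # Several characters: check them against the word list.
    if ocr_chars in word_list:
        return ("word", ocr_chars)
    return ("unmatched", ocr_chars)


words = {"ねこ", "いぬ"}
print(dispatch("", None, words))
print(dispatch("A", "zoom_in", words))
print(dispatch("ねこ", None, words))
```

The "unmatched" outcome is where a correction step for misrecognized characters would take over.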
We used ARToolkit for identifying a control
command because the number of control commands
is currently around 30, and registering the markers is
not tedious work. ARToolkit was also used for
detecting the square frames and determining their
position and orientation.
We used the OCR middleware tool “Yonde!!
KoKo” (A.I. Soft) for the Japanese character
recognition. OCR is well suited for extracting the
characters of specific languages even though it does
not recognize the script or special symbols in minor
languages. The OCR tool recognizes JIS first-level
kanji, hiragana, katakana, the English alphabet, and
Roman numerals. These characters do
not have to be registered in advance.
Word Detection
We created our own word list for the language
translation because a word list makes word detection
efficient. We did not use a commercial translator.
The translation is done by looking up a compact word
list for a specific field of interest to the user. In
the prototype system, hiragana, katakana, and kanji
were designated as the source characters, and
everyday English or Thai words were the destination
words.
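A compact word list of this kind can be pictured as a simple lookup table. The sketch below is only an illustration: the entries, the language codes "en" and "th", and the function name translate are assumptions, and the actual word list also refers to media such as sound files.

```python
# Hypothetical word list: Japanese source word -> destination words.
WORD_LIST = {
    "ねこ": {"en": "cat", "th": "แมว"},
    "いぬ": {"en": "dog", "th": "หมา"},
    "水": {"en": "water", "th": "น้ำ"},
}


def translate(word, target_lang, word_list=WORD_LIST):
    """Return the destination word, or None if the word or language is unknown."""
    entry = word_list.get(word)
    if entry is None:
        return None
    return entry.get(target_lang)


print(translate("ねこ", "en"))
print(translate("水", "th"))
```

Restricting the table to one field of interest keeps it small, so an exact-match lookup is enough and no general-purpose translator is needed.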
Word detection was not perfect, and incorrect
results were sometimes obtained. The main reason was
that the characters returned by the OCR were not
always correct, depending on the lighting conditions
and the movement of the character marker. We observed
that a statistically significant number of characters
were recognized incorrectly. For example, the small
hiragana characters were often mistaken for the
normal-size hiragana characters.
The word detection was thus conducted using the
following correction process. The characters that
were recognized by the OCR were first looked up in
the word list. If the group of characters did not
match any words in the word list, one character at a
time in the group was substituted with a candidate
character, and the corrected group was then checked
against the word list. This process was repeated until
a match was found.
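The correction process described above can be sketched as follows. The candidate table (normal-size hiragana mapped to their small forms), the sample words, and the function name detect_word are assumptions for illustration; the source does not say how candidate characters are chosen. This sketch tries single-character substitutions only and returns None if none of them yields a listed word.

```python
# Hypothetical candidate table: characters the OCR tends to confuse.
CANDIDATES = {
    "よ": ["ょ"],
    "や": ["ゃ"],
    "ゆ": ["ゅ"],
    "つ": ["っ"],
}


def detect_word(chars, word_list, candidates=CANDIDATES):
    """Look up chars; on a miss, substitute one character at a time."""
    if chars in word_list:
        return chars
    for i, ch in enumerate(chars):
        for alt in candidates.get(ch, ()):
            corrected = chars[:i] + alt + chars[i + 1:]
            if corrected in word_list:
                return corrected
    return None


words = {"きょう", "がっこう"}
print(detect_word("きよう", words))    # small "ょ" was read as normal-size "よ"
print(detect_word("がつこう", words))  # small "っ" was read as normal-size "つ"
```

Checking every corrected group against the word list keeps the search bounded: with a compact word list, most wrong substitutions are rejected immediately.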
Reducing the number of times the page is refreshed
The contents were presented in a layout that was
based on the position of the markers held by the user.
An HTML file was created to place the text, images,
or 3-D models at the marker positions in the real
scene. The page is renewed by recreating and
reloading the file. However, refreshing the page
frequently destabilizes the presentation, so
refreshes should be kept to a minimum.
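Generating such a page can be sketched as below. This is an assumed illustration only: the use of absolutely positioned div elements, the pixel coordinates, and the function name build_page are not taken from the prototype, which may lay out its content differently.

```python
from html import escape


def build_page(items):
    """Build an HTML page that places each text at a marker position.

    items: list of (text, x, y) tuples, with x and y in pixels
    (a hypothetical representation of the detected marker positions).
    """
    parts = ["<html><body>"]
    for text, x, y in items:
        parts.append(
            f'<div style="position:absolute; left:{int(x)}px; top:{int(y)}px">'
            f"{escape(text)}</div>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)


print(build_page([("cat", 120, 80), ("dog", 300, 150)]))
```

Rewriting and reloading this file for every small marker movement would cause the instability noted above, which is why the number of refreshes matters.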
The change in position of the virtual objects was
controlled smoothly without renewing the page by
WEBIST 2006 - WEB INTERFACES AND APPLICATIONS