A WEB-BASED MULTILINGUAL UTTERANCE COLLECTION

SYSTEM FOR THE MEDICAL FIELD

Takashi Yoshino, Taku Fukushima and Ryuichi Nisimura

Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama, Japan

Keywords:

Voice collection, Intercultural communication, Parallel texts, Medical ﬁeld, Web-based system.

Abstract:

We have developed a web-based multilingual utterance collection system, named OTOCKER, for the medical

ﬁeld. The purpose of OTOCKER is to act as a voice data collection platform for intercultural communica-

tion. Although speech synthesis systems have improved signiﬁcantly, ﬂuent and smooth speech synthesis is

still a problem. Currently, it is difﬁcult to synthesize speech in different languages. In the medical ﬁeld, in

particular, it is important that the intended meaning of spoken words be conveyed effectively along with the

different nuances. Therefore, we use an utterance collection system that collects peoplefs voices directly. The

limitations of this system are (1) an insufﬁcient number of correct sentences pertaining to the medical ﬁeld

and (2) that easy participation in both voice record and collection are difﬁcult. We can solve the ﬁrst problem

by using a system that collects parallel medical texts. The second problem can be solved by using w3voice

— web-based voice-recording system. This system can run only on a web browser. This paper presents the

design of OTOCKER, its prototype, and the results of its trial.

1 INTRODUCTION

Recently, there has been a sharp rise in the number

of foreigners visiting Japan. Moreover,the number of

foreign residents in Japan is increasing steadily. The

Immigration Service reported that approximately 6.73

million foreigners visited Japan in ﬁscal year 2006.

That is, there was a 10.0% increase as compared to

the previous year. The Immigration Service also re-

ported that the number of foreign residents in ﬁscal

year 2006 was approximately 2.08 million, which was

a 3.6% increase as compared to the previous year.

However, many of these foreigners who either live in

or visit Japan cannot speak Japanese. Further, there

are some people who can speak a language but do not

know how to read and write in that language. There-

fore, translation and voice support has to be provided

to such people. For instance, foreigners are now pro-

vided such voice and translation support in the ﬁeld of

medicine by a medical treatment interpreter. The in-

terpreter support is also provided over the telephone.

This telephone interpreter support service is available

in seven languages and is used by more than 4000

people a year. The number of languages supported

and the considerable number of users place an exces-

sive load on the system. Moreover, a further increase

in the load is expected in the future.

This paper presents a Web-based multilingual ut-

terance collection system, named OTOCKER, which

can collect multilingual voice data related to the ﬁeld

of medicine. We have developed OTOCKER and car-

ried out a trial experiment using this system. In this

paper,we will discuss the effectivenessof this system.

2 RELATED WORK

There are some researches to collect speech cor-

pus(Maekawa et al., 2000; Fujii et al., 2008; Griol

et al., 2008). Maekawa et al. collect a spontaneous

speech corpus of Japanese to make it useful for nat-

ural language processing and phonetic or linguistic

studies(Maekawa et al., 2000). Fujii et al. col-

lect a speech corpus of Japanese classroom lecture

to extract an importance sentences(Fujii et al., 2008).

David et al. propose two speech corpora acquisition

techniques(Griol et al., 2008). There is no research to

collect multilingual voice data for being provided to

other voice-based systems.

Considerable research has been carried out in or-

der to support intercultural communication via ma-

chine translation(Ishida, 2006; Yoshino et al., 2008).

370

Yoshino T., Nishimura R. and Fukushima T.

A WEB-BASED MULTILINGUAL UTTERANCE COLLECTION SYSTEM FOR THE MEDICAL FIELD.

DOI: 10.5220/0001844503700375

In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), page

ISBN: 978-989-8111-81-4

However, the accuracy of machine translation is not

sufﬁcient for it to be used in the ﬁeld of medicine.

This is because any mistake in communication can di-

rectly affect a patient’s life.

Therefore, multilingual corpora, consisting of text

related to the medical ﬁeld, are collected by a trans-

lation system for parallel translation(Yoshino et al.,

2009). The following three-step process is followed

by this system: (1) The necessary corpus is registered

with the system by the user. (2) The corpus is trans-

lated into different languages by other users. (3) The

translation is clubbed with the original corpus.

The collected corpora are then made available to

other medical support systems. As of now, this trans-

lation system is available in ﬁve languages (Japanese,

English, Chinese, Korean, and Portuguese).

However, this translation system cannot provide

support to a foreigner who is unable to read any of

the languages the system is available in. In such a

case, it is necessary to provide voice support. Differ-

ent voices have been collected for speech translation

research before. However, the use of these collected

voices has become one of the most challenging issues

in the ﬁeld since not all voices can be understood by

everyone.

Therefore, the purpose of this research is to collect

accurate voices for providing voice translation sup-

port in the ﬁeld of medicine(Miyabe et al., 2007).

3 ATTRIBUTES OF UTTERER

Voice data collected in the utterance collection system

are meant to be provided to other voice-based sys-

tems. Therefore, the attributes of the person whose ut-

terances are recorded (the utterer) need to be known.

This system considers the following four attributes of

an utterer.

1. Sex and Date of Birth.

This system records the sex and the date of birth of

the utterer since prior to using the recorded data, it

is necessary to know whether the utterer is a male

or a female and whether he/she is an adult or a

child.

2. Native Language.

Since a person’s native language affects his/her

pronunciation, the system also records the native

language of the utterer.

3. Proﬁciency in Nonnative Language.

It is necessary to know whether the utterer is

comfortable with the nonnative language in which

he/she is recording. Therefore, we asked the per-

son to rate his/her proﬁciency in the nonnative

language he/she was prepared to record in. The

deﬁnitions of the levels of proﬁciency are given in

Table 1.

4. Dialect.

The different ways in which a language is spoken

in different countries or in different regions of the

same country are called dialects. Since the qual-

ity of voice data is also dependent on these differ-

ences, the system even maintains a record of the

dialect spoken by the utterer. This system classi-

ﬁes the variants of a language into the following

two types:

• Variants in which both pronunciationsand char-

acters are different

• Variants in which words are spelt in almost the

same way but their pronunciation is different

Chinese and Portuguese are examples of the ﬁrst

type of classiﬁcation. Chinese has two dialects,

Pekingese and Cantonese. The Portuguese spo-

ken in Portugal is different from the Portuguese

spoken in Brazil. These dialects are mutually dif-

ferent in terms of both pronunciation and spelling.

In particular, the pronunciation of both the Chi-

nese dialects is so different that people speaking

the two different dialects cannot understand each

other. English is anexample for the second type of

classiﬁcation. Some words in American English,

British English, and Australian English are almost

similar in spelling but differ in pronunciation.

This system refers to the language variants that

fall in the latter category as ”dialects,” and the

ones that fall in the former category are classiﬁed

as different languages.

4 DESIGN OF OTOCKER

This section describes the design of a Web-

based multilingual utterance collection system called

OTOCKER. We used PHP as the development lan-

guage. In this system, the voice is registered and re-

played on a Web browser.

4.1 User Management

This system deﬁnes four types of roles for users: man-

ager, voice editor, voice registrant, and voice replay

user.

Table 2 shows the system rights for each role. The

system limits the functions of each role in order to

manage the large number of users.

A WEB-BASED MULTILINGUAL UTTERANCE COLLECTION SYSTEM FOR THE MEDICAL FIELD

371

Table 1: Description of four proﬁciency levels.

Proﬁciency levels Description

1 I can convey my opinions and ideas clearly in this language.

2 I can manage day-to-day conversation in this language.

3 I am a poor speaker of this language.

4 I do not speak this language.

Table 2: System rights for each role.

Roles

Rights Manager Voice editor Voice registrant Voice replay user

Add user, edit user, delete user O X X X

Delete voice data O O X X

Delete my voice addition O O O X

Add new voice data O O O X

Replay voice data O O O O

*A voice registrant can delete his/her own registered voice.

“O” shows that the role has the rights. “X” shows that the role does not have the rights.

4.2 System Conﬁguration

Figure 1 shows the system conﬁguration. Figure 2

shows the screen transition chart. In order to run cor-

rectly, this system needs a Java-enabled Web browser.

The system uses w3voice(Nisimura et al., 2008) for

collecting voice data on the Web browser. w3voice

can operate on Windows, MacOSX, and Linux; how-

ever, Java has to be installed on these operating sys-

tems. The voice of an utterer can be recorded only by

a mouse click. We use w3voice to reduce the users’

threshold.

4.3 Voice Data Registration and Replay

Figure 3 shows the ﬂow of voice data registration and

replay.

4.3.1 Flow of Voice Registration

1. The system presents the corpus list obtained from

the database of a medical-ﬁeld corpora-sharing

system to a voice registrant.

2. The voice registrant selects a corpus from the list.

3. The user utters the corpus into the microphone.

Figure 4 shows the voice registration screen. Af-

ter recording the voice data with a voice recorder,

the system saves this voice data into a directory.

Then, the attributes and information of the voice

of the utterer are added to the database.

4.3.2 Flow of Voice Replay

1. The system presents the corpus list obtained from

the database of the medical ﬁeld corpora sharing

system to a voice player.

2. The voice player selects a corpus from the list.

3. The voice list for the corpus is displayed.

4. The user selects the voice to replay.

5. The system obtains the voice ﬁle name from the

database and replays the voice. Figure 5 shows

a screenshot for a voice replay. When the user

clicks the voice replay button, the voice replay

begins. During the replay, the system shows the

proﬁle of the utterer to the voice replay user.

5 TRIAL EXPERIMENT

We carried out a trial experiment using the proposed

system. The purpose of the experiment was to evalu-

ate the usability of OTOCKER.

5.1 Examinees

Table 3 shows the examinees’ native language and the

other languages that they can speak. Table 3 provides

the language-baseddistributionof the examinees. The

examinees are volunteer interpreters.

WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies

372

Voice registration user Voice replay user

Medical parallel corpora collection server

Multilingual utterance collection Web system (OTOCKER server)

User DB

Voice DB

Parallel

corpora DB

Internet

Figure 1: System conﬁguration of OTOCKER.

5.2 Experimental Procedure

We carried out the following procedure:

1. We explained the background and the purpose of

this system to the examinees.

2. Examinees registered their user information.

3. We explained the usage of the system.

4. The examinees recorded their own voice.

5. We asked the examinees to ﬁll out a questionnaire.

6 RESULT AND DISCUSSION

6.1 Overall Expression of OTOCKER

Table 4 shows the results of the questionnaire survey.

The survey was based on a description-type question-

naire, and for the evaluation of the survey results, we

used a ﬁve-item Likert scale.

1. Item 1: I was able to register my voice easily.

The average value of the result is 4.2 on the ﬁve-

item evaluation. This value implies that even

though the examinee was not accustomed to the

operation of a PC, he/she could operate the system

easily. Some of the examinees also expressed sur-

prise at the fact that their voice could be recorded

so easily and clearly just by clicking on a button.

2. Item 2: This system is easy to use.

The average value of the result is 3.6 on the ﬁve-

item evaluation. In this case, the examinees were

Voice replay page

Voice registration page

Selected parallels corpus page

Parallel corpora list page

Select a parallel corpus

Figure 2: Screen transitions in OTOCKER.

A user

Medical parallel corpora server

Multilingual utterance

collection OTOCKER

server

Parallel

corpora DB

Voice DB

w3voice

Voice data directory

(wav format)

PHP script

Figure 3: Flowchart of voice registration and replay.

divided in their opinions. Some people said that

the operation of this system was easy or very sim-

ple. The others said that they required a guide in

order to use this system. Therefore, we have con-

cluded that the usability of the system needs to be

improved further.

A WEB-BASED MULTILINGUAL UTTERANCE COLLECTION SYSTEM FOR THE MEDICAL FIELD

373

Table 3: Language-based distribution of examinees.

Native language Japanese: 4 persons; Portuguese: 1 person

Other languages that

can be spoken

Japanese: 5 persons; English: 2 persons; Portuguese: 2 persons; Span-

ish: 2 persons

Table 4: Results of questionnaire survey.

Evaluation value

No. Items 1 2 3 4 5 Average

1 I was able to register my voice easily. – 1 – 1 3 4.2

2 This system is easy to use. – – 3 1 1 3.6

3 I want to register my voice on this system. 1 – – 3 1 3.6

4 I want to use this system continuously. 1 – 2 1 1 3.2

5 I would not like my voice to be used by other systems. 2 2 1 – – 1.8

1) Evaluation value: 1: Strongly disagree, 2: Disagree, 3: Neutral, 4: Agree, 5: Strongly agree

2) The numbers in each evaluation value column denote the number of people who held the view reﬂected by that particular evaluation value.

Voice recording button

Figure 4: Voice registration screen.

Voice replay button Sex Age Delete button

Figure 5: Voice replay screen.

3. Item 3: I want to register my voice on this system,

and Item 4: I want to use this system continuously.

The average values for Items 3 and 4 are 3.6 and

3.2, respectively. As responses to these items,

we received positive comments such as “I think

that the system can help many foreigners if it is

used in hospitals across Japan,” “I would be glad

to register my voice on this system,” and “I am

happy to have registered my voice with this sys-

tem.” However, we also received few negative

comments such as “I was not happy with the qual-

ity of recording” and “I ﬁnd it disturbing that my

voice will be used in a hospital.”

4. Item 5: I would not like my voice to be used by

other systems.

The average value of the result is 1.8 on the ﬁve-

item evaluation. This result shows that there was

little resistance to the use of the voice. We did

not expect this result and believe that it could be

attributed to the fact that the examinees were vol-

unteers.

7 CONCLUSIONS

We designed and developed a Web-based multilingual

utterance collection system, named OTOCKER. The

purpose of OTOCKER is to act as a voice collection

platform for intercultural communication. In this pa-

per, we presented the conﬁguration of this system and

the results of trial experiments.

ACKNOWLEDGEMENTS

This work was supported by SCOPE (Strategic In-

formation and Communications R&D Promotion Pro-

gramme) of Ministry of Internal Affairs and Commu-

nications.

REFERENCES

Fujii, Y., Yamamoto, K., Kitaoka, N., and Nakagawa, S.

(2008). Class lecture summarization taking into ac-

count consectiveness of important sentences. In IN-

TERSPEECH2008, pages 2438–2491.

WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies

374

Griol, D., Hurtado, L. F., Segarra, E., and Sanchis, E.

(2008). Acquisition and evaluation of a dialog cor-

pus through woz and dialog simulation techniques.

In (ELRA), E. L. R. A., editor, Proceedings of the

Sixth International Language Resources and Evalua-

tion (LREC’08), Marrakech, Morocco.

Ishida, T. (2006). Language grid: An infrastructure forinter-

cultural collaboration. In SAINT-06, IEEE/IPSJ Sym-

posium on Applications and the Internet, pages 96–

100. IEEE Press.

Maekawa, K., Koiso, H., Furui, S., and Isahara, H. (2000).

Spontaneous speech corpus of japanese. In Proceed-

ings of the 2nd International Conference on Language

Resources and Evaluation (LREC2000), pages 947–

952.

Miyabe, M., Fujii, K., Shigenobu, T., and Yoshino, T.

(2007). Parallel-text based support system for in-

tercultural communication at medical receptions. In

Proceedings of The First International Workshop on

Intercultural Collaboration (IWIC2007), pages 182–

192.

Nisimura, R., Miyake, J., Kawahara, H., and Irino, T.

(2008). Speech-to-text input method for web system

using javascript. In SLT2008, IEEE Workshop on Spo-

ken Language Technology.

Yoshino, T., Fujii, K., and Shigenobu, T. (2008). Availabil-

ity of web information for intercultural communica-

tion. In PRICAI-08, The Tenth Paciﬁc Rim Interna-

tional Conference on Artiﬁcial Intelligence.

Yoshino, T., Fukushima, T., Miyabe, M., and Shigeno, A.

(2009). A web-based multilingual parallel corpus col-

lection system for the medical ﬁeld. In IWIC-09, ACM

Workshop on Intercultural Collaboration.

A WEB-BASED MULTILINGUAL UTTERANCE COLLECTION SYSTEM FOR THE MEDICAL FIELD

375