Recognition of Reading Activities with
Read Aloud Voice on Japanese Text Presentation System
Kyota Aoki
Graduate school of Engineering, Utsunomiya University, 7-1-2 Yoto, Utsunomiya, Japan
Keywords: Reading Difficulty, Text Presentation, Assessment, Activity Analysis.
Abstract: Many pupils in Japanese schools have reading difficulty. Dyslexia is a disability that affects reading
and writing. In Japanese public elementary schools, every pupil may use an ICT device individually and
simultaneously, while only a few teachers must teach all the pupils. For effective use, the ICT devices must
themselves help users to operate them. To help a user, an ICT device must understand the state of the
user. This paper proposes a method to recognize the reading activity of a user from read aloud voices. The
proposed method is implemented, and experiments confirm its performance in measuring the reading activity
of a user.
1 INTRODUCTION
Information and communication technology (ICT)
is spreading in Japanese public elementary schools,
where every pupil may
use an ICT device individually and simultaneously.
In most cases, one teacher
must teach all the pupils. A class has at most 40 pupils
in Japanese public elementary schools. The median
number of pupils per class is about 32 in Utsunomiya,
Japan.
The use of ICT devices causes many
problems. Many of them are easy to solve. However,
one teacher cannot handle all of the problems of
individual pupils at the same time.
We will cover the easy problems with the ICT
device itself. In Japan, a normal class includes about
32 pupils. About 20% of pupils have some problems
with using ICT devices. We will cover 80% of
these problems with the ICT device itself. In that case, the
teacher needs to attend to only about two pupils who have
problems not covered by the ICT device itself.
To treat the problems caused by the use of
ICT devices and to help a user, the ICT system must
recognize the user’s activities. The Japanese text
presentation system was proposed for helping
pupils with or without reading difficulty (Aoki, K.,
Murayama, S., Harada, K., 2014). In the Japanese text
presentation system, the activities of a user are key
touches, eye movements, and read aloud voices.
A pupil may leave the ICT system. Our system
does not have arms, so it cannot prevent a
pupil from leaving the front of the ICT system. However,
teachers can treat this kind of problem. Many pupils
use the ICT system well. However, many simple
problems prevent others from using the system well.
In Japan, if a pupil shows a two-year delay in
reading ability, we say that the pupil has reading
difficulty. In some ordinary Japanese public elementary
schools, about 20% of pupils have light reading
difficulty. Of course, there are also pupils with heavy
reading difficulty. Pupils with heavy reading
difficulty attend special support education classes or
schools.
Reading ability is the most important ability for
learning in a school. Almost all materials are
textbooks. Recently, multimedia materials have
increased gradually. However, even in multimedia
materials, texts have an important role. Pupils
with reading difficulty have a large handicap in all
subjects. Even if a pupil has enough intelligence,
reading difficulty makes it hard for the pupil to
learn all subjects. A method for helping
pupils with reading difficulty is therefore important.
This paper proposes a method to recognize the
activities of a user of the Japanese text presentation
system, which helps pupils with or
without reading difficulty to read Japanese texts. The method decreases
the work of teachers who help the pupils.
There are many pupils with reading difficulty in
Japanese elementary schools. There are many
Aoki K..
Recognition of Reading Activities with Read Aloud Voice on Japanese Text Presentation System.
DOI: 10.5220/0005427900250035
In Proceedings of the 7th International Conference on Computer Supported Education (CSEDU-2015), pages 25-35
ISBN: 978-989-758-107-6
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
difficulties. The first and largest one is reading Japanese
characters. Japanese text is composed of
hiragana (a phonetic script), katakana (another
phonetic script), kanji (semantic characters)
and other characters. During elementary
school, pupils learn 48 hiragana characters, 48
katakana characters and 1008 kanji characters.
Almost all pupils learn hiragana and katakana easily.
However, the huge number of kanji is difficult to
learn for some pupils in normal classes (Murayama,
S., Aoki, K., 2012).
The next one is the difficulty of recognizing
sentence structures. In Japanese sentences, there
is no spacing between words. To ease the
difficulty of reading kanji characters, we can
replace kanji characters with hiragana characters.
Alternatively, we can write hiragana characters that represent the
pronunciation of the kanji characters at the side of the
kanji characters.
Readers recognize the words that make up a Japanese
text with the help of kanji. A large number of
words start with a kanji character, so a reader can
recognize the chunk of characters that forms a
word within a mixture of hiragana, katakana and kanji.
If we replace kanji characters with hiragana
characters, we have a sequence of hiragana
characters only. In a long sequence of hiragana, it is
difficult to recognize the chunk of characters
that forms a word. Writing hiragana characters at the side of kanji
characters does not cause this kind of problem.
In elementary school, pupils learn hiragana and
katakana first. In the first stage of elementary
school, Japanese textbooks put a space
between words to make the
sentence structure easier to understand. However, normal
Japanese texts have no spaces between words.
Every pupil has these two difficulties at first. Over a
long school life, pupils acquire the skill to overcome
them. Nevertheless, these two difficulties are
large barriers to reading and understanding Japanese
sentences.
No infant has knowledge of Japanese
characters. At first, every pupil knows only a few
of the huge number of kanji characters. Then,
pupils learn hiragana, katakana and kanji characters over
their long elementary school life.
In Japanese elementary schools, reading difficulty
means a two-year delay in reading ability. A few
pupils with dyslexia learn in special support
education classes or schools. However, there are
many pupils with reading difficulties in normal
elementary schools. Of course, some pupils have
difficulty remembering kanji characters. Most
pupils remember kanji characters gradually.
However, pupils with a learning disability tendency
have difficulty reading Japanese sentences even
when they can remember the kanji characters. In
that case, they may have dyslexia.
There may be many causes of difficulty in
reading Japanese texts. We do not discuss the causes.
We pay attention only to methods for easing the
difficulties. We call these difficulties “reading
difficulty” in this paper.
Research involving teachers shows that pupils
with an ADHD tendency have difficulty following
characters sequentially and recognizing
grammatical structures (Murayama, Aoki, 2009). Of
course, there are many types of reading difficulty
and many causes. The
resulting reading difficulties nevertheless show similar
symptoms: difficulty following
characters sequentially, recognizing grammatical
structures and reading kanji characters.
We have developed a visual text presentation
system for persons with reading difficulty in Windows
environments. The system records every operation of
a user. With the recorded operations, we assess the
difficulty of the user.
The Japanese text presentation system was
proposed and implemented for pupils with reading
difficulties (Aoki and Murayama 2012). The system
provides multi-level highlighting and
makes a precise record of the operations. With the
operational record, we can assess reading abilities
and difficulties on an objective basis.
This paper proposes a method to recognize the
activities of the user from read aloud voices, in order to
decrease the work of teachers who help pupils
with or without reading difficulties.
First, this paper proposes the Japanese text
presentation system with user’s activity recognition
based on read aloud voices. Then, we discuss
in detail the plan of the Japanese text presentation
system that recognizes reading activities from read aloud
voices in a normal Japanese class room. Next, we
discuss the implementation of the system. Then, we
show the experimental results. Finally, we conclude
this work.
2 JAPANESE TEXT
PRESENTATION SYSTEM
WITH RECOGNITION OF
READ ALOUD VOICES
The Japanese text presentation system records all the
operations of a user. It moves the high-lighted part in a text
in response to the key-input of the user. However, from the key
operations alone, we cannot precisely recognize the reading
activities of a user. For instance, a user may simply type
a proper key at a proper interval without any reading
activity. To recognize a reading activity and
help the user, the system needs to observe the
reading activity with a more direct method. When
using the system, the user reads Japanese
sentences aloud. One direct way to observe the
reading activity is to measure the read aloud
voice, which is a direct result of the
reading activity. Eye movement is also important for
understanding a reading activity. However, eye
movements give no information about
reading results. So, we start from the analysis of read
aloud voices, with which we can evaluate the
performance of the reading activity directly.
When using the Japanese text presentation
system, the user triggers the move to the next
high-lighted part with a key-input. The Japanese text
presentation system records the key operations with
precise times. From the record, we can measure the
time spent reading each high-lighted part.
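
As a minimal sketch of how the reading time of each high-lighted part could be derived from the key log: the code assumes a hypothetical log that holds one timestamp in seconds per key-input, where each key-input ends one high-lighted part and starts the next. The log format and the name `reading_times` are illustrative assumptions, not the actual record format of the system.

```python
from typing import List, Sequence


def reading_times(key_times: Sequence[float]) -> List[float]:
    """Return the time spent on each high-lighted part.

    key_times[i] is the (hypothetical) time of the i-th key-input in seconds.
    The part shown between two consecutive key-inputs is assumed to be read
    during that interval.
    """
    return [later - earlier for earlier, later in zip(key_times, key_times[1:])]


# Three key-inputs delimit two high-lighted parts.
print(reading_times([0.0, 1.5, 4.0]))
```

A log with fewer than two key-inputs yields an empty list, because no complete interval has been observed yet.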
When the system is operated properly, the resulting
information is important for understanding the
reading activities of a user. To confirm the proper
operation of the Japanese text presentation system,
we use the read aloud voice.
2.1 User’s Activity about Reading
Using the Japanese text presentation system is
simple, as shown in Figure 1. A user reads the high-lighted
part of a text, then types a key to move the
high-lighted part. In this simple process, a user looks
at the display, follows the text, recognizes the
characters, understands the high-lighted chunk of
characters, reads it aloud and types a key. Figure 2 shows
a more precise flow of reading aloud. The Japanese text
presentation system cannot help a user to look at the
display. However, the Japanese text presentation
system helps the user to find the proper sentence on the display
by high-lighting the sentence and weakly masking the other
sentences. The system also helps the user to find a
chunk of characters by high-lighting. We
cannot observe the process of understanding.
However, we can observe the read aloud actions and
eye movements. We can also observe the expressions on a
user’s face and body movements. For guiding and
helping the user of the Japanese text presentation
system, the eye movement and the read aloud voice
are important. The read aloud activity is a direct result
of reading, so we target it first.
With the recognition of read aloud activities, we can
assess the reading ability directly.
Figure 1: Basic system operation.
Figure 2: Precise reading activity.
[Figure 1 flowchart nodes: Start; Display texts with high-lighting and masking; Read the high-lighted part; Key input; Move the high-lighted part.]
[Figure 2 flowchart nodes: Start; Look at the display; Find a text; Find a chunk of characters; Look at the high-lighted part; Understand the part; Generate voice; End.]
RecognitionofReadingActivitieswithReadAloudVoiceonJapaneseTextPresentationSystem
27
Figure 3: Relation between reading time and length.
Figure 4: Relation between reading time and length
without outlier data.
2.2 Reading Activity Measurement
based on the Reading Aloud Voices
For assessment, we use the relation between the reading
time and the length of the high-lighted part. There are
several measures of the length of a high-lighted part of a text.
One is the number of
characters, and another is the number of phonemes.
Our preliminary experiments show a clear relation
between the reading time and the number of
characters, so we use the number of characters to
measure the length of a text. Japanese texts contain
kanji characters, hiragana characters and others. As a
result, the number of phonemes per character varies.
However, the number of characters shows a better
relation to the reading time.
Japanese text materials differ in the target age
of the readers. For older pupils, the materials include
more kanji characters. A single kanji character can
represent a word that would otherwise be written with
several hiragana characters. Older pupils read faster
than younger pupils do. As a result, the relation
between the number of characters
and the reading time of a material stays roughly constant.
Without reading difficulties, there is a linear
relation between the reading time and the length of
the high-lighted part. However, in real reading, there
are many mis-operations and reading difficulties.
Figure 3 shows an example of the relation between
the reading time and the length of the high-lighted
part. Some points lie on a linear function, and others are outlier
points.
We identify the outlier points by the reading time
per character of the high-lighted part, using a simple
threshold. We assume that the reading
time per character without reading difficulties is
between 0.1 s and 0.3 s. Figure 4 plots the pairs of length
and reading time of the high-lighted parts that remain after
filtering with this threshold.
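
The thresholding and the linear relation can be sketched as follows. This is a minimal illustration, assuming each measurement is a pair of (number of characters, reading time in seconds). The function names and the sample values are hypothetical, and ordinary least squares is used here only as one plausible way to obtain a line through the filtered points.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (number of characters, reading time in seconds)


def split_by_time_per_char(
    samples: List[Sample], low: float = 0.1, high: float = 0.3
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (inliers, outliers) by reading time per character."""
    inliers, outliers = [], []
    for n_chars, seconds in samples:
        if n_chars > 0 and low <= seconds / n_chars <= high:
            inliers.append((n_chars, seconds))
        else:
            outliers.append((n_chars, seconds))
    return inliers, outliers


def fit_line(points: List[Sample]) -> Tuple[float, float]:
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two different lengths to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up measurements: the last two fall outside 0.1 s to 0.3 s per character.
samples = [(5, 1.0), (10, 2.0), (20, 4.0), (8, 3.5), (10, 0.2)]
inliers, outliers = split_by_time_per_char(samples)
slope, intercept = fit_line(inliers)
```

Points rejected as too slow are candidates for reading difficulty, while points rejected as too fast are more likely operation errors; the filter alone does not separate the two cases.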
The outlier points may represent a reading
difficulty or an operation error. To
understand the reading activity, it is important to distinguish
reading difficulties from operation errors. From the
key operations alone, we have no information to
distinguish them.
With read aloud voices, we can easily recognize
the reading activities. However, it is difficult to
recognize the relation between the read aloud voice
and the high-lighted part of a text. The observed voice
may be only the user talking to himself or herself, or it
may be a correct reading of the high-lighted part
of the text. The read aloud voice may also include
mispronunciations of the high-lighted part.
Moreover, pupils are not announcers, and their
pronunciation is not clear.
General speech recognition is powerful now. With
the power of cloud services, our smart phones recognize
our speech well. However, dictation of long
sentences is difficult. With a long sentence,
speech recognition makes some errors.
In Japanese elementary schools, the Internet
connection is more or less restricted for
security. In this environment, powerful cloud-based
speech recognition cannot work. We must use a
weaker speech recognition system that works without
an Internet connection. In this environment, the
Japanese text presentation system must recognize the
reading activity of a user from erroneous
speech recognition results.
In Figure 4, the pairs of length and reading
time clearly follow a linear function. With
reading difficulties, a pupil needs much more
reading time. As a result, the high-lighted parts that
the user has difficulty reading are plotted in the
region above the linear function.
The points plotted above the normal linear function
indicate reading difficulties. The corresponding parts
of the text show the kinds of reading difficulty.
[Plots for Figures 3 and 4: vertical axis from 0 to 4000, horizontal axis from 0 to 25. Fitted line: y = 177.14x - 13.256, R² = 0.901. Data label: 2013-0704-1541.]
CSEDU2015-7thInternationalConferenceonComputerSupportedEducation
28
3 IMPLEMENTATION OF
JAPANESE TEXT
PRESENTATION SYSTEM WITH
RECOGNITION OF USER
ACTIVITY
3.1 ICT Environments
A normal personal computer has a microphone that
can capture the user’s read aloud voice. The basic
function of the Japanese text presentation system is to
present Japanese text properly, easing the
reading difficulties of a user without any stress. The
system must move the high-lighted part without
delay after a key-input.
Speech recognition needs some processing time.
Key operation and reading aloud are
asynchronous activities. So, the system runs the
task for key operations and the task for voice
recognition simultaneously.
The assessment process needs a large amount of
teacher involvement. During reading with the Japanese
text presentation system, teachers monitor the reading
process. Afterwards, teachers examine the
operational records. This assessment results in an
objective estimate of the reading difficulties of the
user. However, there is little difference in
teacher workload between the assessment using
the Japanese text presentation system and
classical assessment methods.
Figure 5: Ratio of pupils who need help to use the ICT
device properly.
Table 1: Problems about ICT usability in a special aid
school in Japan.
The load concentrates on the teacher who is good at ICT.
There are few educational materials for DAISY.
They do not use SAVE AS DAISY.
When OCR is used to prepare educational materials for pupils with a learning disability tendency, recognition errors cause a large amount of checking and correction work.
Replacing difficult kanji characters with hiragana takes a large amount of work.
With classical ICT tools such as DAISY, educational materials must be prepared for each pupil, who has a different age and a different disability.
It is difficult to evaluate the performance.
Table 2: The plan for covering the problems.
New/Old  Feature
New      Automatic reading activity estimation.
New      Writing hiragana characters at the side of kanji characters.
Old      Automatic operation observations.
Old      Automatic assessment of reading difficulties.
Old      A collection of simple software is better than complex multi-functional software.
Old      Avoid the usage of OCR.
Old      An educational material presentation system that does not need the special material preparations.
Old      An evaluation method/function for evaluating the performance of a pupil.
In the reading process, a pupil may read a part
that is not high-lighted. A pupil may also mispronounce
words. Those events leave no marks in the
operational record. The observing teachers guide the
pupil toward proper operation of the Japanese text
presentation system. The teachers also record the
incorrect pronunciations.
The Japanese text presentation system tries to help
every pupil with reading difficulties in a normal class
room. In Japanese elementary schools, there are a few
pupils with reading difficulties in a class. The teacher must
run the class for the majority of pupils, who have no such difficulties.
Teachers need day-by-day assessments of reading
difficulties to evaluate how well their teaching eases the
reading difficulties of a pupil. With the present
Japanese text presentation system, teachers can assess
reading difficulties. However, the
system requires much work from
teachers. To enable day-by-day assessment of
reading difficulty, we must decrease the teachers’
workload in assessing the reading difficulties.
[Figure 5 chart values: 4%, 16%, 80%.]
RecognitionofReadingActivitieswithReadAloudVoiceonJapaneseTextPresentationSystem
29
3.2 Class Room
There are many problems in utilizing ICT
in Japanese elementary schools
(Murayama, Aoki and Morioka, 2009). The problems
are listed in Table 1. To solve the problems, the
proposed text presentation system treats only
electronic text. In Japan, a law requires that
electronically readable texts of textbooks be prepared (Law). In addition,
many documents are accessible through the
Internet. The proposed system takes no paper documents as
input.
Many pupils may memorize the full text of
materials used many times, such as textbooks. Such
memorized materials cannot be used for evaluating
the reading performance of a pupil. Reading these
materials does not help to strengthen the reading abilities
of the pupil.
In normal class rooms, many pupils use the
Japanese text presentation system simultaneously. In
Japan, a class of a public elementary school has about
30 pupils and one teacher. With the instructions of the
teacher, we estimate that about 80% of the pupils
work with the Japanese text presentation system
properly. The remaining 20% of pupils need help to
use the Japanese text presentation system properly. It
is difficult for one teacher to support 6 pupils
simultaneously. Our new system itself will support 80% of
the pupils who have problems using the system.
Then, only 4% of the pupils in a class, that is, one
or two pupils, still need help. A teacher can support
these pupils. In that case, all of the pupils in a class work
properly with the Japanese text presentation system.
Figure 5 shows these relations graphically. There is
no need for the system to support all the pupils in a
class completely. Supporting 80% of the pupils who need help is enough in a
normal class.
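
The proportions in this estimate can be checked with a short calculation. The sketch below assumes the figures given in the text (a class of about 30 pupils, 20% of pupils needing help, and the system covering 80% of those); the function name and dictionary keys are hypothetical. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def support_breakdown(class_size, need_help=Fraction(20, 100),
                      system_coverage=Fraction(80, 100)):
    """Split a class into pupils who work unaided, are helped by the
    system itself, or still need the teacher."""
    helped_by_system = need_help * system_coverage
    helped_by_teacher = need_help - helped_by_system
    return {
        "work_unaided": 1 - need_help,
        "helped_by_system": helped_by_system,
        "helped_by_teacher": helped_by_teacher,
        "pupils_for_teacher": helped_by_teacher * class_size,
    }


result = support_breakdown(30)
for key, value in result.items():
    print(key, float(value))
```

Under these assumptions the shares are 80%, 16% and 4%, and 4% of 30 pupils is 1.2 pupils, which is consistent with one or two pupils being left for the teacher.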
3.3 System Design
The proposed system has the features listed in Table
2. The proposed Japanese text presentation system
has only 2 new functions; we deliberately restrict the functions of
the proposed system. The new functions are
writing hiragana characters at the side of
kanji characters and analysing the user’s
voice. With those new functions, the new Japanese
text presentation system makes it easy to estimate the
user’s reading difficulties, as discussed in the
previous section. The teachers of a pupil with
reading difficulties need objective measurements
of the reading performance of the pupil.
For pupils without reading difficulties, the
objective measurements of performance show the
progress of the user. For this purpose, the proposed
system provides an operation logging function. The
operation logs describe the reading speed at each
meaningful chunk of characters.
Figure 6: The Japanese text presentation system with
recognition of user’s activity.
Figure 7: Outline of the new Japanese text presentation
system.
The proposed Japanese text presentation system
makes it possible to use one-time materials for measuring the
performance of a pupil. Real-time presentation
generation makes it possible to use any new plain text material
at any time with a personalized presentation.
This real-time presentation generation also makes it possible to
adapt the presentation to each pupil with different
reading difficulties. DAISY has no function for
adapting to each pupil.
To adapt to the variety of pupils’ ages and
disability grades, the presentation system has a
function that replaces kanji characters not yet studied
with hiragana characters. The phonetic hiragana
characters are the first characters studied, so there is little
difficulty in reading hiragana.
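
A minimal sketch of this replacement, under stated assumptions: the text is already split into words with katakana readings (as a morphological analyser would typically provide), and `studied` is a hypothetical set of kanji that the pupil has learned. A word is replaced as a whole when it contains any kanji outside that set; this word-level choice and all names below are assumptions of the sketch, not the system's confirmed behaviour.

```python
from typing import Iterable, Set, Tuple


def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def to_hiragana(katakana: str) -> str:
    """Convert katakana to hiragana by shifting code points; other
    characters (for example the long vowel mark) are left unchanged."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in katakana
    )


def replace_unstudied_kanji(
    tokens: Iterable[Tuple[str, str]], studied: Set[str]
) -> str:
    """tokens: (surface form, katakana reading) pairs, assumed given."""
    out = []
    for surface, reading in tokens:
        has_unknown = any(is_kanji(c) and c not in studied for c in surface)
        out.append(to_hiragana(reading) if has_unknown else surface)
    return "".join(out)


tokens = [("昔", "ムカシ"), ("の", "ノ"), ("話", "ハナシ")]
print(replace_unstudied_kanji(tokens, studied={"話"}))
```

Replacing whole words keeps each word in a single script, at the cost of also converting studied kanji that share a word with an unstudied one.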
To ease the difficulty of kanji characters,
the new system has another function that adds
hiragana characters representing the pronunciation
of the kanji characters at the side of the kanji
characters. This presentation helps users to recognize
the relation between the kanji characters and their
pronunciation.
The operations on the presentation system carry
information about the user. The proposed system
logs every operation with its time. This log represents
the fluency of the reader.
The new system has a function that analyses the
user’s read aloud voice. From the voice,
the new system estimates the
pronunciation. From the estimated pronunciation, the
new system estimates the reading activity of the user.
With the estimated reading activities, the new system
can change the presentation and guide
the user toward proper usage of the system. With the
recorded voice, the teacher may check the
pronunciation afterward.
The new system has the features listed in Table 2.
The first and second rows are newly added features.
They decrease the work of a teacher in using the
Japanese text presentation system. For wide use,
the system must
not need large-scale contributions from teachers. The
network problem is important in Japanese schools:
Internet access is strongly limited.
As a result, cloud-based implementations cannot
work. The proposed system must work without
Internet access.
3.4 System Implementation
3.4.1 Language and Library
We implement the new Japanese text presentation
system in Python. The new system uses Julius and
MeCab. Julius is a Japanese speech recognition
system (Julius). MeCab is a morphological analyser
for Japanese sentences (Mecab). Python
interfaces exist for Julius and MeCab, and our Python-based
system integrates them. For Japanese text
presentation, the system uses Pyglet. Pyglet provides
an object-oriented programming interface for
developing games and other visually-rich
applications (Pyglet). With Pyglet functions, the new
system can display any combination of display
formats.
3.4.2 Multiprocessing
The new Japanese text presentation system has two
major processes. One process
presents the Japanese text. The other process
estimates the user’s activities. By separating
text presentation from activity estimation, the text
presentation runs independently of the time-consuming
speech recognition. This implementation keeps the
display of texts responsive. Figure 6 shows the basic
structure of the new Japanese text presentation
system. The dashed-line box is the range of the current
implementation. Guidance generation is left for
future work. Figure 7 shows the outline of the new Japanese
text presentation system.
3.4.3 Phoneme Recognition
The Japanese speech recognition system Julius can
recognize speech well with proper preparation.
However, in simultaneous use without proper
preparation, Julius cannot show its good
performance, and there are many recognition
errors. The new system therefore estimates the user’s
activity from erroneous speech recognition results.
The new system treats similar sounds as equivalent.
It recognizes the part of the text that the
user reads aloud. The correctness of the reading
is not evaluated. By recognizing the part being
read aloud, the system recognizes whether the user
uses the system properly or not.
The Japanese speech recognizer Julius recognizes
chunks of voice. There are many errors in the
recognized results. The new system uses only the
phonemes.
The new system evaluates the length of the recognized
phoneme sequence. The number of phonemes is robust in
noisy environments. Using the number of recognized
phonemes, the new system estimates the
correspondence between the phonemes of a high-lighted
part of the text and the phonemes recognized
from the voice. The
new system evaluates the difference between the
phonemes of the high-lighted part of the text and the
recognized phonemes using the Levenshtein distance
(Levenshtein 1966). With the Levenshtein distance,
the new system estimates the correctness of the
read aloud voice for the high-lighted part in the
text. In the implementation, an insertion and a
deletion each cost 2. A
normal substitution costs 4.
A substitution between nearly identical phonemes
costs 2. For instance, ‘shi’ and ‘hi’ are
nearly the same in Japanese. With a threshold, the new
system decides whether the read aloud voice is a proper
pronunciation of the high-lighted part of a text or not.
Figure 8 shows the precise flow of phoneme analysis.
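
The weighted edit distance described above can be sketched as follows. The costs (2 for insertion and deletion, 4 for a normal substitution, 2 for a substitution between nearly identical sounds) come from the text. Everything else is an assumption for illustration: the unit of comparison (the example uses syllable-like tokens such as ‘shi’), the contents of `NEAR_PAIRS` beyond the shi/hi pair, and the length-proportional threshold in `is_proper_reading`, whose value the paper does not give.

```python
from typing import Sequence

INS_DEL_COST = 2   # insertion or deletion (from the text)
SUB_COST = 4       # normal substitution (from the text)
NEAR_SUB_COST = 2  # substitution between nearly identical sounds (from the text)

# Hypothetical table of nearly identical sounds; only shi/hi is named in the text.
NEAR_PAIRS = {frozenset(("shi", "hi"))}


def substitution_cost(a: str, b: str) -> int:
    if a == b:
        return 0
    if frozenset((a, b)) in NEAR_PAIRS:
        return NEAR_SUB_COST
    return SUB_COST


def weighted_levenshtein(text: Sequence[str], voice: Sequence[str]) -> int:
    """Weighted edit distance between the expected and recognized sequences."""
    prev = [j * INS_DEL_COST for j in range(len(voice) + 1)]
    for i, t in enumerate(text, start=1):
        curr = [i * INS_DEL_COST]
        for j, v in enumerate(voice, start=1):
            curr.append(min(
                prev[j] + INS_DEL_COST,               # t missing from the voice
                curr[j - 1] + INS_DEL_COST,           # extra v in the voice
                prev[j - 1] + substitution_cost(t, v),
            ))
        prev = curr
    return prev[-1]


def is_proper_reading(text: Sequence[str], voice: Sequence[str],
                      max_cost_per_unit: float = 1.0) -> bool:
    """Hypothetical decision rule: accept when the distance stays below a
    threshold proportional to the length of the expected sequence."""
    return weighted_levenshtein(text, voice) <= max_cost_per_unit * len(text)


expected = ["mu", "ka", "shi"]
heard = ["mu", "ka", "hi"]
print(weighted_levenshtein(expected, heard), is_proper_reading(expected, heard))
```

Because a normal substitution costs the same as one deletion plus one insertion, only the reduced cost for similar sounds makes a substitution cheaper than dropping and re-adding a unit, which is what lets confusable sounds pass the threshold.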
4 EXPERIMENTS
Figure 8: Flowchart of voice analysis.
Figure 9: Presentation example.
Figure 10: Presentation example using large fonts.
Figure 11: Presentation example using other fonts.
Figure 12: Presentation example without writing hiragana
characters at the side of kanji characters.
We aim to help the user with the Japanese text
presentation system itself. For this purpose, we implement
reading activity estimation from the user’s
read aloud voice. The new system records the voice.
The new Japanese text presentation system includes
the original Japanese text presentation system. It
also includes the function to estimate the
reading activity from the user’s read aloud voice and
the function to assess reading difficulty.
4.1 Text Presentation Varieties
The new Japanese text presentation system enables
a much wider variety of text presentations. The new
function displays hiragana characters at the side of
kanji characters. In Japan, writing hiragana characters at the side of
kanji characters is a popular method for easing the
difficulty of reading kanji characters.
There are many ways to place hiragana characters at the side
of kanji characters. Our
implementation places the hiragana characters at the
center of the kanji word. Figure 9 shows
an example of a presentation of Japanese text with
hiragana characters written at the side of kanji
characters. The current sentence is high-lighted, and
the current part of the sentence is high-lighted in a
format different from the other parts of the text. Figure 10 shows an
example using larger fonts. Figure 11 shows an
example using other types of fonts. Figure 12 is an
example without hiragana characters at the
side of kanji characters.
4.2 Read Aloud Voice Recognition
Eight students in our laboratory took part in the
experiments. They include three students whose mother
tongue is not Japanese. It is easy to measure the
strength of the voice of a user in an experimental
environment. With the voice of a single person, it is
difficult to evaluate the precise pronunciation.
However, it is easy to evaluate the strength of the
voice.
In a normal class room, there are many sounds
other than the voice of the user. In such an environment,
it is not easy to separate the user’s voice from other voices
and noises. We use the recorded voice so that the teachers
can check the pronunciation.
In the experiment, the new Japanese text
presentation system judges about 80% of the voices
to be correct pronunciations of the high-lighted parts.
This result depends on the threshold, so we can tune
it. Figure 13 shows part of the
recognition results. In the figure, ‘mukashimukashi’
is the phoneme sequence of the first part of the text in Figure
9. The phonemes of the text and the phonemes of the
voice are the same in Figure 13.
Figure 13: Phoneme analysis logs.
Table 4: Reading time of all subjects.
Subject             A     B     C     D     E     F     G     H
Correctness         0.80  0.60  0.67  0.87  0.87  0.67  0.47  0.47
Reading time (s)    25.1  24.0  26.2  30.2  23.5  28.1  38.1  45.3
Utterance time (s)  21.0  14.5  19.5  20.5  14.0  17.5  15.5  18.0
Silence time (s)    4.1   9.5   6.7   9.7   9.5   10.6  22.6  27.3
RecognitionofReadingActivitieswithReadAloudVoiceonJapaneseTextPresentationSystem
33
Table 3: Error examples in voice recognition.
#  Text                     Voice
1  oji:saNto                ojisaNto
2  oba-saNgasuNdeimashita   obasaNgasuru
3  takeotorini              kakyo:toriniru
4  takeotorini              shibakarini
5  hitoyasumishiteiruto     ichiyasumishiteiruto
6  hitoyasumishiteiruto     hitoriyasumi/shiqteruto
Table 3 shows examples of voice recognition results that contain errors. In the table, the column ‘Text’ gives the correct phonemes of the text. The column ‘Voice’ gives the phonemes recognized from the voice reading the text.
In the first row, a long vowel, represented as ‘:’, is not recognized. In the second row, a long vowel is also not recognized, and a gap between words is not recognized properly. In the fifth row, two phonemes are not recognized properly. In the sixth row, the phoneme ‘ri’ is inserted in the voice recognition result.
Errors like the one in the first row are recovered with the help of the Levenshtein distance. Errors like the one in the second row are difficult to recover at this stage.
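The following sketch illustrates how a Levenshtein distance can absorb a small error such as the missing long vowel in the first row of Table 3. It is not the system's code. It assumes a character-level distance over the romanized phoneme strings, normalized by the longer string, and a threshold of 0.2. The threshold value and the function name `is_read_correctly` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit costs for insertion,
    deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def is_read_correctly(text_phonemes: str, voice_phonemes: str,
                      threshold: float = 0.2) -> bool:
    """Accept the utterance when the normalized distance is at most
    `threshold`. The value 0.2 is an assumed example, not from the paper."""
    longest = max(len(text_phonemes), len(voice_phonemes))
    if longest == 0:
        return True
    distance = levenshtein(text_phonemes, voice_phonemes)
    return distance / longest <= threshold


if __name__ == "__main__":
    # Row 1 of Table 3: only the long-vowel mark ':' is missing.
    print(levenshtein("oji:saNto", "ojisaNto"))
    print(is_read_correctly("oji:saNto", "ojisaNto"))
    # Row 4 of Table 3: a different word was recognized.
    print(is_read_correctly("takeotorini", "shibakarini"))
```

Raising the threshold accepts more utterances as correct and lowering it accepts fewer, which is one way the dependence on the threshold mentioned in this section could be tuned.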
4.3 Estimation of User’s Activity
Table 4 shows the results of analysing the users’ activities with phoneme analysis. All subjects are male and between 22 and 27 years old. Subjects D, G, and H are subjects whose mother tongue is not Japanese. They can read, write, and speak Japanese well. The other subjects are Japanese. Subjects D, G, and H need more reading time than the Japanese subjects.
Table 4 covers the experiments with all 8 subjects. The correctness in the table is the correct recognition rate of the decision on whether the text is read aloud correctly. In Table 4, subjects G and H need far more silence time than the other subjects, while the silence time of subject D (9.7 s) is close to that of subjects B, E, and F. With the utterance analysis, we have much more precise information for understanding the user’s reading activity.
Figure 14: Reading time and utterance time.
Figure 14 shows the relation between the reading time and the utterance time. In the graph, the unit of the vertical axis is seconds. Reading activity varies among the subjects. In the graph, a longer silence time accompanies a longer reading time. Subject ‘A’ needs little silence time. Subject ‘H’ needs a long silence time. With a long silence time, the reading speed decreases.
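The silence times listed in Table 4 equal the reading time minus the utterance time for every subject. The sketch below reproduces that column from the other two rows. The silence ratio it also computes is an assumed illustrative measure that is not reported in the paper, and all names are hypothetical.

```python
# Values copied from Table 4 (seconds).
SUBJECTS = "ABCDEFGH"
READING_TIME = [25.1, 24.0, 26.2, 30.2, 23.5, 28.1, 38.1, 45.3]
UTTERANCE_TIME = [21.0, 14.5, 19.5, 20.5, 14.0, 17.5, 15.5, 18.0]


def silence_times(reading, utterance):
    """Silence time = reading time - utterance time, rounded to 0.1 s."""
    return [round(r - u, 1) for r, u in zip(reading, utterance)]


def silence_ratios(reading, utterance):
    """Fraction of the reading time spent in silence (assumed measure)."""
    return [(r - u) / r for r, u in zip(reading, utterance)]


if __name__ == "__main__":
    rows = zip(SUBJECTS,
               silence_times(READING_TIME, UTTERANCE_TIME),
               silence_ratios(READING_TIME, UTTERANCE_TIME))
    for subject, silence, ratio in rows:
        print(f"{subject}: silence {silence:4.1f} s, "
              f"{ratio:.0%} of the reading time")
```

A ratio of this kind would make subjects with different total reading times comparable, although the sequence of silences and utterances is lost in any such summary.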
5 CONCLUSIONS
The proposed new Japanese text presentation system reports precise estimates of the user’s reading activities to the teacher. The report includes not only the key operations but also the analysis of the read-aloud voice. The read-aloud voice is a direct description of the user’s reading activity. We implemented the analysis of the user’s voice. Our experiments confirm that the function works well.
With phoneme analysis of read-aloud voices, the new Japanese text presentation system can be used by all pupils simultaneously in a classroom. In a normal classroom, one teacher teaches many pupils, including ones with reading difficulties.
The proposed system reduces the teacher's work in using the Japanese text presentation system in a class. All pupils in a class can use the Japanese text presentation system properly with the help of the teachers and the system itself. Teachers do not need to check every record of the users’ reading activities. The system detects the points where the reading difficulty lies. This makes the Japanese text presentation system easier to use in normal classrooms.
The much more precise record of the user’s activities gives a precise understanding of the reading activity with less work for the teacher. We will add a user guidance function in consultation with teachers.
We must also examine the sequence of silence times and utterance times. This will lead to a more precise understanding of reading activity.
ACKNOWLEDGEMENTS
Mr. Shu Aoki supported the implementation and the experiments. This work is supported by JSPS25330405.
[Figure 14 chart: reading time, utterance time, and silence time in seconds (vertical axis 0 to 50) for subjects A to H.]
CSEDU2015-7thInternationalConferenceonComputerSupportedEducation
34
REFERENCES
Aoki, K., Murayama, S., Harada, K., 2014. Automatic Objective Assessments of Japanese Reading Difficulty with the Operation Records on Japanese Text Presentation System. CSEDU2014, vol. 2, pp. 139-146, Barcelona, Spain.
DeMeglio, M., Hakkinen, M., Kawamura, H., 2002. Accessible Interface Design: Adaptive Multimedia Information System (AMIS). Computers Helping People With Special Needs, Lecture Notes in Computer Science. Springer.
Law. http://www.bunka.go.jp/chosakuken/pdf/tokuteitosyo_fukyu_gaiyo.pdf.
Murayama, S., Aoki, K., 2010. Real Time Image Presentation System for Persons with a Learning-Disabled Tendency. IEICE-ET, vol. 109, no. 387, ET2009-96, pp. 25-29, 2010. IEICE.
DAISY. http://www.daisy.org/
Murayama, S., Aoki, K., Morioka, N., 2009. Image processing to make teaching aids for learning disability persons. IEICE-108, IEICE-WIT-488, IEICE.
Aoki, K., Murayama, S., 2012. Japanese Text Presentation System For Persons With Reading Difficulty -Design and Implementation-. CSEDU2012, vol. 1, pp. 123-128, Porto, Portugal.
Murayama, S., Aoki, K., 2012. Japanese Text Presentation System for pupils with Reading Difficulties, ICCHP 2012, Lecture Notes in Computer Science, vol. 7382, Computers Helping People with Special Needs, pp. 507-514, Linz, Austria.
OpenCV. http://opencv.org/. Retrieved 2014.
Mecab. https://code.google.com/p/mecab/. Retrieved 2014.
Julius. http://julius.sourceforge.jp/. Retrieved 2014.
Pyglet. http://www.pyglet.org/. Retrieved 2014.
Levenshtein, A., 1966. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, vol. 10, no. 8, pp. 707-710.
RecognitionofReadingActivitieswithReadAloudVoiceonJapaneseTextPresentationSystem
35