Figure 7: Example images of the character “茶” (Unicode: U+8336) in the test data.
Figure 8: Example images of the character “茶” (Unicode: U+8336) in the training data.
U+8336) was 4.2% (= 2 / 48). The corresponding training data, shown in Fig. 8, include many types of literary works and characters styled differently by their authors. Furthermore, because kuzushiji characters were handwritten by different authors, character shapes differed greatly between training and testing. Thus, kuzushiji character classification was notably difficult.
4.4 Elimination of Difficult-to-Recognize Characters
In the test data derived from “Oraga Haru” and “Ugetsu Monogatari,” there were characters that were not included in the training data (26 literary works). Furthermore, character classes with fewer than 20 images in the training data were not trained. In the testing process, it is difficult to know whether there are unknown characters that have not been trained; thus, a function that automatically eliminates such characters is necessary. In this experiment, 3,003 of the 56,029 images in the test data belonged to character classes that were not trained.
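The class-selection rule described above can be sketched as follows (a minimal illustration; the 20-image cutoff follows the text, while the data representation is an assumption):

```python
from collections import Counter

def split_trainable_classes(labels, min_images=20):
    """Partition character classes by training-image count.

    Classes with fewer than `min_images` training examples are
    excluded from training, mirroring the 20-image cutoff in the
    text; `labels` is one class label per training image.
    """
    counts = Counter(labels)
    trained = {c for c, n in counts.items() if n >= min_images}
    untrained = set(counts) - trained
    return trained, untrained

# Toy example: class "a" has 25 images, class "b" only 3,
# so "b" is left untrained.
trained, untrained = split_trainable_classes(["a"] * 25 + ["b"] * 3)
```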
The character recognition rate using the experimental training model was 73.10%, which would reduce work efficiency in actual transcription work. Therefore, it is important to eliminate false recognitions as much as possible and leave only correct recognition results.
Based on the error analysis described in the previous section, we found that the maximum output probability values from the softmax function tended to be low when character images that were not correctly classified were input to the CNN. Thus, we examined whether erroneously classified characters with low maximum output probability values (hereinafter referred to as confidence values) could be eliminated by keeping only characters with high confidence values. Specifically, the confidence value was used as a threshold, and characters below the threshold were reserved as unknown characters. As shown in (Yamamoto and Osawa, 2016), efficient transcription can be achieved by marking low-confidence characters as “〓 (geta)” and asking experts in the post-process to make a final judgment.

Figure 9: Relationship between the number of images that can be classified and the classification accuracy according to the confidence values. Horizontal axis: the threshold of confidence values. Vertical axis: the number of images that can be classified (red) and the classification rates (blue).
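The rejection rule described in the text can be sketched as follows (a minimal illustration; `probs` is assumed to be the softmax output of the CNN for one character image, and 〓 is the geta mark used for rejected characters):

```python
GETA = "\u3013"  # 〓, the placeholder for characters left to human experts

def classify_with_rejection(probs, class_names, threshold=0.5):
    """Return the predicted character and its confidence value, or
    the geta mark if the maximum softmax probability (the confidence
    value) falls below the threshold."""
    conf = max(probs)
    if conf < threshold:
        return GETA, conf  # reserved as an unknown character
    return class_names[probs.index(conf)], conf

# A confident prediction is kept; a low-confidence one is rejected.
classes = ["茶", "花", "月"]
kept = classify_with_rejection([0.85, 0.10, 0.05], classes)
rejected = classify_with_rejection([0.40, 0.35, 0.25], classes)
```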
Figure 9 illustrates the relationship between the classification rates and the number of images that can be classified as the threshold of confidence values is changed. For example, if the tested images were limited to those with a confidence value of 50% or more, the number of characters that could be classified was reduced to 40,209. Among them, the number of correctly classified images was 36,544, and the recognition rate improved to 90.89% (= 36,544 / 40,209). When we examined the 3,003 images of character classes not used for training, we found that 2,917 images (97.14%) could be eliminated, as expected.
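The figures reported in this paragraph can be reproduced directly from the stated counts (a quick arithmetic check of the 50% threshold case):

```python
# Counts reported in the text for a 50% confidence threshold.
classified = 40_209      # test images at or above the threshold
correct = 36_544         # correctly classified among them
eliminated = 2_917       # untrained-class images rejected
untrained_total = 3_003  # untrained-class images in the test set

accuracy = correct / classified                   # recognition rate
elimination_rate = eliminated / untrained_total   # rejection of unknowns
print(f"{accuracy:.2%}, {elimination_rate:.2%}")  # prints 90.89%, 97.14%
```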
Figure 10 provides an example of the recognition results for one page from “Ugetsu Monogatari.” When the recognition results of the original book are displayed in typographical form, transcribers in the post-process must be told which characters have low confidence values. In this experiment, we displayed characters with low confidence values in red rectangles, as shown in the center and on the right of Fig. 10. At a confidence threshold of 50%, too many characters remained to be judged by humans, so it was necessary to reduce the number of eliminated characters in order to improve the efficiency of transcription. However, if the threshold was instead set below 10%, the number of characters to be eliminated would be reduced; it would then be easier to predict the remaining low-confidence characters from the surrounding characters with high confidence values.
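The marking of low-confidence characters for transcribers could be rendered, for example, as follows (a hypothetical sketch, not the authors' implementation; here the typographical output is emitted as HTML and the red-rectangle styling is an assumption):

```python
def render_transcription(chars, confidences, threshold=0.5):
    """Render recognized characters in typographical form, wrapping
    each low-confidence character in a red-bordered span so that
    transcribers in the post-process can spot it at a glance."""
    parts = []
    for ch, conf in zip(chars, confidences):
        if conf < threshold:
            parts.append(f'<span style="border:1px solid red">{ch}</span>')
        else:
            parts.append(ch)
    return "".join(parts)

# Toy example: the second character is below the threshold and marked.
html = render_transcription(["茶", "花"], [0.9, 0.3])
```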
Japanese Cursive Character Recognition for Efficient Transcription