Computer Vision and Deep Learning Tools

for the Automatic Processing of Wasan Documents

Yago Diez

, Toya Suzuki

, Marius Vila

and Katsushi Waki

Faculty of Science, Yamagata University, Japan

Department of Computer Science and Applied Mathematics, University of Girona, Spain

Keywords:

Wasan, Document Processing, Kanji Detection, Kanji Recognition, Deep Learning.

Abstract:

”Wasan” is a type of mathematical texts unique from Japan developed during the Edo period (1603-1867).

These ancient documents present a wealth of knowledge and are of great cultural and historical importance.

In this paper we present a fully automatic algorithm to locate a landmark element within Wasan documents.

Speciﬁcally, we use classical computer vision techniques as well as deep learning tools in order to locate

one particular kanji character called the ”ima” kanji. Even though the problem is challenging due to the low

image quality of manually scanned ancient documents and the complexity of handwritten kanji detection and

recognition, our pipeline including noise reduction, orientation correction, candidate kanji region detection and

kanji classiﬁcation achieves a 93% success rate. Experiments run on a dataset with 373 images are presented.

1 INTRODUCTION

Figure 1: Wasan document example. The ”ima” kanji char-

acter marks the starting point in the description of a new

problem and is typically placed beside the graphical de-

scription of the geometric problem.

”Wasan” is a type of mathematical texts unique to

Japan developed during the Edo period (1603-1867).

Japanese citizens have used Wasan documents to

learn mathematics or as a type or mental training

hobby ever since(Matsuoka, 1996; Martzloff, 1990).

This paper belongs to a line of research aiming

to construct a document database of Wasan. This

database will enable users to search and retrieve prob-

lems based on their geometric properties. This ef-

fort aims at dealing with the problem of the aging

of researchers within the Wasan research area as it

has been estimated that the whole area is in dan-

ger of disappearing within 15 years. By building

an online-available database, searchable with state-

of-the-art tools and built using modern computer vi-

sion and deep learning techniques, we aim at collect-

ing the knowledge of traditional Wasan scholars and

making it available for young researchers and educa-

tors. Speciﬁcally, we believe that the possibility of

automatically describing Wasan problems in terms of

their geometrical components will represent a power-

ful new tool of interpretation and description of these

documents. It is our hope that this future electronic

Wasan document data base will help ﬁnd new insight

on the historical and cultural importance of traditional

Japanese mathematics.

In the current paper, we present the ﬁrst of a set

of tools that use computer vision and deep learning

techniques for the automatic analysis of Wasan docu-

ments. Wasan problems are mostly geometric in na-

ture. Their description is composed of a diagram ac-

companied by a textual description using handwrit-

ten Japanese characters known as kanji. Figure 1

presents an example of a Wasan document. The start

of the description of a problem is marked by one par-

ticular kanji called ”ima” and representing the ini-

tial part of the sentence: ”Now, as shown,...”. This

”ima” (now) kanji is typically placed beside or un-

derneath the graphical description of the geometri-

cal problem and, thus, represents, a necessary start-

ing point for any algorithm aiming at automatically

processing these documents.

Therefore, in this paper our goal is to automat-

ically determine the position of the ”ima” kanji in

Diez, Y., Suzuki, T., Vila, M. and Waki, K.

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents.

DOI: 10.5220/0007555607570765

In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019), pages 757-765

ISBN: 978-989-758-351-3

757

Wasan documents. In order to do this, we use several

classical computer vision techniques (Agrawal and

Doermann, 2013; Marr and Hildreth, 1980; Fernan-

des and Oliveira, 2008) as well as some of the most

recent developments in the Deep Learning branch

of Artiﬁcial Intelligence (Krizhevsky et al., 2012).

Speciﬁcally we use the following steps:

First we preprocess the documents to alleviate

some of the problems that they present in terms of im-

age quality: On the one hand, we use a Hough line de-

tector to correct possible tilts in orientation introduced

during the process of digitizing the original docu-

ments. On the other hand, we use a noise removal

step to clear small spurious regions of the image and

improve the performance of the subsequent steps of

the pipeline. In the second step of the pipeline the im-

ages resulting from the preprocessing steps are then

used in blob detector algorithms in order to determine

regions that are candidates to containing kanji char-

acters. Each candidate region is transformed into an

image and passed on to the ﬁnal step. Finally, we run

a Convolutional Neural Network (CNN)-based clas-

siﬁer in order to classify the single kanji contained in

each of the images resulting from the previous step.

The position of the images classiﬁed as containing the

”ima” kanji constitute the ﬁnal result of our algorithm.

Notice how the techniques used are general-

purpose, so their adaptation to Wasan documents de-

veloped in the current work for the particular prob-

lem of locating the ”ima” kanji will serve as basis for

future applications. The rest of the paper is organ-

ised as follows: Section 2 presents details and ref-

erences of the algorithms used throughout our image-

processing pipeline. Section 3 discusses experimental

results dealing with the kanji detection and classiﬁca-

tion steps. Finally, conclusions and future work are

presented in section 4.

2 MATERIALS AND METHODS

Our Database was composed of images that had

been previously manually scanned from original Edo

period Wasan documents. Our documents were

downloaded from the publicly accessible database in

(YUWasanDB, 2018). This database includes ”The

Sakuma collection ” consisting of the works of the fa-

mous Wasan mathematician ”Aida Yasuaki” and one

of his disciples know as ”Sakuma”. We had access to

images corresponding to 431 books with each book

containing between 15 and 40 pages with problem de-

scriptions. For the purposes of this paper, we manu-

ally chose 373 images that contained the start of the

description of mathematical problems and, thus, the

”ima” character. Both the scanning procedure and the

conservation state of some of the document resulted

in some issues that make the automatic processing of

the images challenging. In the following sections we

will deal with two of these issues: Noise and image

tilt.

2.1 Noise Correction

Noise can be deﬁned as any unwanted information

contained in a digital image. The digital image acqui-

sition process is the primary source of noise. Noise

can produce undesirable effects such as blurred ob-

jects, artifacts, unrealistic edges and unseen lines. It

is almost impossible to totally remove noise without

distorting an image, but it is essential to reduce it in

order to perform image analysis.

Noise reduction in ancient documents is a partic-

ularly challenging issue (Arnia et al., 2015). One of

the main problems that it entails is that it is often not

possible to affect the binarization process of the docu-

ments. This makes it necessary to consider only oper-

ations on what typically are very low quality (in terms

of resolution and information-to-noise ration) binary

images (Agrawal and Doermann, 2013).

In order to reduce the noise present on the images,

we used morphological transformations. Morpholog-

ical transformations are a set of shape-based opera-

tions that process binary images. Morphological op-

erations apply structuring elements or ”kernels” to an

input image and generate an output image. The use of

these operations for noise removal in documents rep-

resents a very active research area (Barna et al., 2018;

Goyal et al., 2018; Tekleyohannes et al., 2018).

The most basic morphological operations are ero-

sion and dilation. They have a wide range of function-

alities, such as removing noise, isolation of individ-

ual elements, joining disparate elements and ﬁnding

bumps or holes in an image: The Dilation operation

convolves an image A with a kernel (B). As the kernel

B is scanned over the image, at each position we com-

pute the maximal pixel value overlapped by B and re-

place one image pixel with it. This maximizing opera-

tion causes white regions within an image to grow. In

a typical text document image, the white background

dilates around the black regions of the text. On the

other hand, Erosion computes local minima, causing

the black areas to get bigger.

In this paper we were constrained by the large

quantities of noise but also by the fact that kanji char-

acters that we want to segment often present small

parts that can be easily mistaken with noise. Con-

sequently, we designed the following procedure using

morphological operations:

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods

758

First, an erosion operation is applied to the whole

input image obtaining a new image with well-deﬁned

kanji. The goal was to obtain an image where all the

elements belonging to each kanji are interconnected.

Also we obtained much better deﬁned lines. The

erode operation is performed 8 times with a typical

square kernel of 3x3 pixels. Subsequently, an algo-

rithm to remove lines is used. This algorithm con-

sists in convolving the image using a kernel with liner

shape. When all kernel pixels are black, they are con-

verted to white. Then the image obtained is used as

a mask, applying a binary OR operation, to obtain a

new, cleaner image. In the next step, blobs which have

a small enough size, so that they are not confused with

any element of a kanji, are removed. Then, the ﬁrst

step is applied again to this new image in order to in-

terconnect the elements of each kanji. Similarly to the

fourth step, the blobs which have a size smaller than

an eroded kanji are removed. Finally, and similar to

the third step, the image obtained in the sixth step is

used as a mask to obtain the ﬁnal resulting image.

2.2 Orientation Correction

The second issue that our data presented was that dur-

ing the process of manually scanning the original doc-

uments, small tilt angles were introduced. This repre-

sented a problem as the structure of Wasan documents

relies on some orientation-dependant characteristics.

Speciﬁcally, the position of the diagram explaining

the problem respect to the ”ima” character that we

want to detect is either placed immediately over or

just to the right of the character. Consequently, prop-

erly identifying the vertical and horizontal directions

in the image is an important step towards further au-

tomatic processing of Wasan documents.

Wasan documents often present ”pattern lines”.

These are vertical and/or horizontal lines that helped

the writers maintain a coherent structure and orien-

tation over each of the pages. In our case, these

lines clearly show the tilt angle introduced during the

scanning process as, instead of following the vertical

direction on the scanned image their slope deviates

slightly from it. As the noise removal process also

removed these lines, we started this orientation cor-

rection process from the original images:

Taking into account the peculiarities of Wasan

documents and the quality limitations in our images,

we focused on ﬁnding the lines that used to indicate

the vertical direction. Thus, our orientation correc-

tion procedure, can be divided into three main parts 1)

Line detection 2) Line Filtering to remove lines that

have slopes too far from the image vertical 3) Among

the resulting lines, detection of the main lines in the

image by grouping those lines that were close in the

image.

Figure 2 presents an example of the process. In

the ﬁrst step we detected lines in the image using a

Hough Line transform algorithm. The Hough trans-

form is a well known feature extraction technique

dating as far back as 1962 that is frequently used in

many applications in Computer Vision and related ar-

eas . For this work we explored the two versions of

the Hough transform implemented in the OPENCV

library (Bradski, 2000): Probabilistic and determinis-

tic. The visible ink density of the original documents

resulted in discontinuous lines at pixel level. This

caused the deterministic version of the Hough trans-

form to yield poor results. The probabilistic version

(Matas et al., 1998) , though, was able to ﬁnd many

lines in our images. These lines, however, did often

not contain information that was useful in terms of

orientation as they were often made up of kanji parts

and were very short. Consequently we ﬁltered these

results to consider only lines over a certain length

threshold. We also ﬁltered out the lines that were too

far away from the image vertical as our goal was to

ﬁnd the original vertical lines and the tilt angles in

the images were found not to be very large. Once a

small number of lines was left, we observed that many

of the smaller lines were actually part of longer lines

that had some part in the middle that was either faint

or had been totally erased. We grouped those lines by

proximity of their endpoints. All grouped lines were

then converted to a single ”summary line” that aimed

at following the path of the original vertical in the im-

ages.

2.3 Individual Kanji Character

Segmentation Kanji

In this section we describe the part of our pipeline that

segments individual kanji characters.

kanji Characters are made up of a number of

strokes ranging from one to around 20 strokes for the

most common kanji. From the point of view of image

processing, they are small regions made up of even

smaller and disconnected regions. Furthermore hand-

written kanji (such as the ones in Wasan documents)

often present irregular or incomplete shapes.

In order to segment regions that are candidates

to containing kanji (and, speciﬁcally, the ”ima” kanji

that we are aiming at segmenting) we will use a com-

puter vision technique known as blob detection. We

tested four different blob detectors, Figure 3 presents

an example of their results.

The ﬁrst detector used was implemented using the

OPENCV library (Bradski, 2000) while the other 3

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents

759

a: Original image b: Lines Detected With Probabilistic Hough Line Transform

c: Filtered Lines d: Corrected image

Figure 2: Orientation Correction.

where implemented using the SciKit library (van der

Walt et al., 2014).

Simple Blob Detector

This algorithm aims at exploiting differences in

pixel intensity by thresholding the source image at

different levels of gray and produce a a number of

binary images. Then connected components are ex-

tracted from every binary image and their centers

of masses are considered. These centers are then

grouped among several binary with the aim of detect-

ing blobs by proximity of the coordinates of the cen-

ters. Finally, blob connected components are merged

into single blobs represented by a center and a radius.

Laplacian of Gausian LoG

The input image is convolved by a Gaussian kernel

with t representing a scale parameter. The Laplacian

of the space of the scales is considered:

∇

L = L

+ L

The goal is to highlight dark blobs (indicated by

large positive values) or bright blobs (large negative

values) of radius r =

√

2t (Lindeberg, 2015).

Difference of Gaussians DoG

The Laplacian operator of the scale space can be

approximated by ﬁnite differences:

∇

norm

L(x, y;t) ≈

∆t

(L(x, y;t + ∆t) −L(x, y;t))

This approach is known as the difference of Gaus-

sians (DoG) approach (Marr and Hildreth, 1980).

Determinant of the Hessian DoH

This operator is deﬁned by considering the Hes-

sian matrix of the scale-space representation L. The

determinant of this operator

detH

norm

L = t

−L

)

is then used to detect maxima of this operator.

The blobs in the image are deﬁned from this scale-

space maxima. Blobs deﬁned using this technique

are reported to present better properties under non-

Euclidean afﬁne transformations than the LoG and

Dog operators ((Lindeberg, 2015)).

We also took advantage of the orientation correc-

tion and used the relative position of the blobs de-

tected to merge small blobs that were very close and

could be ﬁt into a bounding box of a maximum given

size along the y axis. This step slightly improved the

results of the blob detector based on pixel intensities

and had a very small effect on the results of the DoG,

LoG and DoH blob detectors.

2.4 Classiﬁcation of Kanji Images

In the ﬁnal step of our algorithm we focused on the

classiﬁcation of the obtained kanji images to automat-

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods

760

a: Opencv Simple Blob Detector b: Lagrangian Of Gaussians

c: Difference of Gaussians d: Determinant of Hessian

Figure 3: Kanji detector Methods.

ically locate the occurrences of the ”ima” kanji. Im-

age classiﬁcation has been for some time one of the

areas where artiﬁcial intelligence approaches (more

recently in their Deep Learning variants) have ob-

tained best results.

Speciﬁcally, classiﬁcation of kanji characters

(both Chinese and Japanese) is a well established re-

search area producing a large number of commercial

applications as well as research papers. Among the

ﬁrst, we ﬁnd, on the one hand the OCR applications

aimed mainly at automatic text recognition (gener-

ally working with printed characters, see, for exam-

ple, (OCRconvert, 2018) ) and on the other software

solutions aimed at students of the Japanese and Chi-

nese languages (using both printed and handwritten

characters, (see, for example (Yomiwa, 2018))).

Concerning the research papers, we will only

mention the methods that mostly resemble the ap-

proach that we have used in this paper. Basically,

we have followed closely the Deep Learning, CNN

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents

761

approach used in (Tsai, 2016; Simonyan and Zisser-

man, 2014) and also broadly followed in (Gre¸bowiec

and Protasiewicz, 2018) in terms of the architecture.

However, we have also considered data augmentation

following (Cires¸an et al., 2010) and used the publicly

available ETL Database (ETL, 2018) to obtain kanji

image to retrain our deep learning architecture.

Convolutional neural networks (CNN) consists of

layers that transform an input image into a value that

is then used to classify it in one of the existing classes.

In our case, each kanji candidate image is assigned to

one of the existing kanji classes. These layers contain

element-wise operations to introduce non-linearities

and hence increase the capacity of the network to rep-

resent concepts as complex as the shape of a hand-

written character. See (Tsai, 2016) for a descrip-

tion of the different types of layers and their func-

tion. Regarding the shape (architecture) of our net-

work, we followed the ideas described in (Tsai, 2016;

Simonyan and Zisserman, 2014). Convolutional lay-

ers are followed by activation and max-pooling layers.

This is repeated with convolutional layers of increas-

ing size. Finally, a fully-connected layer is used be-

fore computing the ﬁnal class scores with a softmax

function. Non-linearity is ensured by placing ReLU

layers both after every convolutional layer.

In order to train our network we used images of

handwritten Japanese characters obtained from the

ETL Character Database. Speciﬁcally we used the

images corresponding to 876 different kanji charac-

ters contained in ﬁle ETL-8. We also explored the

possibility of using Elastic distortions for data aug-

mentation as presented in (Cires¸an et al., 2010). How-

ever, we were not able to improve our classiﬁcation

results. The most likely explanation is that as our

training dataset already has a large enough number

of images per class (around 80).

3 RESULTS AND DISCUSSION

In this section we will analyze the performance of the

algorithms presented throughout the paper. All tests

were run on a 2.6GHz computer under a linux Ubuntu

environment with an NVIDIA GTX 1080 graphics

board. The computer vision techniques were imple-

mented using the SciKit (van der Walt et al., 2014)

and OpenCV (Bradski, 2000) libraries. The CNN pre-

sented where implemented using Keras and Tensor-

ﬂow.

Regarding the images used in the experiments, all

373 images containing the ”ima” character were used.

In the experiments that required manual checking, a

subset of images was chosen due to the long time

needed for manual checks. Speciﬁcally, this involved

25 images per method in for the counting of correctly

detected kanji and false positive kanji in section 3.2

and 100 images per method for the rest of checks in

section 3.2 (detection of ”ima”) and all the checks in

section 3.3. Some of these images contained more

than one ”ima” kanji.

3.1 Running Time Considerations

Our pipeline consisted of four steps, the two ini-

tial steps were common and took, on average: 3.72s

(noise removal) and 7.73s (orientation correction).

Figure 4 presents these two runtimes as well as the

subsequent kanji detection (for the four methods stud-

ied) and the classiﬁcation of the resulting candidate

kanji regions with the used CNN. All times are pre-

sented in seconds. Apart from these times, training

the CNN took approximately 190 minutes. This pro-

cess only needs to be performed once and, thus, we

have left it out of the runtime discussion.

Figure 4: Runtimes for the 4 algorithm variants tested.

The simple blob detector was fastest overall, al-

though, as we will see, it did not produce satisfac-

tory results. For the rest of methods, the kanji de-

tection step (using blob detectors) determined the ﬁ-

nal time, with methods producing more detection (in-

cluding false positive ones) taking longer to run both

the detection and classiﬁcation steps. Overall, even

the slowest pipeline took about 3 minutes to run in

a single image (LoG, detection 145.49s, classiﬁca-

tion 30.70s). This shows that the automatic process-

ing of large quantities of Wasan documents is feasible

although code optimization and multi-process execu-

tion will need to be considered.

3.2 Kanji Detection Experiment

In this ﬁrst experiment we aimed at determining

which of the blob detectors produced best results for

our problem. We automatically counted the num-

ber of blobs detected and then proceeded to manually

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods

762

Table 1: Blob detector methods for kanji Segmentation. Summary Table.

Method Name 1 Detected kanji % False Positive kanji % ”ima” kanji detection %

Opencv Simple Blob Detector 63.30 4.43 69.35

LoG 71.57 2.42 96.71

DoG 74.21 3.06 93.54

DoH 86.91 1.50 83.87

count the number of kanji that had not been detected.

We note these ”False negative” (FN), detections. We

also manually counted what of the detected blobs did

not contain any kanji and noted them ”False positive”

(FP), detections. Using these values we were able to

determine the actual number of kanji present in each

page. Then we were able to compute the percentage

of kanji detected in every page (over the total num-

ber of kanij). The average over the test set for this

correct detection percentage is presented in Table 1

second column. The ratio of FN detections over the

total number of kanji is presented in column three.

Finally, the average percentage of occurrence of the

”ima” kanji correctly detected is presented in column

four. A typical page had around 200 kanij characters

with only one occurrence of the ima kanji although

in a small number of cases the ”ima” kanji appeared

twice in the same page.

The table shows how the best performance in

terms of the detection of the ”ima” kanji was obtained

by the LoG blob detector. It’s failure rate of 3.29%

ranked higher above all method and illustrates the ad-

equacy of scale space blob detectors for our very spe-

ciﬁc problem. As this was the main goal of the current

paper, this is the method that we will use in our full-

pipeline experiment in section 3.3.

To complement the analysis, we observe that in

terms of the detection of all the kanji in a page, the

DoG and, specially, the DoH methods performed bet-

ter than the LoG blob detector. however, specially for

the DoH method, this higher capacity to detect kanji

regions (expressed both in higher percentage of de-

tected kanji and lower FP kanji detection) did not re-

sult in a better rate of location of the ”ima” kanji. This

was slightly surprising to us as in the data analyzed

the DoH method seemed to perform best over all but

had a tendency to ”miss” the one kanji that mattered in

our application. This is likely explained by the shape

of the ”ima” kanji, which is small and made up of four

or ﬁve (depending on the writer) small connected re-

gions. Visual inspection of the results seemed to indi-

cated that the DoH method was very good at detecting

kanji regions but had some trouble with smaller kanji,

specially if they were not totally surrounded by other

kanji as is often the case with ”ima”.

This differences in behaviour of the different blob

detector method indicate the possibility that combin-

ing their outputs might result in future work in im-

proved results for ”ima” recognition as well as the

development of an automatic tool for the automatic

extraction of the full text of the description of Wasan

problems.

3.3 Classiﬁcation of the ”ima” Kanji

After considering the LoG blob detector as the best

suited to locate the ”ima” kanji, in this section we an-

alyze the results of running its output through the last

step of our pipeline.

Speciﬁcally, the CNN described in section 2.4 was

asked to classify every candidate kanji region in one

of the 876 kanji classes that we had trained it with.

During the trainig of the CNN the system achieved

a 99.12% accuracy that is in line with previous kanji

recognition applications (Tsai, 2016; Cires¸an et al.,

2010; Gre¸bowiec and Protasiewicz, 2018).

Figure 5 shows the percentage of: 1) Success,

deﬁned as the percentage of correct ”ima” kanji de-

tection without wrongly detection any other kanji as

”ima” at the same time. 2) Success with FP, the per-

centage of detections that happened along with the

misclassiﬁed of some other kanji as ”ima”. 3) De-

tection Failure: The percentage of failed location of

the ”ima” kanji because it was not inside a kanji can-

didate region and 4) Classiﬁcation Failure: The per-

centage of cases in which a properly detected occur-

rence of the ”ima” kanji was classiﬁed as some other

kanij class.

Figure 5: Summary of results.

The results show how the high success of the de-

tection of the ”ima” kanji (93.02% total success rate

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents

763

including success with FP) makes our method suitable

to achieve our goals in this paper. There is, however,

still room for improvement. Speciﬁcally, the 13.95%

FP rate might result in extended human review time

over the results to ﬁlter the wrong classiﬁcation. The

most likely explanation for these classiﬁcation errors

is that the CNN that we used had not been trained

to include all possible kanji variants. We checked that

the kanji most often confused with ”ima” did not have

its own class properly deﬁned. Consequently, extend-

ing our kanji image library will likely result in a re-

duced false positive misclassiﬁcation rate. The failure

rate of 6.97% is acceptable in the context of our ap-

plication and can be attributed in almost equal parts to

failure in classiﬁcation and detection. Regarding the

failure due to the misclassiﬁcatio of a correctly de-

tected ”ima” kanji, a possible issue is that the ground

truth present in the ETL database (ETL, 2018) cor-

responds to modern writing of the kanji which differ

slightly from some of the ways in which they were

written in the Edo period.

4 CONCLUSIONS AND FUTURE

WORK

We have presented what is, to the best of our knowl-

edge the ﬁrst work using computer vision tools and

deep learning for the analysis of Wasan documents.

We have presented a tool to select an important struc-

tural element, the ”ima” character that indicates the

position of the start of the textual problem and lies

beside or underneath the graphical description of the

problem. The application presented in this paper rep-

resents, thus, the ﬁrst step in the construction of a

data base of Wasan documents that can be search-

able in terms of the geometric properties of each prob-

lem. The results presented in this work illustrate how

our algorithms are able to overcome the image quality

problems present in the images (mainly noise in data,

low resolution and orientation tilt) and locate the oc-

currences of the ”ima” character with a high success

rate of 93.02%. Among the issues that remain is the

presence of a 13.95% of false positive detections and

the 6.97% of cases in which the detection failed.

Both issues will be addressed in future work.

In order to reduce False positives, we will widen

our Kanji database so that it includes more than the

present 876 kanji classes. Regarding the cases in what

the algorithm failed, roughly half where due to detec-

tion failure and the other half to classiﬁcation failure.

The ﬁrst issue will be addressed by combining the re-

sults of the three blob detectors obtaining best results

in Experiment1 (LoG,DoG,DoH). while the second

can be addressed by tailoring the ground truth for the

”ima” kanji so it is closer to the writing peculiarities

of the EDO period. The experiments included in this

paper also indicate that modifying the code used to

produce the experiments presented to obtain the full

text of the description of each Wasan problems is fea-

sible with minor modiﬁcations of the current code.

REFERENCES

Agrawal, M. and Doermann, D. (2013). Clutter noise re-

moval in binary document images. IJDAR, 16(4):351–

369.

Arnia, F., Fardian, Muchallil, S., and Munadi, K. (2015).

Noise characterization in ancient document images

based on DCT coefﬁcient distribution. In ICDAR

2015, pages 971–975.

Barna, N. H., Erana, T. I., Ahmed, S., and Heickal, H.

(2018). Segmentation of heterogeneous documents

into homogeneous components using morphological

operations. In 17th IEEE/ACIS 2018, Singapore, Sin-

gapore, June 6-8, 2018, pages 513–518.

Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Jour-

nal of Software Tools.

Cires¸an, D. C., Meier, U., Gambardella, L. M., and Schmid-

huber, J. (2010). Deep, big, simple neural nets for

handwritten digit recognition. Neural Computation,

22(12):3207–3220. PMID: 20858131.

ETL (2018). Etl character database. http://etlcdb.db.

aist.go.jp/. Accessed: 2018-11-20.

Fernandes, L. A. and Oliveira, M. M. (2008). Real-time

line detection through an improved hough transform

voting scheme. Pattern Recognition, 41(1):299 – 314.

Goyal, B., Dogra, A., Agrawal, S., and Sohi, B. S. (2018).

Two-dimensional gray scale image denoising via mor-

phological operations in NSST domain & bitonic ﬁl-

tering. Future Generation Comp. Syst., 82:158–175.

Gre¸bowiec, M. and Protasiewicz, J. (2018). A neural frame-

work for online recognition of handwritten kanji char-

acters. In 2018 FedCSIS, pages 479–483.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-

agenet classiﬁcation with deep convolutional neural

networks. In Proceedings of the 25th International

Conference on Neural Information Processing Sys-

tems - Volume 1, NIPS’12, pages 1097–1105, USA.

Curran Associates Inc.

Lindeberg, T. (2015). Image matching using generalized

scale-space interest points. Journal of Mathematical

Imaging and Vision, 52(1):3–36.

Marr, D. and Hildreth, E. (1980). Theory of Edge Detec-

tion. Proceedings of the Royal Society of London Se-

ries B, 207:187–217.

Martzloff, J.-C. (1990). A survey of japanese publica-

tions on the history of japanese traditional mathemat-

ics (wasan) from the last 30 years. Historia Mathe-

matica, 17(4):366 – 373.

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods

764

Matas, J., Galambos, C., and Kittler, J. (1998). Progressive

probabilistic hough transform.

Matsuoka, M. (1996). Wasan, and its cultural background.

In Ogawa, T., Miura, K., Masunari, T., and Nagy, D.,

editors, Katachi

Symmetry, pages 341–345, Tokyo.

Springer Japan.

OCRconvert (2018). Japanese ocr (optical character

recognition). https://www.ocrconvert.com/japanese-

ocr. Accessed: 2018-11-20.

Simonyan, K. and Zisserman, A. (2014). Very deep con-

volutional networks for large-scale image recognition.

CoRR, abs/1409.1556.

Tekleyohannes, M. K., Weis, C., Wehn, N., Klein, M.,

and Siegrist, M. (2018). A reconﬁgurable accelerator

for morphological operations. In 2018 IEEE IPDPS

Workshops 2018, Vancouver, BC, Canada, May 21-25,

2018, pages 186–193.

Tsai, C. (2016). Recognizing handwritten japanese charac-

ters using deep convolutional neural networks.

van der Walt, S., Sch

onberger, J. L., Nunez-Iglesias, J.,

Boulogne, F., Warner, J. D., Yager, N., Gouillart, E.,

Yu, T., et al. (2014). scikit-image: image processing

in Python. PeerJ, 2:e453.

Yomiwa (2018). Real-time ofﬂine camera translator for

japanese. http://www.yomiwa.net/. Accessed: 2018-

11-20.

YUWasanDB (2018). Yamagata university wasan sakuma

collection (japanese). http://www2.lib.yamagata-

u.ac.jp/mainlib/rarebooks/sakuma-index.html . Ac-

cessed: 2018-11-20.

Computer Vision and Deep Learning Tools for the Automatic Processing of Wasan Documents

765