Efficient Keypoint Reduction for Document Image Matching
Thomas Konidaris¹, Volker Märgner¹,², Hussein Adnan Mohammed¹ and H. Siegfried Stiehl¹,³
¹Centre for the Study of Manuscript Cultures, Universität Hamburg, Hamburg, Germany
²Technische Universität Braunschweig, Braunschweig, Germany
³Department of Informatics, Universität Hamburg, Hamburg, Germany
maergner@ifn.ing.tu-bs.de, stiehl@informatik.uni-hamburg.de
Keywords:
Document Analysis, Word Spotting, SIFT Features, Image Matching.
Abstract:
In this paper we propose a method for eliminating SIFT keypoints in document images. The proposed method is applied as a first step towards word spotting. One key issue when using SIFT keypoints in document images is that a large number of keypoints can be found in non-textual regions. Ideally, we would eliminate as many irrelevant keypoints as possible in order to speed up processing. This is accomplished by altering the original matching process of SIFT descriptors with an iterative process that enables the detection of keypoints belonging to multiple correct instances throughout the document image, an issue that the original SIFT algorithm cannot tackle in a satisfactory way. The proposed method achieves a reduction of over 99% of the extracted keypoints with satisfactory performance.
1 INTRODUCTION
Document image analysis has gained a lot of attention due to the increased need to exploit the many manuscript collections found worldwide. Furthermore, technological advances in image acquisition and digitization have given access to a vast amount of digital material. However, processing this information manually requires a significant amount of time and effort to complete the underlying tasks. Indeed, it is not uncommon for some tasks to be considered unrealistic to carry out manually because of the time they require. This has resulted in the development of computer vision methods that can assist in a number of tasks and, importantly, allow the batch processing of large amounts of information in much less time than the manual, non-computer-based way. Writer/scribe identification, manuscript indexing through word spotting and layout analysis are among the tasks that can be solved through the use of computer vision methods and have become active research areas in the field of computer vision and, more specifically, in the area of document image analysis.
As mentioned above, one of the tasks that needs to be tackled is that of document indexing. Although Optical Character Recognition (OCR) methods have been dominant for contemporary documents, this is not the case for historical document images. Various factors can severely affect the performance of OCR systems, rendering the whole process extremely challenging. Although there are attempts to use OCR on historical documents (Jenckel et al., 2016), there is still a lot of ground to cover before we can claim that the problem has been approached in a satisfactory way. An alternative to OCR for historical document indexing is word spotting. Word spotting refers to the detection of the desired information directly on the document images without any OCR. One of the most common ways to perform word spotting is to use feature-based methods, where a usually large number of features is extracted from the query keywords and the target document images. Some of these features belong to the information that we desire to detect, while the rest, which are the majority, do not belong to the correct instances that we want to spot. A very popular keypoint detection and description algorithm is the Scale Invariant Feature Transform (SIFT) (Lowe, 2004). Usually, the number of SIFT keypoints extracted from document images is large and therefore, in this paper, we present a method for eliminating SIFT keypoints from document images that do not belong to the areas that we want to detect. The application of the proposed method is presented as a first step of a word spotting
process.
The rest of the paper is organized as follows: In Sec-
tion 2 we present related work and in Section 3 we
give a detailed description of the proposed method.
In Section 4 experimental results are shown and in
Section 5 conclusions are drawn.
2 RELATED WORK
In this section we present the different works in the field of word spotting and, in particular, Query-by-Example (QBE) word spotting. The literature is divided into two main categories, namely segmentation-based and segmentation-free. The former employs the segmentation of the document images into lines (Kolcz et al., 2000)(Marcolino et al., 2000)(Mondal et al., 2016), words (Rath and Manmatha, 2003) or characters (Kim et al., 2005), while the latter does not apply any kind of segmentation on the document images (Leydier et al., 2009)(Rusiñol et al., 2011)(Konidaris et al., 2016)(Mhiri et al., 2018). Deep learning is nowadays widely used, and word spotting has had its share of attention: various methods have been proposed that use CNNs and other deep representations (Barakat et al., 2018)(Sudholt and Fink, 2016)(Zhong et al., 2016)(Krishnan et al., 2018).
The proposed method is a feature-based method for word spotting and uses SIFT (Lowe, 2004) keypoints and descriptors. There are various other feature-based methods that try to tackle the task of word spotting. In (Zagoris et al., 2017) word spotting is performed using document-oriented local features. HOG features are used in the methods proposed in (Bolelli et al., 2017)(Thontadari and Prabhakar, 2016). Bag-of-visual-words (BoVW) representations for word spotting are employed in (Aldavert and Rusiñol, 2018)(Shekhar and Jawahar, 2012)(Rothacker and Fink, 2015).
Here we present the various works on word spotting that use SIFT features. Concerning SIFT keypoint reduction, the work presented in (Fujiwara et al., 2013) tries to tackle this specific task. However, the authors try to reduce the number of keypoints on an image by assuming that if a number of keypoints have close similarity with a keypoint on the same image, then they are removed since they constitute a repeated pattern. Concerning document images, word spotting in handwritten documents is presented in (Aldavert et al., 2015). The authors use SIFT features with a Bag-of-Visual-Words (BoVW) representation. The method applies segmentation of the documents at the word level. Another method that uses SIFT and BoVW is proposed in (Rusiñol et al., 2011). This work follows a segmentation-free approach, although a grid-based segmentation of the documents is applied in order to extract the desired features.
Yalniz and Manmatha (Yalniz and Manmatha, 2012) present a word spotting method based on SIFT descriptors applied to FAST (Rosten and Drummond, 2006) keypoints. The extracted features are further quantized using K-Means and the matching is performed using the Longest Common Subsequence (LCS) method. A Bag-of-Features (BoF) representation is presented in (Rothacker et al., 2013). The method uses SIFT descriptors to feed an HMM in order to apply word spotting in handwritten documents following a segmentation-free approach. Ghosh and Valveny (Ghosh and Valveny, 2015) present a segmentation-free word spotting method based on word attributes. Query words are encoded using a Fisher vector representation and are used together with pyramidal histogram of characters (PHOC) labels to learn SVM-based attribute models. For the matching process a sliding window is applied. The word attributes used as the representation involve the calculation of SIFT descriptors. Another approach that uses PHOC, Fisher vectors and densely extracted SIFT descriptors is proposed in (Almazán et al., 2014). The SIFT features are extracted with a variable-size patch and their dimension is reduced using PCA. In the work of Sfikas et al. (Sfikas et al., 2015) a Gaussian mixture model (GMM) is trained using SIFT descriptors. Fisher vectors are then calculated for each image as a function of their SIFT description and the gradients of the GMM with respect to its parameters. This results in a fixed-length, highly discriminative representation that can be seen as an augmented bag-of-visual-words description encoding higher-order statistics.
3 PROPOSED METHOD
In this paper we propose an alternative matching scheme for SIFT descriptors, applied as a first step towards word spotting. The task is to find relevant keypoints on a target document image when keypoints of a query keyword image are matched against it. Ideally, the detected keypoints on the target document image will be part of the correct instances of the query keyword. The proposed method aims to reduce the number of keypoints detected in the images while at the same time keeping the keypoints that are most relevant to the keypoints of the query keyword image. The method uses an iterative process to reduce the keypoints, following a different approach than the one applied by the SIFT algorithm as proposed by
Lowe (Lowe, 2004). Instead of getting only a single
keypoint matched in the target image for every key-
point in the query keyword image as it occurs when
using the original SIFT, we try to get all the strong
keypoints in the target image for every keypoint in the
query keyword image using the same matching crite-
rion as proposed in the original SIFT algorithm.
In the original paper, every descriptor in the query image is compared with all the descriptors in the target image. If the ratio of the distances to the two closest descriptors for a query descriptor is less than a certain threshold t, then the keypoint that corresponds to the closest descriptor is kept as a valid keypoint. The distance ratio threshold is shown in Equation 1:
$$\frac{d_1}{d_2} < t \qquad (1)$$

where $d_1$ and $d_2$ are the distances of the two closest keypoint descriptors of the document image to a query keyword keypoint descriptor, and the threshold is $t = 0.8$. The original paper claims that this matching
scheme with t = 0.8 eliminates 90% of the false key-
point matches while discarding 5% of the correct key-
point matches. Although this may be the case when there is a single object in the images, when it comes to document images there is a very important issue that needs to be taken into consideration.
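For illustration, a minimal sketch of this ratio test, assuming OpenCV's SIFT descriptors and its brute-force matcher (helper name and parameters are illustrative, not the authors' code):

```python
import cv2

def lowe_ratio_matches(query_desc, doc_desc, t=0.8):
    # Distance ratio test of Equation 1: keep a query descriptor's closest
    # document descriptor only if it is clearly closer than the second closest.
    bf = cv2.BFMatcher(cv2.NORM_L2)
    knn = bf.knnMatch(query_desc, doc_desc, k=2)  # two nearest neighbours per query descriptor
    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        m, n = pair  # m.distance = d1 (closest), n.distance = d2 (second closest)
        if n.distance > 0 and m.distance / n.distance < t:
            good.append(m)  # a single matched keypoint per query descriptor
    return good
```

Note that this scheme returns at most one match per query descriptor, which is precisely the limitation discussed next.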
In document images the desired information that we
need to detect may have multiple correct instances in
the same page. This is a very common scenario in
word spotting methods. Consider the following ex-
ample as illustrated in Figure 1. We want to detect the
keyword “Begriffe” on a document image. The case
follows a segmentation-free word spotting approach.
In this particular example, the query keyword image
is one of the instances found on the page. The origi-
nal SIFT matching algorithm manages to successfully
detect the word on the document image, since it is the
query keyword used for matching, but fails to do so
with the rest of the correct instances. The bounding
boxes on the image indicate the correct instances of
the query keyword as found in the ground truth.
In the proposed method, concerning historical docu-
ment images, we follow a different approach. The
idea lies in the fact that it is very common to have
multiple instances of the information we want to lo-
cate. For that reason we want to be able to use the
matching of the descriptors as proposed in the origi-
nal SIFT algorithm but also enable the process to find
all the relevant keypoints that belong to other correct
instances of the query keyword image. Based on that
thought, we introduce an iterative process that enables
us to succeed in the aforementioned task.

Figure 1: Matching the query keyword “Begriffe” with the document image. The keyword is taken from the document image. The SIFT algorithm manages to detect only this word, omitting the other correct instances found in the document image.

For every query keypoint we perform the original SIFT matching as defined in Equation 1. If there are keypoints that meet
the threshold criterion, these are kept as valid for fur-
ther processing and the process is repeated. At the next iteration, we use the same query keypoint, but this time without taking into consideration the already detected valid keypoints. This will enable the algorithm to detect other strong keypoints for the same query keypoint. The assumption is that there might be other keypoints on the document image that satisfy the threshold criterion for the same query keypoint and belong to other correct instances of the query keyword image. This iterative process over-
comes the limitation of the original SIFT algorithm
to detect multiple instances of the correct information
on a document image, especially when a document
image contains the query keyword image and other
correct instances as shown in Figure 2. Algorithm 1
shows the various steps of the proposed method.
(a) original SIFT (b) Proposed, t = 0.80 (c) Proposed, t = 0.85
(d) Proposed, t = 0.90 (e) Proposed, t = 0.95 (f) Proposed, t = 0.99
Figure 2: The proposed method as compared with the original matching of the SIFT algorithm. The query word is taken
directly from the illustrated page. (a) is the result of the original SIFT matching where the algorithm correctly spots the
query word image but fails to spot the additional correct instances. (b)-(f) the results of the proposed method with t =
{0.80, 0.85, 0.90, 0.95, 0.99} respectively. The green bounding boxes indicate correct instances of the word as found in the
ground truth. For clarity only a portion of the document image is shown.
Algorithm 1: Iterative SIFT Matching.
1: calculate descriptors for the query and the document image
2: initialize valid_pts
3: for each i in query_descr do
4:   while matched_pts do
5:     dist = ||query_descr_i, image_descr||_2
6:     sort(dist)
7:     if d_1 / d_2 < t then
8:       idx = argmin(dist)
9:       matched_pts.append(image_pts(idx))
10:      remove image_descr(idx)
11:    else
12:      matched_pts = false
13: end
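A minimal sketch of Algorithm 1, assuming OpenCV SIFT keypoints and descriptors and brute-force L2 distances computed with NumPy (all names are illustrative; this is not the authors' implementation):

```python
import cv2
import numpy as np

def iterative_sift_matching(query_img, doc_img, t=0.90):
    # Sketch of the iterative matching scheme: for every query descriptor,
    # repeatedly apply the ratio test, removing each accepted document
    # descriptor so that further strong matches (other word instances)
    # can still be found for the same query keypoint.
    sift = cv2.SIFT_create()  # requires OpenCV >= 4.4 (or opencv-contrib)
    _, query_descr = sift.detectAndCompute(query_img, None)
    image_pts, image_descr = sift.detectAndCompute(doc_img, None)

    matched_pts = []
    for qd in query_descr:
        # Candidate document descriptors still available for this query keypoint.
        candidates = list(range(len(image_descr)))
        while len(candidates) >= 2:
            dist = np.linalg.norm(image_descr[candidates] - qd, axis=1)
            order = np.argsort(dist)
            d1, d2 = dist[order[0]], dist[order[1]]
            if d2 > 0 and d1 / d2 <= t:
                idx = candidates[order[0]]
                matched_pts.append(image_pts[idx].pt)
                candidates.pop(order[0])  # exclude this match and iterate again
            else:
                break  # no further candidate satisfies the ratio criterion
    return matched_pts
```

Whether the removal of matched descriptors persists across query keypoints is a design choice not fixed by the listing above; in this sketch each query keypoint starts from the full set of document descriptors.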
4 EXPERIMENTAL RESULTS
The experiments involve checking whether the re-
duction of the keypoints is efficient in the sense that
the remaining keypoints are located inside the ground
truth bounding boxes. For the purpose of the ex-
periments, the setup proposed in (Konidaris et al., 2016) is used. There are 100 document images (von Eckartshausen, 1778) and 100 keyword images. We have used five different values for the threshold t: 0.80, 0.85, 0.90, 0.95 and 0.99. The idea behind the various threshold values is that if, for a query keypoint, the two closest keypoints are both valid matches, their distance ratio will be high. According to our experiments this seems to hold. The
experiments concern the mean average recall (mAR),
the average number of remaining keypoints on the
document images and the ratio of the remaining key-
points found inside the ground truth bounding boxes
over the number of the remaining keypoints. The ex-
periments do not evaluate the proposed method as a
complete word spotting method. Rather, the idea is
to use the proposed method as a first step towards
word spotting where the keypoints that remain after
the elimination can be further processed yielding the
final results.
Figure 3: Mean average recall (mAR) for the different val-
ues of t for all query keywords.
Figure 4: Average number of remaining keypoints per page
for the different values of t.
The average number of keypoints per page in the doc-
ument images used for the experiments is 16154. Fig-
ure 3 shows the results concerning the mAR which
corresponds to the number of keypoints being found
in the ground truth bounding boxes. For this calcu-
lation a true positive requires at least one keypoint to
be found inside the ground truth. Furthermore, the ground truth follows an extended format in the sense that not only exact-match words are included but also words that contain the query keyword as a whole. This is primarily because the documents have not undergone any segmentation and a correct instance of a word can be found anywhere on the document images, including as part of larger words.
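As an illustration of this evaluation criterion, a small sketch with hypothetical helper names, assuming axis-aligned ground-truth boxes given as (x0, y0, x1, y1):

```python
def keypoints_in_boxes(points, boxes):
    # Count remaining keypoints that fall inside any ground-truth bounding box.
    # points: iterable of (x, y) keypoint locations; boxes: iterable of (x0, y0, x1, y1).
    hits = 0
    for (x, y) in points:
        if any(x0 <= x <= x1 and y0 <= y <= y1 for (x0, y0, x1, y1) in boxes):
            hits += 1
    return hits

def word_is_spotted(points, box):
    # A ground-truth word counts as a true positive if at least one
    # remaining keypoint lies inside its bounding box.
    return keypoints_in_boxes(points, [box]) >= 1
```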
It is clear that for the various values of threshold t,
the proposed method outperforms the original SIFT
matching process. This justifies the assumption that
a query keypoint may have other strong keypoint
matches on the document images that are detected us-
ing the iterative process proposed in this paper.
Figure 4 shows the mean average number of remain-
ing keypoints for all queries in the document images
for the different values of t.

Figure 5: The ratio of the remaining keypoints inside the ground truth bounding boxes over the total number of remaining keypoints.

Table 1: Performance of the proposed method compared to the original SIFT for the different values of the threshold t.

t      mP_r (Prop./SIFT)    GTRatio (Prop./SIFT)    mAR (Prop./SIFT)
0.80   9 / 9                44.86% / 46.41%         69.26% / 64.77%
0.85   16 / 15              36.20% / 38.61%         79.58% / 73.93%
0.90   30 / 25              28.01% / 31.63%         88.24% / 81.63%
0.95   66 / 42              20.04% / 26.13%         95.20% / 87.54%
0.99   237 / 66             10.79% / 22.20%         99.45% / 94.01%

SIFT manages to have fewer remaining keypoints, but this is reasonable since
it does not perform any kind of iterations, contrary to
the proposed method where for each query keypoint
a number of iterations is performed in order to detect
all the keypoints that satisfy the matching criterion.
Figure 5 illustrates the ratio of remaining keypoints
found in the ground truth bounding boxes to all the re-
maining keypoints. The matching scheme of the SIFT
algorithm seems to have a better ratio, but this can be explained by its smaller number of remaining keypoints compared to the proposed method, as shown in the previous diagram. As we have already mentioned, with the original SIFT algorithm we get only a single matching keypoint on the target document image for every query keyword keypoint. Table 1 summarizes the experiments comparing the matching scheme proposed in this paper with the original SIFT matching scheme.
In the table, mP_r is the mean number of remaining keypoints, GTRatio is the average ratio of the remaining keypoints found inside the ground truth bounding boxes to all remaining keypoints, and mAR is the mean average recall for all the query keywords.
Through the above experiments we provided an insight into how SIFT matching can be altered so that all the relevant keypoints are extracted for every query keyword keypoint. The selection of the threshold depends solely on the needs of the underlying task. Lower
values of t will result in faster processing but lower accuracy, while larger values of t will yield better results but also more keypoints, some of which may not belong to areas of interest. The proposed method shows
better performance than the original SIFT matching
scheme.
5 CONCLUSIONS
In this paper we propose an alternative method for
matching SIFT keypoints using their descriptors. The
method manages to reduce the number of keypoints used for further processing on the document images. The proposed method applies an iterative process that manages to eliminate more than 99% of the keypoints while, at the same time, the remaining keypoints are located in the areas of interest. The method in this paper is suggested as a first step for word spotting applications that follow a segmentation-free approach. It reduces the number of keypoints significantly, which can lead to fewer document areas being searched, thus speeding up the entire process. As mentioned throughout this paper, the proposed method is not a complete word spotting method, nor is it intended to be. Therefore, it could also be applied in other research areas where SIFT is used and there is a need to discard non-relevant keypoints so as to speed up the entire process. The value of the threshold t can be chosen based on the needs of the underlying task. The various values of t allow finding keypoints with stronger relations between them, thus leading to keypoints that belong to correct word instances, as far as word spotting is concerned, or to any other type of information we need to locate on an image.
ACKNOWLEDGEMENTS
This work has been funded by the German Research
Foundation (DFG) within the scope of the Collabora-
tive Research Centre (SFB 950) at the Centre for the
Study of Manuscript Cultures (CSMC) at Hamburg
University.
REFERENCES
Aldavert, D. and Rusiñol, M. (2018). Synthetically generated semantic codebook for bag-of-visual-words based word spotting. In 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, Vienna, Austria, April 24-27, 2018, pages 223–228.
Aldavert, D., Rusiñol, M., Toledo, R., and Lladós, J. (2015). A study of bag-of-visual-words representations for handwritten keyword spotting. IJDAR, 18(3):223–234.
Almazán, J., Gordo, A., Fornés, A., and Valveny, E. (2014). Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell., 36(12):2552–2566.
Barakat, B. K., Alaasam, R., and El-Sana, J. (2018). Word
spotting using convolutional siamese network. In 13th
IAPR International Workshop on Document Analy-
sis Systems, DAS 2018, Vienna, Austria, April 24-27,
2018, pages 229–234.
Bolelli, F., Borghi, G., and Grana, C. (2017). Historical
handwritten text images word spotting through slid-
ing window HOG features. In Image Analysis and
Processing - ICIAP 2017 - 19th International Confer-
ence, Catania, Italy, September 11-15, 2017, Proceed-
ings, Part I, pages 729–738.
Fujiwara, Y., Okamoto, T., and Kondo, K. (2013). SIFT fea-
ture reduction based on feature similarity of repeated
patterns. In International Symposium on Intelligent
Signal Processing and Communication Systems, IS-
PACS 2013, Naha-shi, Japan, November 12-15, 2013,
pages 311–314.
Ghosh, S. K. and Valveny, E. (2015). A sliding win-
dow framework for word spotting based on word at-
tributes. In Pattern Recognition and Image Analysis
- 7th Iberian Conference, IbPRIA 2015, Santiago de
Compostela, Spain, June 17-19, 2015, Proceedings,
pages 652–661.
Jenckel, M., Bukhari, S. S., and Dengel, A. (2016). anyOCR: A sequence learning based OCR system for unlabeled historical documents. In 23rd International Conference on Pattern Recognition, ICPR 2016, Cancún, Mexico, December 4-8, 2016, pages 4035–4040.
Kim, S., Park, S., Jeong, C., Kim, J., Park, H., and Lee,
G. (2005). Keyword spotting on korean document im-
ages by matching the keyword image. In Digital Li-
braries: Implementing Strategies and Sharing Experi-
ences, volume 3815, pages 158-166.
Kolcz, A., Alspector, J., Augusteijn, M., Carlson, R., and
Popescu, G. V. (2000). A line-oriented approach to
word spotting in handwritten documents. Journal of
Pattern Analysis and Applications, 3(2):153-168.
Konidaris, T., Kesidis, A. L., and Gatos, B. (2016). A
segmentation-free word spotting method for histori-
cal printed documents. Pattern Analysis and Applica-
tions, 19(4):963–976.
Krishnan, P., Dutta, K., and Jawahar, C. V. (2018).
Word spotting and recognition using deep embedding.
In 13th IAPR International Workshop on Document
Analysis Systems, DAS 2018, Vienna, Austria, April
24-27, 2018, pages 1–6.
Leydier, Y., Ouji, A., LeBourgeois, F., and Emptoz, H.
(2009). Towards an omnilingual word retrieval sys-
tem for ancient manuscripts. Pattern Recognition,
42(9):2089-2105.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60(2):91-110.
Marcolino, A., Ramos, V., Ramalho, M., and Pinto, J. C.
(2000). Line and word matching in old documents. In
Fifth IberoAmerican Symposium on Pattern Recogni-
tion (SIAPR), pages 123-135.
Mhiri, M., Abuelwafa, S., Desrosiers, C., and Cheriet, M.
(2018). Hierarchical representation learning using
spherical k-means for segmentation-free word spot-
ting. Pattern Recognition Letters, 101:52–59.
Mondal, T., Ragot, N., Ramel, J., and Pal, U. (2016).
Flexible sequence matching technique: An effective
learning-free approach for word spotting. Pattern
Recognition, 60:596–612.
Rath, T. M. and Manmatha, R. (2003). Features for word
spotting in historical manuscripts. In International
Conference of Document Analysis and Recognition,
pages 218-222.
Rosten, E. and Drummond, T. (2006). Machine learning for
high-speed corner detection. In European Conference
on Computer Vision, volume 1, pages 430-443.
Rothacker, L. and Fink, G. A. (2015). Segmentation-free
query-by-string word spotting with bag-of-features
hmms. In 13th International Conference on Docu-
ment Analysis and Recognition, ICDAR 2015, Nancy,
France, August 23-26, 2015, pages 661–665.
Rothacker, L., Rusiñol, M., and Fink, G. A. (2013). Bag-of-features hmms for segmentation-free word spotting in handwritten documents. In 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, August 25-28, 2013, pages 1305–1309.
Rusiñol, M., Aldavert, D., Toledo, R., and Lladós, J. (2011). Browsing heterogeneous document collections by a segmentation-free word spotting method. In 11th International Conference on Document Analysis and Recognition (ICDAR'11), pages 63-67, China.
Sfikas, G., Giotis, A. P., Louloudis, G., and Gatos, B.
(2015). Using attributes for word spotting and recog-
nition in polytonic greek documents. In 13th Interna-
tional Conference on Document Analysis and Recog-
nition, ICDAR 2015, Nancy, France, August 23-26,
2015, pages 686–690.
Shekhar, R. and Jawahar, C. V. (2012). Word image retrieval
using bag of visual words. In 10th IAPR Interna-
tional Workshop on Document Analysis Systems, DAS
2012, Gold Coast, Queenslands, Australia, March 27-
29, 2012, pages 297–301.
Sudholt, S. and Fink, G. A. (2016). Phocnet: A deep convo-
lutional neural network for word spotting in handwrit-
ten documents. In 15th International Conference on
Frontiers in Handwriting Recognition, ICFHR 2016,
Shenzhen, China, October 23-26, 2016, pages 277–
282.
Thontadari, C. and Prabhakar, C. J. (2016). Scale space co-
occurrence HOG features for word spotting in hand-
written document images. IJCVIP, 6(2):71–86.
von Eckartshausen, C. (1778). Aufschlüsse zur Magie aus geprüften Erfahrungen über verborgene philosophische Wissenschaften und verdeckte Geheimnisse der Natur. Bavarian State Library.
Yalniz, I. Z. and Manmatha, R. (2012). An efficient frame-
work for searching text in noisy document images.
In Proceedings of Document Analysis Systems (DAS),
pages 48-52.
Zagoris, K., Pratikakis, I., and Gatos, B. (2017). Unsu-
pervised word spotting in historical handwritten doc-
ument images using document-oriented local features.
IEEE Trans. Image Processing, 26(8):4032–4041.
Zhong, Z., Pan, W., Jin, L., Mouchère, H., and Viard-Gaudin, C. (2016). Spottingnet: Learning the similarity of word images with convolutional neural network for word spotting in handwritten historical documents. In 15th International Conference on Frontiers in Handwriting Recognition, ICFHR 2016, Shenzhen, China, October 23-26, 2016, pages 295–300.