values of t will result in faster processing but lower
accuracy, while larger values of t will yield better
results but more keypoints, some of which may not belong
to areas of interest. The proposed method shows
better performance than the original SIFT matching
scheme.
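For readers unfamiliar with the baseline, the following is a minimal sketch of nearest-neighbour descriptor matching with a distance-ratio test, assuming that this is what "the original SIFT matching scheme" refers to. The function names, the toy 4-dimensional descriptors, and the ratio of 0.8 are illustrative assumptions and are not taken from the experiments reported here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query_desc, doc_desc, ratio=0.8):
    """Return (query_index, doc_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest document descriptor only
    if that distance is smaller than `ratio` times the distance to the
    second-nearest one. The default of 0.8 is an assumed setting.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Distances from this query descriptor to every document descriptor.
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(doc_desc))
        if len(dists) < 2:
            continue  # the ratio test needs at least two neighbours
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


# Toy 4-dimensional descriptors (real SIFT descriptors have 128 dimensions).
query = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
document = [
    [0.9, 0.1, 0.0, 0.0],  # close to query[0]
    [0.0, 0.0, 1.0, 0.0],  # far from both query descriptors
    [0.0, 0.5, 0.5, 0.0],  # equally close to query[1] as the next one
    [0.0, 0.5, 0.0, 0.5],  # equally close to query[1] as the previous one
]
matches = ratio_test_matches(query, document)
print(matches)
```

In this toy case the first query descriptor has one clearly closest candidate and is matched, while the second has two equally distant candidates and is rejected as ambiguous.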
5 CONCLUSIONS
In this paper we propose an alternative method for
matching SIFT keypoints using their descriptors. The
method manages to reduce the number of keypoints
used for further processing on the document images.
The proposed method applies an iterative process that
manages to eliminate more than 99% of the keypoints
while, at the same time, the remaining keypoints are
located in the areas of interest. The method in this
paper is suggested as a first step for word spotting applications
that follow a segmentation-free approach.
It significantly reduces the number of keypoints,
which can lead to fewer document areas to be searched,
thus speeding up the entire process. The proposed
method, as mentioned throughout this paper, is not a
complete word spotting method, nor is it meant to be one.
It could therefore also be applied
in other research areas where SIFT is used and there
is a need to discard non-relevant keypoints so as to
speed up the entire process. The value of the threshold
t can be chosen based on the needs of the underlying
task. Varying the value of t allows finding
keypoints with stronger relations between them, thus
leading to keypoints that belong to correct word instances,
as far as word spotting is concerned, or to any
other type of information we need to locate in an image.
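The trade-off governed by t can be illustrated with a small sketch. It is not the iterative procedure of this paper: it assumes, purely for illustration, that t acts as a cutoff on the descriptor distance between a document keypoint and its nearest query keypoint. The function names and the toy 2-dimensional descriptors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_keypoints(query_desc, doc_desc, t):
    """Return indices of document keypoints retained for threshold t.

    Assumption: a document keypoint is retained when the distance to its
    nearest query descriptor is at most t. This is a stand-in for the
    method's actual criterion.
    """
    kept = []
    for di, d in enumerate(doc_desc):
        nearest = min(euclidean(d, q) for q in query_desc)
        if nearest <= t:
            kept.append(di)
    return kept


def reduction_rate(total, kept):
    """Percentage of keypoints discarded."""
    return 100.0 * (total - kept) / total


# Toy 2-dimensional descriptors (real SIFT descriptors have 128 dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.95, 0.05], [0.5, 0.5], [0.0, 0.9], [-1.0, -1.0], [3.0, 3.0]]

for t in (0.08, 0.2, 1.0):
    kept = filter_keypoints(query, document, t)
    rate = reduction_rate(len(document), len(kept))
    print(f"t={t}: kept {kept}, reduction {rate:.1f}%")
```

Under this assumption, a small t keeps only the closest keypoint and discards the rest, while larger values of t retain progressively more keypoints, including ones that are only loosely related to the query.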
ACKNOWLEDGEMENTS
This work has been funded by the German Research
Foundation (DFG) within the scope of the Collaborative
Research Centre (SFB 950) at the Centre for the
Study of Manuscript Cultures (CSMC) at Hamburg
University.
Efficient Keypoint Reduction for Document Image Matching