number of correct predictions for both the text detection and
recognition tasks was manually verified and counted
for both samples of the image. Since the image
quality of the two images is similar and the
same text recognition model settings are used, the
obtained results are comparable. It was observed that
the morphological correction improved both the
localization and the prediction of handwritten text by 25%
on average for each image. It was also noticed that
page numbers, headings and text close to
the boundaries had much better results than before.
An example of the result is shown in Figure 8.
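As a rough illustration of how such an average improvement can be derived from manually counted correct predictions, the following minimal sketch computes a per-image gain and averages it. The function names and the counts are hypothetical and are not the data of this study. The sketch also assumes the improvement is measured relative to the count before correction, which the text does not state explicitly.

```python
from typing import Iterable, Tuple


def relative_improvement(correct_before: int, correct_after: int) -> float:
    """Percentage gain in correct predictions relative to the uncorrected image."""
    if correct_before <= 0:
        raise ValueError("correct_before must be positive")
    return 100.0 * (correct_after - correct_before) / correct_before


def mean_improvement(pairs: Iterable[Tuple[int, int]]) -> float:
    """Average the per-image relative improvement over (before, after) pairs."""
    gains = [relative_improvement(before, after) for before, after in pairs]
    if not gains:
        raise ValueError("at least one (before, after) pair is required")
    return sum(gains) / len(gains)


# Hypothetical counts for two images (illustration only, not measured values):
# (correct predictions before correction, correct predictions after correction)
hypothetical_counts = [(40, 50), (20, 25)]
print(mean_improvement(hypothetical_counts))
```

If the improvement were instead meant as an absolute difference in accuracy, the per-image gain would be computed from the ratios of correct predictions to total text regions rather than from the raw counts.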
5 CONCLUSIONS
A novel page detection algorithm has been presented
which eliminates border noise by segmenting the
main page region from the rest of the image. The
importance of using the HSV colour model for historical
document processing was elaborated. With fewer
assumptions, it was shown that the page detection
could also work for complex page structures. It
was also demonstrated that the detected page polygon
could be used as a feature for reducing deformation.
Finally, the page with reduced deformations was
shown to perform better in automatic text detection
tasks.
ACKNOWLEDGEMENTS
The research activities described in this paper were
funded by The Department of Culture, Youth &
Media, Flanders (Belgium) for the Flore de Gand
project.
Page Boundary Extraction of Bound Historical Herbaria