Page Boundary Extraction of Bound Historical Herbaria

Krishna Chandrasekar, Steven Verstockt

2020

Abstract

When digitizing bound historical collections such as herbaria, it is important to extract the main page region so that it can be used for automated processing. The thickness of the herbaria books also gives rise to deformations during imaging, which reduce the efficiency of automatic detection tasks. In this work, we address these problems by proposing an automatic page detection algorithm that estimates all the boundaries of the page and performs morphological corrections to reduce deformations. The algorithm extracts features from the Hue, Saturation and Value transformations of an RGB image to detect the main page polygon. It was evaluated on multiple textual and herbaria-type historical collections and obtains over 94% mean intersection over union on all of these datasets. The algorithm was also subjected to an ablation test to demonstrate the importance of the morphological corrections.
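
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of mean intersection over union, assuming the detected page and the ground-truth page are both available as binary pixel masks of the same size. The function names `iou` and `mean_iou` and the mask representation are assumptions made for this example; the abstract does not describe the authors' evaluation code.

```python
from typing import Iterable, Sequence, Tuple

# Assumed representation: a mask is a list of rows, each pixel 1 (page) or 0 (background).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is page in both masks
            if p or t:
                union += 1  # pixel is page in at least one mask
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (predicted, ground-truth) mask pairs."""
    scores = [iou(pred, truth) for pred, truth in pairs]
    if not scores:
        raise ValueError("mean_iou needs at least one mask pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [1, 1, 0]]
    ground_truth = [[1, 1, 1],
                    [1, 1, 1]]
    print(iou(predicted, ground_truth))
    print(mean_iou([(predicted, ground_truth), (ground_truth, ground_truth)]))
```

A page polygon would first have to be rasterized into such a mask; that step is left out here because the abstract does not specify how it is done.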

Paper Citation


in Harvard Style

Chandrasekar K. and Verstockt S. (2020). Page Boundary Extraction of Bound Historical Herbaria. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ARTIDIGH, ISBN 978-989-758-395-7, pages 476-483. DOI: 10.5220/0009154104760483


in Bibtex Style

@conference{artidigh20,
author={Krishna Chandrasekar and Steven Verstockt},
title={Page Boundary Extraction of Bound Historical Herbaria},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ARTIDIGH},
year={2020},
pages={476-483},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009154104760483},
isbn={978-989-758-395-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 1: ARTIDIGH
TI - Page Boundary Extraction of Bound Historical Herbaria
SN - 978-989-758-395-7
AU - Chandrasekar K.
AU - Verstockt S.
PY - 2020
SP - 476
EP - 483
DO - 10.5220/0009154104760483