
Figure 9: Examples of the Structured Data Files for Belfort
Civil Registers of Births.
riety of preprocessing steps, including binarization,
skew correction, and segmentation. Despite the
numerous impediments posed by these historical
documents, such as different text styles, marginal
annotations, and the hybrid nature of the texts
(printed and handwritten), we have developed
tailored solutions that successfully carried out the
segmentation process. The automatic verification tool
and the XML file generator ensure that transcriptions
are properly formatted and aligned with their
corresponding image components. Our results show a
high level of accuracy in text line segmentation, which
is critical for the effective structuring of these
valuable historical documents.
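The pairing of a verification step with XML generation can be illustrated
with a minimal sketch. It is not the tool described in this paper: the
element names (record, line), the attribute names, and the two checks are
hypothetical and were chosen only to show how segmented text lines and
their transcriptions might be validated and then serialized together.

```python
import xml.etree.ElementTree as ET


def verify_lines(lines, image_width, image_height):
    """Return a list of problems found; an empty list means all checks passed.

    Each line is a dict with a "bbox" (x, y, width, height) and a "text"
    transcription. Both keys are assumptions made for this sketch.
    """
    problems = []
    for i, line in enumerate(lines):
        x, y, w, h = line["bbox"]
        inside = (w > 0 and h > 0 and x >= 0 and y >= 0
                  and x + w <= image_width and y + h <= image_height)
        if not inside:
            problems.append(f"line {i}: bounding box outside the image")
        if not line["text"].strip():
            problems.append(f"line {i}: empty transcription")
    return problems


def build_record_xml(image_name, lines):
    """Serialize the lines of one page image into a small XML string."""
    root = ET.Element("record", image=image_name)
    for i, line in enumerate(lines):
        x, y, w, h = line["bbox"]
        element = ET.SubElement(
            root, "line",
            id=str(i), x=str(x), y=str(y), width=str(w), height=str(h),
        )
        element.text = line["text"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page with two segmented lines; the second lacks a transcription.
    page_lines = [
        {"bbox": (10, 20, 300, 40), "text": "Naissance de Marie"},
        {"bbox": (10, 70, 300, 40), "text": ""},
    ]
    print(verify_lines(page_lines, image_width=400, image_height=200))
    print(build_record_xml("page_001.png", page_lines[:1]))
```

Running the checks before serialization keeps lines with a missing
transcription or an out-of-bounds box from reaching the structured files.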
In conclusion, the work presented in this paper
makes a significant contribution to the field of
handwritten text recognition, particularly in the
context of historical documents, by introducing a new
and valuable structured dataset. However, employing
artificial intelligence techniques in the preprocessing
phase could further improve accuracy, especially in the
segmentation process. Future research will focus on
developing an automatic document layout analysis tool
and a deep learning model to transcribe the remaining
records automatically, with the ultimate goal of
facilitating the recognition and study of this rich
cultural heritage.
REFERENCES
Ahmed, A. S. (2018). Comparative study among Sobel,
Prewitt and Canny edge detection operators used in image
processing. J. Theor. Appl. Inf. Technol,
96(19):6517–6525.
Al-Khalidi, F. Q., Alkindy, B., and Abbas, T. (2019).
Extract the breast cancer in mammogram images.
Technology, 10(02):96–105.
Alaei, A., Pal, U., and Nagabhushan, P. (2011). A new
scheme for unconstrained handwritten text-line
segmentation. Pattern Recognition, 44(4):917–928.
AlKendi, W., Gechter, F., Heyberger, L., and Guyeux, C.
(2024). Advancements and challenges in handwritten
text recognition: A comprehensive survey. Journal of
Imaging, 10(1):18.
Binmakhashen, G. M. and Mahmoud, S. A. (2019).
Document layout analysis: a comprehensive survey. ACM
Computing Surveys (CSUR), 52(6):1–36.
Biswas, B., Bhattacharya, U., and Chaudhuri, B. B. (2023).
Document image skew detection and correction: A
survey.
Bugeja, M., Dingli, A., and Seychell, D. (2020). An
overview of handwritten character recognition
systems for historical documents. Rediscovering
Heritage Through Technology: A Collection of Innovative
Research Case Studies That Are Reworking The Way
We Experience Heritage, pages 3–23.
Carbune, V., Gonnet, P., Deselaers, T., Rowley, H. A.,
Daryin, A., Calvo, M., Wang, L.-L., Keysers, D.,
Feuz, S., and Gervais, P. (2020). Fast multi-language
LSTM-based online handwriting recognition.
International Journal on Document Analysis and Recognition
(IJDAR), 23(2):89–102.
Chacko, B. P. and P., B. A. (2010). Pre and post processing
approaches in edge detection for character
recognition. In 2010 12th International Conference on
Frontiers in Handwriting Recognition, pages 676–681.
Chakraborty, A. and Blumenstein, M. (2016). Marginal
noise reduction in historical handwritten documents–
a survey. In 2016 12th IAPR Workshop on Document
Analysis Systems (DAS), pages 323–328. IEEE.
Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K.
(2007). Image denoising by sparse 3-D
transform-domain collaborative filtering. IEEE
Transactions on Image Processing, 16(8):2080–2095.
Delsalle, P. (2009). Histoires de familles. Les registres
paroissiaux et d'état civil, du Moyen Âge à nos jours:
Démographie et généalogie (Family History: Parish
and Civil Status Registers, from the Middle Ages to the
Present Day: Demography and Genealogy). Presses
universitaires de Franche-Comté, Besançon.
Diem, M., Kleber, F., and Sablatnig, R. (2011). Text
classification and document layout analysis of paper
fragments. In 2011 International Conference on
Document Analysis and Recognition, pages 854–858.
Gan, J., Wang, W., and Lu, K. (2020). In-air handwritten
Chinese text recognition with temporal convolutional
recurrent network. Pattern Recognition, 97:107025.
IMPROVE 2024 - 4th International Conference on Image Processing and Vision Engineering