Contract Metadata Identification in Czech Scanned Documents
Hien Ha, Aleš Horák, Minh Bui
2021
Abstract
Although born-digital documents are now prevalent, business documents are still often exchanged as scanned images, a general human-readable format with one-to-one correspondence to the paper originals. Bulk processing of such scanned documents then requires human intervention to extract and enter the main document metadata. In this paper, we present the design and evaluation of a contract processing module in the OCRMiner system. The information extraction process combines layout properties with text analysis as input to rule-based extraction with confidence score propagation. The first results are evaluated on public Czech contract documents, reaching an item extraction accuracy of almost 88%.
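To make the idea of rule-based extraction with confidence score propagation more concrete, the following is a minimal illustrative sketch. It is not the OCRMiner implementation: the `Block` and `Rule` classes, the `extract` function, the regular expressions, the sample text and all numeric confidences are assumptions introduced here for illustration only. Each hypothetical rule pairs a text pattern with a simple layout constraint (vertical position on the page), and the score of a match is assumed to be the product of an OCR confidence and the rule's own confidence.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical OCR text block; y is the vertical position (0 = top of page)."""
    text: str
    y: float


@dataclass
class Rule:
    """Hypothetical rule: a text pattern plus a layout constraint."""
    field: str
    pattern: str       # regex with one capturing group for the value
    max_y: float       # block must lie above this fraction of the page
    confidence: float  # assumed reliability of the rule itself


def extract(blocks, rules, ocr_confidence=1.0):
    """Return {field: (value, score)}, keeping the best-scoring match per field."""
    results = {}
    for rule in rules:
        for block in blocks:
            # Layout check: skip blocks outside the region the rule expects.
            if block.y > rule.max_y:
                continue
            # Text check: the pattern must match inside the block.
            match = re.search(rule.pattern, block.text)
            if match is None:
                continue
            # Propagate confidence (assumption: a simple product of the two).
            score = ocr_confidence * rule.confidence
            if rule.field not in results or score > results[rule.field][1]:
                results[rule.field] = (match.group(1), score)
    return results


# Invented sample data resembling the header of a Czech contract.
blocks = [
    Block("Smlouva o dílo č. 2020/0157", y=0.05),
    Block("uzavřená dne 12. 3. 2020", y=0.12),
]
rules = [
    Rule("contract_number", r"č\.\s*([\w/]+)", max_y=0.3, confidence=0.9),
    Rule("date", r"dne\s+(\d{1,2}\.\s*\d{1,2}\.\s*\d{4})", max_y=0.5, confidence=0.8),
]
fields = extract(blocks, rules, ocr_confidence=0.95)
```

Under these assumptions, `fields` maps each metadata item to its extracted value together with a propagated score, so low-confidence items could be routed to a human for review rather than entered automatically.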
Paper Citation
in Harvard Style
Ha H., Horák A. and Bui M. (2021). Contract Metadata Identification in Czech Scanned Documents. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-484-8, pages 795-802. DOI: 10.5220/0010243807950802
in Bibtex Style
@conference{icaart21,
author={Hien Ha and Aleš Horák and Minh Bui},
title={Contract Metadata Identification in Czech Scanned Documents},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2021},
pages={795-802},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010243807950802},
isbn={978-989-758-484-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Contract Metadata Identification in Czech Scanned Documents
SN - 978-989-758-484-8
AU - Ha H.
AU - Horák A.
AU - Bui M.
PY - 2021
SP - 795
EP - 802
DO - 10.5220/0010243807950802