Authors: Nihel Kooli and Abdel Belaïd

Affiliation: LORIA, France

Keyword(s): Entity Recognition, Entity Resolution, Entity Matching, OCRed Documents, Redundant Databases.

Related Ontology Subjects/Areas/Topics: Applications; Data Engineering; Information Retrieval; Information Retrieval and Learning; Ontologies and the Semantic Web; Pattern Recognition; Software Engineering; Theory and Methods

Abstract: This paper presents an entity recognition approach on documents recognized by OCR (Optical Character Recognition). The recognition is formulated as a task of matching entities in a database with their representations in a document. A pre-processing step of entity resolution is performed on the database to provide a better representation of the entities. For this, a statistical model based on record linkage and record merge phases is used. Furthermore, documents recognized by OCR can contain noisy data and altered structure. An adapted method is proposed to retrieve the entities from their structures by tolerating possible OCR errors. A modified version of EROCS is applied to this problem by adapting the notion of segments to blocks provided by the OCR. It handles document segments to match the document to its corresponding entities. For efficiency, a process of data labeling in the document is applied in order to filter the compared entities and segments. The evaluation on business documents shows a significant improvement of matching rates compared to those of EROCS.
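
As a rough illustration of the matching task the abstract describes (database entities compared against OCR blocks while tolerating character errors), the following minimal sketch may help. It is not the authors' method and not EROCS: the character similarity from Python's standard `difflib` module stands in for whatever error-tolerant comparison the paper uses, and the function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_window_score(value: str, block: str) -> float:
    """Best similarity between an attribute value and any window of the
    block that has the same number of tokens as the value."""
    tokens = block.split()
    n = max(1, len(value.split()))
    last_start = max(1, len(tokens) - n + 1)
    windows = [" ".join(tokens[i:i + n]) for i in range(last_start)]
    return max((similarity(value, w) for w in windows), default=0.0)


def match_entity(blocks, entities, threshold=0.8):
    """Return (entity_id, score) for the entity whose attribute values are
    best supported by the OCR blocks, or None if nothing passes the
    threshold. The threshold is an illustrative assumption, not a value
    taken from the paper."""
    best = None
    for entity_id, attributes in entities.items():
        score = 0.0
        for value in attributes:
            s = max((best_window_score(value, b) for b in blocks), default=0.0)
            if s >= threshold:  # tolerate small OCR errors, reject weak matches
                score += s
        if score > 0 and (best is None or score > best[1]):
            best = (entity_id, score)
    return best


# Hypothetical data: one OCR block contains a typical l/i confusion.
ocr_blocks = ["ACME Corporatlon", "12 rue de la Gare 54000 Nancy"]
database = {
    "e1": ["ACME Corporation", "54000 Nancy"],
    "e2": ["Globex SARL", "75001 Paris"],
}
result = match_entity(ocr_blocks, database)
```

In this toy setup `result` identifies entity `e1`, since both of its attribute values find a close counterpart in the blocks despite the OCR error. The data labeling step mentioned in the abstract would, in a fuller version, restrict which entities and blocks are compared at all.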

CC BY-NC-ND 4.0

Paper citation in several formats:
Kooli, N. and Belaïd, A. (2015). Entity Matching in OCRed Documents with Redundant Databases. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM; ISBN 978-989-758-076-5; ISSN 2184-4313, SciTePress, pages 165-172. DOI: 10.5220/0005177301650172

@conference{icpram15,
author={Nihel Kooli and Abdel Belaïd},
title={Entity Matching in OCRed Documents with Redundant Databases},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={165-172},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005177301650172},
isbn={978-989-758-076-5},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Entity Matching in OCRed Documents with Redundant Databases
SN - 978-989-758-076-5
IS - 2184-4313
AU - Kooli, N.
AU - Belaïd, A.
PY - 2015
SP - 165
EP - 172
DO - 10.5220/0005177301650172
PB - SciTePress