Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM

Davide Varagnolo, Guilherme Antas, Mariana Ramos, Sara Amaral, Dora Melo, Irene Pimenta Rodrigues

2022

Abstract

This paper presents a method for extracting information from ISAD(G) elements that contain semi-structured text descriptions. Natural language processing is performed in the GATE environment, using a set of JAPE rules defined to process the text and extract the intended information. The information extraction processes are evaluated using a sample of 800 records and a manually built dataset for each type of information considered, such as baptisms, passport requisitions, testaments, etc. Implementing several automatic information extraction processes makes it possible to populate the CIDOC-CRM knowledge base automatically with new linked events and entities. The information represented in CIDOC-CRM, both migrated from DigitArq and extracted from text descriptions, is explored through SPARQL queries, enabling new visualisations of the archival records and the retrieval of information collected in different records from different archives.


Paper Citation


in Harvard Style

Varagnolo D., Antas G., Ramos M., Amaral S., Melo D. and Rodrigues I. (2022). Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD; ISBN 978-989-758-614-9, SciTePress, pages 177-184. DOI: 10.5220/0011550700003335


in Bibtex Style

@conference{keod22,
author={Davide Varagnolo and Guilherme Antas and Mariana Ramos and Sara Amaral and Dora Melo and Irene Pimenta Rodrigues},
title={Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD},
year={2022},
pages={177-184},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011550700003335},
isbn={978-989-758-614-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD
TI - Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM
SN - 978-989-758-614-9
AU - Varagnolo D.
AU - Antas G.
AU - Ramos M.
AU - Amaral S.
AU - Melo D.
AU - Rodrigues I.
PY - 2022
SP - 177
EP - 184
DO - 10.5220/0011550700003335
PB - SciTePress