Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM
Davide Varagnolo, Guilherme Antas, Mariana Ramos, Sara Amaral, Dora Melo, Irene Pimenta Rodrigues
2022
Abstract
This paper presents a method for extracting information from ISAD(G) elements that contain semi-structured text descriptions. Natural language processing is done in the GATE environment by defining the set of JAPE rules necessary to process the text and extract the intended information. The evaluation of the information extraction processes is done on a sample of 800 records for each type of information and a dataset that is manually built for each type of information considered, such as baptisms, passport requisitions, testaments, etc. The implementation of several automatic information extraction processes enables the automatic population of the CIDOC-CRM knowledge base with new linked events and entities. The exploration of the information, migrated from DigitArq and extracted from text descriptions represented in CIDOC-CRM, is done through SPARQL queries, enabling new visualisations of the archival records and the retrieval of information collected in different records from different archives.
Paper Citation
in Harvard Style
Varagnolo D., Antas G., Ramos M., Amaral S., Melo D. and Rodrigues I. (2022). Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD; ISBN 978-989-758-614-9, SciTePress, pages 177-184. DOI: 10.5220/0011550700003335
in Bibtex Style
@conference{keod22,
author={Davide Varagnolo and Guilherme Antas and Mariana Ramos and Sara Amaral and Dora Melo and Irene Pimenta Rodrigues},
title={Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD},
year={2022},
pages={177-184},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011550700003335},
isbn={978-989-758-614-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD
TI - Evaluating and Exploring Text Fields Information Extraction into CIDOC-CRM
SN - 978-989-758-614-9
AU - Varagnolo D.
AU - Antas G.
AU - Ramos M.
AU - Amaral S.
AU - Melo D.
AU - Rodrigues I.
PY - 2022
SP - 177
EP - 184
DO - 10.5220/0011550700003335
PB - SciTePress