Authors:
Davide Varagnolo 1; Guilherme Antas 2; Mariana Ramos 2; Sara Amaral 2; Dora Melo 2, 3 and Irene Pimenta Rodrigues 2
Affiliations:
1 Department of Informatics, University of Évora, Portugal; 2 NOVA Laboratory for Computer Science and Informatics, NOVA LINCS, Portugal; 3 Polytechnic of Coimbra, Coimbra Business School—ISCAC, Coimbra, Portugal
Keyword(s):
Natural Language Processing, Knowledge Representation, Knowledge Discovery, Semantic Web, Archives Linked Data Semantic Representation.
Abstract:
This paper presents a method for extracting information from ISAD(G) elements that contain semi-structured text descriptions. Natural language processing is performed in the GATE environment, with a set of JAPE rules defined to process the text and extract the intended information. The information extraction processes are evaluated using a sample of 800 records and a manually built dataset for each type of information considered, such as baptisms, passport requisitions, testaments, etc. Implementing several automatic information extraction processes enables the CIDOC-CRM knowledge base to be populated automatically with new linked events and entities. The information migrated from DigitArq and extracted from text descriptions, represented in CIDOC-CRM, is explored through SPARQL queries, enabling new visualisations of the archival records and the retrieval of information collected in different records from different archives.
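As an illustration of the general idea only, the following minimal Python sketch turns a semi-structured description into CIDOC-CRM-style triples. It is not the GATE/JAPE pipeline described in the abstract: a plain regular expression stands in for the JAPE rules, and the description format, the `extract_baptism` function, the identifier scheme, and the choice of CRM classes and properties are all assumptions made for this example.

```python
import re

# Hypothetical description format; the actual records and JAPE rules
# are not given in the abstract.
BAPTISM_PATTERN = re.compile(
    r"Baptism of (?P<person>[^,]+), (?:son|daughter) of "
    r"(?P<father>.+?) and (?P<mother>.+?)\. "
    r"Date: (?P<date>\d{4}-\d{2}-\d{2})"
)


def extract_baptism(record_id, text):
    """Return (subject, predicate, object) triples for one description.

    The mapping to CIDOC-CRM terms below is an assumed, simplified one.
    """
    match = BAPTISM_PATTERN.search(text)
    if match is None:
        return []  # the description does not follow the assumed pattern

    event = f"baptism/{record_id}"  # hypothetical identifier scheme
    triples = [
        (event, "rdf:type", "crm:E7_Activity"),
        (event, "crm:P4_has_time-span", match.group("date")),
    ]
    # Link every person mentioned in the description to the event.
    for role in ("person", "father", "mother"):
        name = match.group(role).strip()
        triples.append((name, "rdf:type", "crm:E21_Person"))
        triples.append((event, "crm:P11_had_participant", name))
    return triples


sample = "Baptism of Maria, daughter of José Silva and Ana Costa. Date: 1850-03-12"
for triple in extract_baptism("rec-001", sample):
    print(triple)
```

Triples of this shape could then be loaded into a triple store and explored with SPARQL queries, as the abstract describes for the CIDOC-CRM knowledge base.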