Authors:
Dora Melo 1; Irene Pimenta Rodrigues 2 and Inês Koch 3
Affiliations:
1 Coimbra Business School - ISCAC, Polytechnic Institute of Coimbra, Portugal
2 Department of Informatics, University of Évora, Portugal
3 INESC-TEC, Faculty of Engineering of the University of Porto, Porto, Portugal
Keyword(s):
Natural Language Processing, Knowledge Representation and Reasoning, Archives Ontology, Semantic Migration, ISAD(G), CIDOC-CRM, Linked Data.
Abstract:
This paper presents a prototype for automatic semantic migration, based on Knowledge Discovery from Digital Archive Data, that populates an ontology in the domain of archives metadata (ISAD(G)). Natural Language Processing (NLP) techniques are used for language processing, and Semantic Web techniques for querying and updating the ArchOnto ontology, an extension of CIDOC-CRM (Conceptual Reference Model). The work is carried out in the context of the EPISA project (Entity and Property Inference for Semantic Archives), in which the Portuguese National Archives, Torre do Tombo (ANTT), is one of the partners. The data model and description vocabularies we adopted are built upon the CIDOC-CRM standard, an ontology developed for museums by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). A detailed example of the migration of a baptism document's metadata is presented to highlight the challenges in natural language interpretation and ontology representation.