ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools
Laura Pandolfo, Luca Pulina
2021
Abstract
The amount of data available on the Web has grown significantly in recent years, increasing the need for efficient techniques that retrieve information from data in order to discover valuable and relevant knowledge. Over the last decade, the intersection of the Information Extraction and Semantic Web areas has provided new opportunities for improving ontology-based information extraction tools. However, a critical obstacle in the development and evaluation of such systems is the limited availability of annotated documents, especially in domains such as history. In this paper we present the current state of our work on building a large, real-world RDF dataset intended to support the development of ontology-based extraction tools. The dataset is the result of efforts made within the ARKIVO project and contains about 300 thousand triples, which are the outcome of a manual annotation process carried out by domain experts. The ARKIVO dataset is freely available and can be used as a benchmark for evaluating systems that automatically annotate and extract entities from documents.
Paper Citation
in Harvard Style
Pandolfo L. and Pulina L. (2021). ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools. In Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-536-4, pages 341-345. DOI: 10.5220/0010677000003058
in Bibtex Style
@conference{webist21,
author={Laura Pandolfo and Luca Pulina},
title={ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools},
booktitle={Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2021},
pages={341-345},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010677000003058},
isbn={978-989-758-536-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools
SN - 978-989-758-536-4
AU - Pandolfo L.
AU - Pulina L.
PY - 2021
SP - 341
EP - 345
DO - 10.5220/0010677000003058