Can You Find All the Data You Expect in a Linked Dataset?

Walter Travassos Sarinho, Bernadette Farias Lóscio, Damires Souza

Abstract

The huge volume of datasets available on the Web has motivated the development of a new class of Web applications, which allow users to perform complex queries on top of a set of predefined linked datasets. However, given the large number of available datasets and the lack of information about their quality, selecting datasets for a particular application can become a complex and time-consuming task. In this work, we argue that one way to support dataset selection for a given application is to evaluate the completeness of each dataset with respect to the data that the application's users consider important. With this in mind, we propose an approach to assess the completeness of a linked dataset that considers a set of specific data requirements and avoids a large amount of query processing. To provide a more detailed evaluation, we propose three distinct types of completeness: schema, literal, and instance completeness. We present the definitions underlying our approach and some results from our evaluation.



Paper Citation


in Harvard Style

Travassos Sarinho W., Farias Lóscio B. and Souza D. (2015). Can You Find All the Data You Expect in a Linked Dataset? In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-097-0, pages 648-655. DOI: 10.5220/0005381806480655


in Bibtex Style

@conference{iceis15,
author={Walter Travassos Sarinho and Bernadette Farias Lóscio and Damires Souza},
title={Can You Find All the Data You Expect in a Linked Dataset?},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2015},
pages={648-655},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005381806480655},
isbn={978-989-758-097-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Can You Find All the Data You Expect in a Linked Dataset?
SN - 978-989-758-097-0
AU - Travassos Sarinho W.
AU - Farias Lóscio B.
AU - Souza D.
PY - 2015
SP - 648
EP - 655
DO - 10.5220/0005381806480655