On Metrics for Measuring Fragmentation of Federation over SPARQL Endpoints

Nur Aini Rakhmawati, Marcel Karnstedt, Michael Hausenblas, Stefan Decker

2014

Abstract

Processing a federated query over Linked Data is challenging because the engine must account for the number of sources, their locations, and the heterogeneity of the underlying systems, including hardware, software, and data structure and distribution. In this work, we investigate the relationship between data distribution and communication cost in a federated SPARQL query framework. We introduce the spreading factor, a dataset metric that quantifies how classes and properties are distributed across a set of data sources. To observe the relationship between the spreading factor and the communication cost, we generate nine datasets using several data fragmentation and allocation strategies. Our experimental results show that the spreading factor is correlated with the communication cost between a federated engine and the SPARQL endpoints. In terms of partitioning strategies, partitioning triples by property and class can minimize the communication cost. However, such partitioning can also reduce the performance of a SPARQL endpoint within the federation framework.

Paper Citation


in Harvard Style

Aini Rakhmawati N., Karnstedt M., Hausenblas M. and Decker S. (2014). On Metrics for Measuring Fragmentation of Federation over SPARQL Endpoints. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-023-9, pages 119-126. DOI: 10.5220/0004760101190126


in Bibtex Style

@conference{webist14,
author={Nur Aini Rakhmawati and Marcel Karnstedt and Michael Hausenblas and Stefan Decker},
title={On Metrics for Measuring Fragmentation of Federation over SPARQL Endpoints},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2014},
pages={119-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004760101190126},
isbn={978-989-758-023-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - On Metrics for Measuring Fragmentation of Federation over SPARQL Endpoints
SN - 978-989-758-023-9
AU - Aini Rakhmawati N.
AU - Karnstedt M.
AU - Hausenblas M.
AU - Decker S.
PY - 2014
SP - 119
EP - 126
DO - 10.5220/0004760101190126