Authors:
Nur Aini Rakhmawati; Marcel Karnstedt; Michael Hausenblas and Stefan Decker
Affiliation:
National University of Ireland, Ireland
Keyword(s):
Linked Data, Data Distribution, Federated SPARQL Query, SPARQL Endpoint.
Related Ontology Subjects/Areas/Topics:
Databases and Datawarehouses; Distributed and Parallel Applications; Internet Technology; Ontology and the Semantic Web; System Integration; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
Processing a federated query in Linked Data is challenging because it must take into account the number of sources,
the source locations, and the heterogeneity of the participating systems in terms of hardware, software, data structure and data distribution.
In this work, we investigate the relationship between the data distribution and the communication cost
in a federated SPARQL query framework. We introduce the spreading factor as a dataset metric for computing
the distribution of classes and properties throughout a set of data sources. To observe the relationship between
the spreading factor and the communication cost, we generate 9 datasets by using several data fragmentation
and allocation strategies. Our experimental results show that the spreading factor is correlated with the communication
cost between a federated engine and the SPARQL endpoints. In terms of partitioning strategies,
partitioning triples based on properties and classes can minimize the communication cost. However, such
partitioning can also reduce the performance of the SPARQL endpoints within the federation framework.
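The intuition linking data distribution, partitioning strategy and communication cost can be illustrated with a small sketch. It does not reproduce the paper's spreading factor, whose exact definition is not given in this abstract. The `spread` function below is a simplified, hypothetical stand-in that averages, over every property and class, the fraction of sources holding at least one matching triple. The helpers `partition_by_feature`, `partition_random` and `endpoints_for`, the toy triples, and the assumption that the federated engine knows exactly which sources hold each property (perfect source selection) are likewise illustrative.

```python
import random
from collections import defaultdict

RDF_TYPE = "rdf:type"


def feature(triple):
    """Class (for rdf:type triples) or property that a triple contributes to."""
    _, p, o = triple
    return ("class", o) if p == RDF_TYPE else ("property", p)


def partition_by_feature(triples, n_sources):
    """Hypothetical property/class-based partitioning: every triple sharing a
    property (or class) goes to the same source, assigned round-robin."""
    features = sorted({feature(t) for t in triples})
    home = {f: i % n_sources for i, f in enumerate(features)}
    sources = [[] for _ in range(n_sources)]
    for t in triples:
        sources[home[feature(t)]].append(t)
    return sources


def partition_random(triples, n_sources, seed=0):
    """Baseline: each triple goes to a uniformly random source."""
    rng = random.Random(seed)
    sources = [[] for _ in range(n_sources)]
    for t in triples:
        sources[rng.randrange(n_sources)].append(t)
    return sources


def endpoints_for(sources, feat):
    """Sources an engine with perfect source selection (an assumption) must
    contact for a triple pattern on the given property or class."""
    return {i for i, src in enumerate(sources) if any(feature(t) == feat for t in src)}


def spread(sources):
    """Simplified stand-in, NOT the paper's spreading factor: mean fraction of
    sources holding each property/class (1/N = concentrated, 1.0 = everywhere)."""
    locations = defaultdict(set)
    for i, src in enumerate(sources):
        for t in src:
            locations[feature(t)].add(i)
    return sum(len(locs) for locs in locations.values()) / (len(locations) * len(sources))


# Toy data (hypothetical): 100 resources, three properties each, plus an rdf:type triple.
triples = [
    (f"ex:s{i}", p, f"ex:o{i}")
    for i in range(100)
    for p in ("foaf:name", "foaf:knows", "ex:age")
] + [(f"ex:s{i}", RDF_TYPE, "foaf:Person") for i in range(100)]

by_feature = partition_by_feature(triples, 3)
by_random = partition_random(triples, 3)
knows = ("property", "foaf:knows")
print(spread(by_feature), len(endpoints_for(by_feature, knows)))  # 0.333... and 1 endpoint
print(spread(by_random), len(endpoints_for(by_random, knows)))
```

With property/class-based partitioning each property lives in a single source, so a pattern on `foaf:knows` needs only one endpoint, whereas random partitioning scatters it across (almost surely) all three sources. This mirrors in miniature the reported link between lower spread and lower communication cost. The toy does not model the uneven endpoint load that the abstract identifies as the downside of such partitioning.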