Authors:
Mariano Rico (1); Rizkallah Touma (2); Anna Queralt (2) and María S. Pérez (1)
Affiliations:
(1) Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
(2) Barcelona Supercomputing Center (BSC), Barcelona, Spain
Keyword(s):
Query Augmentation, Linked Data, Semantic Web, SPARQL Endpoint, Query Type, Q-Type, Triple Pattern.
Abstract:
Linked Data repositories have become a popular source of publicly available data. Users accessing this data through SPARQL endpoints usually launch several restrictive yet similar consecutive queries, either to find the information they need through trial and error or to query related resources. Instead of executing each query separately, query augmentation modifies incoming queries to retrieve additional data that is potentially relevant to subsequent requests. In this paper, we propose a novel approach to query augmentation for SPARQL endpoints based on machine learning. Our approach separates the structure of the query from its contents and measures two types of similarity, which are then used to predict the structure and contents of the augmented query. We test the approach on real-world query logs of the Spanish and English DBpedia and show that it yields high-accuracy predictions. We also show that, by caching the results of the predicted augmented queries, we can retrieve data relevant to several subsequent queries at once, achieving a higher cache hit rate than previous approaches.
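The idea of separating a query's structure from its contents can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_query`, the `$n` placeholder scheme, and the regular expression are assumptions made for illustration, and a real system would use a proper SPARQL parser (the naive pattern below can, for example, misread unspaced `<` and `>` comparison operators as an IRI).

```python
import re

# Assumed, simplified notion of "contents": IRIs in angle brackets and
# double-quoted literals (optionally with a language tag).
CONSTANT = re.compile(r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+)?')


def split_query(query):
    """Split a SPARQL query into (structure, contents).

    structure: the query text with each constant replaced by a numbered
               placeholder ($1, $2, ...) and whitespace normalised.
    contents:  the list of constants, in order of appearance.
    """
    contents = []

    def replace(match):
        contents.append(match.group(0))
        return f"${len(contents)}"

    structure = CONSTANT.sub(replace, query)
    structure = " ".join(structure.split())
    return structure, contents


# Two consecutive queries that differ only in the resource being asked about.
q1 = ("SELECT ?p WHERE { <http://dbpedia.org/resource/Madrid> "
      "<http://dbpedia.org/ontology/populationTotal> ?p }")
q2 = ("SELECT ?p WHERE { <http://dbpedia.org/resource/Barcelona> "
      "<http://dbpedia.org/ontology/populationTotal> ?p }")

s1, c1 = split_query(q1)
s2, c2 = split_query(q2)
print(s1)        # SELECT ?p WHERE { $1 $2 ?p }
print(s1 == s2)  # True: same structure, different contents
```

Under these assumptions, the two queries share one structure and differ in a single constant, which is the kind of regularity a predictor over query logs could exploit when deciding how to augment an incoming query.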