SPARQL SELECT queries. We use syntactic parse
trees to measure the structural similarity of SPARQL
queries and create Query Types (Q-Types), which we use
to predict the structure of the next query. Independently,
we measure the similarity between the queries’ triple
patterns and use these similarities to construct
augmented triple patterns. We then combine the two
predictions to construct an augmented query that can be used
to retrieve data relevant to subsequent queries in the
query session.
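As a rough illustration of the triple-pattern step, the following Python sketch compares two triple patterns position by position and generalizes the positions where they differ into variables. The similarity measure (fraction of matching positions), the variable names ?aug_s, ?aug_p and ?aug_o, and the example patterns are assumptions made for this sketch; they are not necessarily the measures used in our approach.

```python
from typing import Tuple

# A triple pattern as (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]

# Hypothetical names for variables introduced by augmentation (assumption).
FRESH_VARS = ("?aug_s", "?aug_p", "?aug_o")


def is_variable(term: str) -> bool:
    """Return True if the term is a SPARQL variable."""
    return term.startswith("?")


def pattern_similarity(a: TriplePattern, b: TriplePattern) -> float:
    """Fraction of positions that agree (assumed measure, for illustration).

    Two positions agree if the terms are identical or both are variables.
    """
    matches = sum(
        1 for x, y in zip(a, b)
        if x == y or (is_variable(x) and is_variable(y))
    )
    return matches / 3


def augment(a: TriplePattern, b: TriplePattern) -> TriplePattern:
    """Keep identical terms; replace every differing position by a variable."""
    return tuple(
        x if x == y else FRESH_VARS[i]
        for i, (x, y) in enumerate(zip(a, b))
    )


if __name__ == "__main__":
    q1 = ("dbr:Madrid", "dbo:populationTotal", "?pop")
    q2 = ("dbr:Barcelona", "dbo:populationTotal", "?pop")
    print(pattern_similarity(q1, q2))
    print(augment(q1, q2))  # ('?aug_s', 'dbo:populationTotal', '?pop')
```

The augmented pattern in this example matches the population of any resource, so data retrieved with it could also serve a later query about a third city. The sketch does not guard against a fresh variable name colliding with a variable already present in the pattern.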
We evaluated our approach on the SPARQL endpoint
query logs of the Spanish and English DBpedia.
The results show that predicting both Q-Types
and augmented triple patterns does not require a large
number of queries: between 10 and 15 are enough to achieve
high precision. This indicates that our approach can
be used in both long and short query sessions.
In general, the classification precision is higher for
the esDBpedia dataset because the enDBpedia
logs are more diverse and contain more unique
queries. In a minority of cases, namely for queries
containing more than 6 triple patterns, the classifier
accuracy drops for enDBpedia because this subset
of queries is too small. Nevertheless, our
approach still achieves a cache hit rate of around
85% for the enDBpedia dataset, which is considerably
higher than that of previous augmentation approaches.
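To make the notion of a cache hit rate concrete, the sketch below counts a query as a hit when every one of its triple patterns is covered by some pattern already in the cache. This coverage test, and the function names covers, is_hit and hit_rate, are simplifying assumptions for illustration only; they do not reproduce the evaluation procedure behind the figures above.

```python
from typing import Sequence, Tuple

TriplePattern = Tuple[str, str, str]


def covers(cached: TriplePattern, requested: TriplePattern) -> bool:
    """A cached pattern covers a requested one if each cached term is a
    variable or is identical to the requested term (simplified: repeated
    variables inside one pattern are not treated specially)."""
    return all(c.startswith("?") or c == r for c, r in zip(cached, requested))


def is_hit(cache: Sequence[TriplePattern],
           query: Sequence[TriplePattern]) -> bool:
    """A query is a hit if all of its triple patterns are covered."""
    return all(any(covers(c, q) for c in cache) for q in query)


def hit_rate(caches: Sequence[Sequence[TriplePattern]],
             queries: Sequence[Sequence[TriplePattern]]) -> float:
    """Share of queries answered from the cache state that preceded them."""
    if not queries:
        return 0.0
    hits = sum(is_hit(c, q) for c, q in zip(caches, queries))
    return hits / len(queries)


if __name__ == "__main__":
    # Cache contents before each query (hypothetical example data).
    caches = [
        [("?aug_s", "dbo:populationTotal", "?pop")],
        [("dbr:Madrid", "dbo:country", "?c")],
    ]
    queries = [
        [("dbr:Sevilla", "dbo:populationTotal", "?pop")],  # covered
        [("dbr:Madrid", "dbo:areaTotal", "?a")],           # not covered
    ]
    print(hit_rate(caches, queries))  # 0.5
```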
In the future, we intend to implement a full caching
and prefetching system using our proposed
query augmentation approach. We also plan to extend
our prediction method to take into account other features
of SELECT queries, such as FILTER clauses, as
well as other, less common forms of SPARQL queries.
Finally, we want to distinguish human query sessions
from sessions generated by machine agents, test the
effectiveness of our approach on both types, and
optimize it accordingly.
ACKNOWLEDGEMENTS
This work has been supported by the European
Union’s Horizon 2020 research and innovation program
(grant H2020-MSCA-ITN-2014-642963), the
Spanish Ministry of Science and Innovation (contract
TIN2015-65316, project RTC-2016-4952-7 and
contract TIN2016-78011-C4-4-R), the Spanish Ministry
of Education, Culture and Sports (contract
CAS18/00333) and the Generalitat de Catalunya
(contract 2014-SGR-1051). The authors would also
like to thank Toni Cortes for his feedback.
WEBIST 2018 - 14th International Conference on Web Information Systems and Technologies