Authors:
YanWen Chen (1); Fabrice Huet (2) and YiXiang Chen (1)
Affiliations:
(1) East China Normal University, China; (2) INRIA and Sophia-Antipolis, France
Keyword(s):
Cloud Computing, MapReduce, RDF, Hadoop, Distributed Computing.
Related Ontology Subjects/Areas/Topics:
Cloud Application Architectures; Cloud Applications Performance and Monitoring; Cloud Computing; Platforms and Applications
Abstract:
With the prevalence of the Semantic Web, a great deal of RDF data has been created, reaching tens of petabytes, which has drawn attention to processing such data with high performance. In recent years, Hadoop, built on the MapReduce framework, has provided a good way to process massive data in parallel. In this paper, we focus on using Hadoop to query RDF data from large data repositories. First, we propose a prototype to process a SPARQL query. Then, we present several ways to optimize our solution. The results show that the optimizations improve performance by almost 70%.
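The abstract does not describe how the prototype evaluates a SPARQL query. Purely as an illustration, one common way to evaluate a SPARQL basic graph pattern on MapReduce is a reduce-side join keyed on a variable shared by the triple patterns. The Python sketch below simulates the map, shuffle and reduce phases in memory. The function names (`matches`, `map_phase`, `shuffle`, `reduce_phase`), the encoding of variables as strings starting with `?`, and the single join variable are assumptions made for this example. It is not the authors' design and it does not use the Hadoop API.

```python
from collections import defaultdict
from itertools import product


def matches(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None.

    Patterns and triples are (subject, predicate, object) tuples; pattern
    terms starting with "?" are variables (an assumption of this sketch).
    """
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable repeated within one pattern must bind to the same term.
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def map_phase(triples, patterns, join_var):
    """Emit (join key, (pattern index, binding)) for every matching triple."""
    for triple in triples:
        for idx, pattern in enumerate(patterns):
            b = matches(pattern, triple)
            if b is not None and join_var in b:
                yield b[join_var], (idx, b)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, n_patterns):
    """Join bindings from all patterns that share the same join key."""
    results = []
    for values in groups.values():
        per_pattern = [[b for i, b in values if i == idx] for idx in range(n_patterns)]
        # product() yields nothing if any pattern has no binding for this key.
        for combo in product(*per_pattern):
            merged, compatible = {}, True
            for b in combo:
                for var, val in b.items():
                    # Reject combinations that bind another variable inconsistently.
                    if merged.setdefault(var, val) != val:
                        compatible = False
            if compatible:
                results.append(merged)
    return results
```

For example, with the hypothetical triples `("alice", "knows", "bob")`, `("bob", "worksAt", "inria")` and `("carol", "knows", "dave")`, joining the patterns `("?x", "knows", "?y")` and `("?y", "worksAt", "?org")` on `?y` yields a single solution binding `?x` to `alice`, `?y` to `bob` and `?org` to `inria`. In a real Hadoop job, the map and reduce functions would run on separate nodes and the framework would perform the shuffle.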