IMPLEMENTATION AND OPTIMIZATION OF RDF QUERY USING HADOOP

YanWen Chen, Fabrice Huet, YiXiang Chen

2011

Abstract

With the prevalence of the Semantic Web, a great deal of RDF data has been created, reaching tens of petabytes, which has drawn attention to processing such data with high performance. In recent years, Hadoop, built on the MapReduce framework, has provided a good way to process massive data in parallel. In this paper, we focus on using Hadoop to query RDF data from large data repositories. First, we propose a prototype to process a SPARQL query. Then, we present several ways to optimize our solution. Results show that the optimizations improve performance by almost 70%.
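The abstract does not describe the prototype's internals. As a hedged illustration only, the following plain-Python sketch simulates one common way to answer a SPARQL basic graph pattern with MapReduce: a reduce-side join on a shared variable. This is not the authors' implementation and does not use the Hadoop API. The map step emits each matching triple keyed by the value of the join variable `?x`. The simulated shuffle groups these by key, and the reduce step combines bindings for keys matched by every pattern. The triples, the LUBM-style names, and every function name (`match`, `map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sketch: NOT the authors' prototype and NOT the Hadoop API.
# Toy (subject, predicate, object) triples using LUBM-style names.
TRIPLES = [
    ("Student1", "rdf:type", "ub:GraduateStudent"),
    ("Student1", "ub:takesCourse", "Course1"),
    ("Student2", "rdf:type", "ub:GraduateStudent"),
    ("Student3", "ub:takesCourse", "Course2"),
]

# SELECT ?x ?y WHERE { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse ?y }
PATTERNS = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:takesCourse", "?y"),
]


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable repeated inside one pattern must bind to the same value.
            if p in bindings and bindings[p] != t:
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def map_phase(triples, patterns, join_var):
    """Emit (join-variable value, (pattern index, bindings)) per matching triple.

    Assumes every pattern contains join_var.
    """
    for triple in triples:
        for i, pattern in enumerate(patterns):
            b = match(pattern, triple)
            if b is not None:
                yield b[join_var], (i, b)


def shuffle(pairs):
    """Group mapper output by key, mimicking the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, n_patterns):
    """For each key, join bindings only if every pattern matched at least once."""
    results = []
    for key in sorted(groups):
        per_pattern = [[b for i, b in groups[key] if i == k] for k in range(n_patterns)]
        if not all(per_pattern):
            continue  # some pattern has no match for this key: no join result
        for combo in product(*per_pattern):
            merged, compatible = {}, True
            for b in combo:
                for var, val in b.items():
                    if merged.setdefault(var, val) != val:
                        compatible = False  # conflicting value for a shared variable
            if compatible:
                results.append(merged)
    return results


if __name__ == "__main__":
    pairs = map_phase(TRIPLES, PATTERNS, "?x")
    print(reduce_phase(shuffle(pairs), len(PATTERNS)))
    # prints [{'?x': 'Student1', '?y': 'Course1'}]
```

In this sketch, keying on the join variable lets one reducer see all candidate bindings for a single value of `?x`. A query whose patterns join on several different variables would typically need more than one such job chained together.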



Paper Citation


in Harvard Style

Chen Y., Huet F. and Chen Y. (2011). IMPLEMENTATION AND OPTIMIZATION OF RDF QUERY USING HADOOP. In Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8425-52-2, pages 512-515. DOI: 10.5220/0003387805120515


in Bibtex Style

@conference{closer11,
author={YanWen Chen and Fabrice Huet and YiXiang Chen},
title={IMPLEMENTATION AND OPTIMIZATION OF RDF QUERY USING HADOOP},
booktitle={Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2011},
pages={512-515},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003387805120515},
isbn={978-989-8425-52-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - IMPLEMENTATION AND OPTIMIZATION OF RDF QUERY USING HADOOP
SN - 978-989-8425-52-2
AU - Chen Y.
AU - Huet F.
AU - Chen Y.
PY - 2011
SP - 512
EP - 515
DO - 10.5220/0003387805120515
ER -