MITIGATION OF LARGE-SCALE RDF DATA LOADING WITH THE EMPLOYMENT OF A CLOUD COMPUTING SERVICE

Hyun Namgoong, Harshit Kumar, Hong-Gee Kim

Abstract

A growing need for interoperability and structured web data has led to widespread use of RDF (Resource Description Framework). To support common use of the data across applications, several RDF stores providing data management services have been developed. Here, we present a systematic approach to the high-latency problem of loading data into these stores. It enables fast loading of very large RDF datasets, and it is demonstrated with an existing RDF store. The approach employs a cloud computing service and delegates preparation work to machines that are temporarily rented at low cost. Our implementation for a native version of the Sesame RDF Repository was tested on LUBM 1000 University data (138 million triples); it showed a local store loading time of 16.2 minutes, with additional preparation on the cloud service taking approximately an hour, which can be reduced by adding machines to the cluster.
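To put the reported figures in perspective, the short Python sketch below derives the average local loading throughput from the numbers in the abstract and estimates the end-to-end time for different cluster sizes. The ideal inverse scaling of preparation time with cluster size is an assumption made here for illustration only, not a result from the paper, and the names `load_throughput`, `total_minutes`, and `machine_factor` are hypothetical.

```python
# Figures quoted in the abstract.
TRIPLES = 138_000_000       # LUBM 1000 University data
LOCAL_LOAD_MINUTES = 16.2   # local store loading time
PREP_MINUTES = 60.0         # "approximately an hour" of cloud preparation


def load_throughput(triples: int, minutes: float) -> float:
    """Return the average number of triples loaded per second."""
    return triples / (minutes * 60.0)


def total_minutes(machine_factor: float) -> float:
    """Estimate end-to-end minutes for a resized cluster.

    machine_factor is the hypothetical ratio of the new cluster size to the
    cluster size of the reported run. Preparation time is ASSUMED to shrink
    in inverse proportion to it (ideal scaling); real clusters will deviate.
    """
    if machine_factor <= 0:
        raise ValueError("machine_factor must be positive")
    return LOCAL_LOAD_MINUTES + PREP_MINUTES / machine_factor


if __name__ == "__main__":
    rate = load_throughput(TRIPLES, LOCAL_LOAD_MINUTES)
    print(f"local loading throughput: {rate:,.0f} triples/s")
    for factor in (1, 2, 4):
        print(f"cluster x{factor}: about {total_minutes(factor):.1f} min in total")
```

Under these figures the local loading step averages roughly 142,000 triples per second. Because only the preparation step runs on the cloud service, adding machines can shorten only that part, and the 16.2-minute local load remains a lower bound on the total.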



Paper Citation


in Harvard Style

Namgoong H., Kumar H. and Kim H. (2010). MITIGATION OF LARGE-SCALE RDF DATA LOADING WITH THE EMPLOYMENT OF A CLOUD COMPUTING SERVICE. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010), ISBN 978-989-8425-29-4, pages 489-492. DOI: 10.5220/0003142204890492


in Bibtex Style

@conference{keod10,
author={Hyun Namgoong and Harshit Kumar and Hong-Gee Kim},
title={MITIGATION OF LARGE-SCALE RDF DATA LOADING WITH THE EMPLOYMENT OF A CLOUD COMPUTING SERVICE},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)},
year={2010},
pages={489-492},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003142204890492},
isbn={978-989-8425-29-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2010)
TI - MITIGATION OF LARGE-SCALE RDF DATA LOADING WITH THE EMPLOYMENT OF A CLOUD COMPUTING SERVICE
SN - 978-989-8425-29-4
AU - Namgoong H.
AU - Kumar H.
AU - Kim H.
PY - 2010
SP - 489
EP - 492
DO - 10.5220/0003142204890492