be archived at the local store. In the provided
implementation with Sesame, the B-tree must be
constructed at the local store. As described
earlier, ordered triple data makes this construction
task fast.
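To illustrate why ordered input speeds up the construction, the following is a minimal sketch of bottom-up bulk loading. It is not Sesame's B-tree code: the class SortedBulkLoad, its Node type, and the fanout parameter are hypothetical names chosen for illustration, and triples are assumed to be already encoded as sorted long keys. Because the keys arrive in order, leaves can be filled sequentially and no node ever has to be split.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: bottom-up bulk loading of a B-tree-like index from sorted keys. */
class SortedBulkLoad {

    /** A node holds sorted keys; an internal node also holds one child per key. */
    static final class Node {
        final List<Long> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // stays empty for leaves

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Builds the tree level by level; sortedKeys must be in ascending order. */
    static Node build(List<Long> sortedKeys, int fanout) {
        if (fanout < 2) {
            throw new IllegalArgumentException("fanout must be at least 2");
        }
        // Pack the sorted keys into leaves from left to right; no node is ever split.
        List<Node> level = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += fanout) {
            Node leaf = new Node();
            leaf.keys.addAll(sortedKeys.subList(i, Math.min(i + fanout, sortedKeys.size())));
            level.add(leaf);
        }
        if (level.isEmpty()) {
            return new Node(); // empty index: a single empty leaf
        }
        // Group nodes under parents until a single root remains.
        while (level.size() > 1) {
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += fanout) {
                Node parent = new Node();
                for (Node child : level.subList(i, Math.min(i + fanout, level.size()))) {
                    parent.keys.add(child.keys.get(0)); // smallest key in the child's subtree
                    parent.children.add(child);
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Looks up a key by descending from the root to a leaf. */
    static boolean contains(Node node, long key) {
        while (!node.isLeaf()) {
            // Choose the last child whose smallest key is <= key (or the first child).
            int idx = 0;
            while (idx + 1 < node.keys.size() && node.keys.get(idx + 1) <= key) {
                idx++;
            }
            node = node.children.get(idx);
        }
        return node.keys.contains(key);
    }

    /** Number of levels from the root down to the leaves. */
    static int height(Node node) {
        int h = 1;
        while (!node.isLeaf()) {
            node = node.children.get(0);
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long k = 1; k <= 10; k++) {
            keys.add(k);
        }
        Node root = build(keys, 3);
        System.out.println("height=" + height(root) + ", contains 5: " + contains(root, 5L));
    }
}
```

With unsorted input, each insertion would instead have to search for its position and could trigger node splits, which is the cost that pre-sorting avoids.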
To obtain experimental results for the local RDF
store, we modified the Native version of Sesame.
The revised Sesame reads the value-storing files
handed over by a master node of the M/R cluster
and then executes the pre-processing on behalf of
the store.
The environment is as follows:
Ubuntu Linux 2.24.1; CPU: Intel quad-core 2.8 GHz;
Memory: 3.5 GB (Samsung); HDD: 500 GB S-ATA II
(7200 rpm) with a single partition; Java SDK 1.6;
JVM heap size: minimum 1024 MB, maximum 2048
MB.
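Heap limits like those listed above are normally set with the standard JVM options -Xms1024m and -Xmx2048m. The exact command line used in the experiments is not given, so this is an assumption. A minimal sketch for confirming the effective maximum from inside the JVM follows; the class name HeapCheck is hypothetical.

```java
/** Illustrative only: reports the maximum heap size the running JVM may use. */
class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx2048m the value is expected to be close to, but not
        // necessarily exactly, 2048, depending on the JVM and collector.
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```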
Figure 3: Benchmark Times in Local Store.
Figure 3 shows very fast loading times for the
LUBM data sets from 50 to 1000 universities.
Loading the 138 million triples of the
1000-university set takes only about 16 minutes,
with higher TPS rates. For extremely large data
sets, which contain unrealistically large numbers
of triples, the loading time also remains
reasonable. A comparable processing time for data
of those sizes could be achieved on a cloud
computing service, in line with the computing
times we have shown.
Figure 4: Very Large-Scale Benchmarks in Local Store.
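As a rough check on the throughput implied by the 1000-university result, the reported figures (about 138 million triples in about 16 minutes) can be converted to triples per second. The sketch below uses those rounded values from the text, so the result is only an approximation; the class and method names are hypothetical.

```java
/** Illustrative only: converts a load size and duration into triples per second. */
class LoadThroughput {

    /** Returns the average number of triples loaded per second. */
    static double triplesPerSecond(long triples, double minutes) {
        if (minutes <= 0) {
            throw new IllegalArgumentException("minutes must be positive");
        }
        return triples / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // Rounded figures from the text: ~138 million triples in ~16 minutes.
        double tps = triplesPerSecond(138_000_000L, 16.0);
        System.out.printf("%.0f triples/s%n", tps);
    }
}
```

With these rounded inputs the estimate comes to roughly 144,000 triples per second.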
5 CONCLUSIONS
We presented practical approaches to dynamically
reduce the loading time of large-scale RDF data
with the aid of a cloud computing service.
Experimental results giving insight into the overall
time spent are also provided for the conversion
from a single-machine job to multi-machine work.
For such conversions, M/R programming and simple
parallel processing are employed. The
implementation for the native version of the Sesame
RDF repository delivers a very fast local store
loading time of 16.2 minutes, plus additional
preparation time on a cloud service, which can be
reduced by adding more machines.
ACKNOWLEDGEMENTS
This work was supported in part by MKE & KEIT
through the Development of Independent
Component based Service-Oriented Peta-Scale
Computing Platform Project.
REFERENCES
Bizer, C., Cyganiak, R., Heath, T., 2008. How to Publish
Linked Data on the Web, Available at:
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/20070727/.
Broekstra, J., Kampman, A., van Harmelen, F., 2002.
Sesame: A Generic Architecture for Storing and
Querying RDF and RDF Schema, International
Semantic Web Conference (ISWC 2002).
Guo, Y., Pan, Z., Heflin, J., 2005. LUBM: A Benchmark
for OWL Knowledge Base Systems, Journal of Web
Semantics 3.
Schmidt, M., Hornung, T., Küchlin, N., Lausen, G.,
Pinkel, C., 2008. An Experimental Comparison of
RDF Data Management Approaches in a SPARQL
Benchmark Scenario, International Semantic Web
Conference (ISWC 2008).
Liu, B., Hu, B., 2005. An evaluation of RDF storage
systems for large data applications, In Proceedings of
the First International Conference on Semantics,
Knowledge and Grid.
Erling, O., Mikhailov, I., Towards Web-Scale RDF,
Available at:
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSArticleWebScaleRDF.
Dean, J., Ghemawat, S., 2008. MapReduce: simplified
data processing on large clusters, Communications of
the ACM, v.51.
KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development