Authors:
Heba Mohamed (1, 2), Said Fathalla (1, 2), Jens Lehmann (3, 2) and Hajira Jabeen (4)
Affiliations:
(1) Faculty of Science, University of Alexandria, Alexandria, Egypt
(2) Smart Data Analytics (SDA), University of Bonn, Bonn, Germany
(3) NetMedia Department, Fraunhofer IAIS, Dresden Lab, Germany
(4) Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
Keyword(s):
Big Data, Distributed Computing, In-Memory Computation, Parallel Reasoning, OWL Horst Rules, OWL Axioms.
Abstract:
With the tremendous increase in the volume of semantic data on the Web, reasoning over such amounts of data has become a challenging task. At the same time, traditional centralized approaches are no longer feasible for large-scale data because of the limitations of software and hardware resources, so horizontal scalability is desirable. We develop a scalable, distributed approach for RDFS and OWL Horst reasoning over large-scale OWL datasets. The key feature of our approach is that it combines an optimized execution strategy, a pre-shuffling method, and a duplicate elimination strategy, thus achieving an efficient distributed reasoning mechanism. We implemented our approach as open-source software in Apache Spark, using Resilient Distributed Datasets (RDDs) as the parallel programming model. As a use case, our approach is used by the SANSA framework for large-scale semantic reasoning over OWL datasets. The evaluation results show the strength of the proposed approach in terms of both data and node scalability.
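To illustrate the kind of computation the abstract describes, the following sketch applies two RDFS rules: rdfs11 (subClassOf transitivity, iterated to a fixpoint) and rdfs9 (type inheritance). It is written in plain Python and models RDD-style operations with in-memory dicts rather than calling Spark. It is not the authors' implementation, and several parts are assumptions:

- Each join hash-partitions both inputs on the join key. This is the kind of co-partitioning a pre-shuffle aims to set up ahead of time.
- Set union and difference stand in for duplicate elimination, which Spark would express with operations such as `distinct` or `subtract`.
- Closing the class hierarchy before propagating instance types is only one possible execution order.
- The helper names `partition_by_key`, `join`, and `rdfs_closure` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Partitions = List[Dict[str, List[str]]]
RDF_TYPE = "rdf:type"
SUB_CLASS_OF = "rdfs:subClassOf"


def partition_by_key(pairs: Iterable[Tuple[str, str]], num_partitions: int) -> Partitions:
    """Hypothetical helper: hash-partition (key, value) pairs, like a shuffle on the join key."""
    parts: Partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions][key].append(value)
    return parts


def join(left: Partitions, right: Partitions) -> Iterator[Tuple[str, str, str]]:
    """Join two co-partitioned datasets partition by partition, yielding (left_value, key, right_value)."""
    for left_part, right_part in zip(left, right):
        for key, left_values in left_part.items():
            for right_value in right_part.get(key, ()):
                for left_value in left_values:
                    yield left_value, key, right_value


def rdfs_closure(triples: Set[Triple], num_partitions: int = 4) -> Set[Triple]:
    """Hypothetical sketch of rules rdfs11 and rdfs9; set operations remove duplicate triples."""
    sub = {(s, o) for s, p, o in triples if p == SUB_CLASS_OF}
    # rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c), repeated until nothing new appears.
    while True:
        by_object = partition_by_key(((b, a) for a, b in sub), num_partitions)
        by_subject = partition_by_key(sub, num_partitions)
        new = {(a, c) for a, _b, c in join(by_object, by_subject)} - sub
        if not new:
            break
        sub |= new
    # rdfs9: (x type A), (A subClassOf B) => (x type B); one join against the closed hierarchy.
    types = partition_by_key(((cls, x) for x, p, cls in triples if p == RDF_TYPE), num_partitions)
    hierarchy = partition_by_key(sub, num_partitions)
    inferred = {(x, RDF_TYPE, sup) for x, _cls, sup in join(types, hierarchy)}
    closed = {(a, SUB_CLASS_OF, b) for a, b in sub}
    return triples | closed | inferred
```

Example: start from `ex:Cat rdfs:subClassOf ex:Mammal`, `ex:Mammal rdfs:subClassOf ex:Animal`, and `ex:tom rdf:type ex:Cat`. The closure then adds three triples: `ex:Cat rdfs:subClassOf ex:Animal`, `ex:tom rdf:type ex:Mammal`, and `ex:tom rdf:type ex:Animal`.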