Surprisingly, after reviewing the literature, we found
no tool that can reason over large-scale OWL datasets.
The proposed approach is implemented as an open-source distributed system for reasoning over large-scale OWL datasets using Spark. Compared to Hadoop MapReduce, Spark enables efficient distributed processing by running multiple jobs on the same node simultaneously and by caching the data required for computations in memory. Keeping data in memory greatly reduces the average time spent on network communication (i.e., communication overhead) and on disk reads and writes compared with disk-based approaches. We exploit these advantages to support Semantic Web reasoning. Furthermore, the proposed approach combines contributions introduced by state-of-the-art approaches (i.e., the optimized execution strategy and the pre-shuffling method). In addition, we proposed a novel duplicate elimination strategy that drastically reduces the reasoning time. Data shuffling and duplicate elimination are considered the most time-consuming tasks in the reasoning process.
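
To make the role of duplicate elimination concrete, the following is a minimal sketch, not the strategy proposed in this paper: a textbook semi-naive forward-chaining loop in plain Python over two RDFS entailment rules (rdfs9 and rdfs11 from the RDF Semantics specification). Each round joins only the newly derived triples with the current closure and then discards every derived triple that is already known, so the loop stops as soon as a round yields nothing new. The triple layout and the function names `apply_rules` and `materialize` are illustrative assumptions; on Spark, the set operations would roughly correspond to RDD operations such as `distinct` and `subtract`.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rules(delta: Set[Triple], closure: Set[Triple]) -> Set[Triple]:
    """Fire rdfs9 and rdfs11, requiring at least one premise from `delta`."""
    # Index the current closure for the joins (rebuilt each round for simplicity).
    supers: Dict[str, Set[str]] = {}   # class -> known superclasses
    subs: Dict[str, Set[str]] = {}     # class -> known subclasses
    members: Dict[str, Set[str]] = {}  # class -> known instances
    for s, p, o in closure:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
            subs.setdefault(o, set()).add(s)
        elif p == TYPE:
            members.setdefault(o, set()).add(s)

    derived: Set[Triple] = set()
    for s, p, o in delta:
        if p == SUBCLASS:
            # rdfs11, new edge first: s subClassOf o, o subClassOf c => s subClassOf c
            for c in supers.get(o, ()):
                derived.add((s, SUBCLASS, c))
            # rdfs11, new edge second: a subClassOf s, s subClassOf o => a subClassOf o
            for a in subs.get(s, ()):
                derived.add((a, SUBCLASS, o))
            # rdfs9, new edge: x type s, s subClassOf o => x type o
            for x in members.get(s, ()):
                derived.add((x, TYPE, o))
        elif p == TYPE:
            # rdfs9, new type: s type o, o subClassOf c => s type c
            for c in supers.get(o, ()):
                derived.add((s, TYPE, c))
    return derived


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Compute the closure under rdfs9/rdfs11 with semi-naive evaluation."""
    closure = set(triples)
    delta = set(triples)
    while delta:
        derived = apply_rules(delta, closure)
        # Duplicate elimination: keep only triples not already in the closure.
        delta = derived - closure
        closure |= delta
    return closure
```

For the input {Student subClassOf Person, Person subClassOf Agent, alice type Student}, the loop adds Student subClassOf Agent, alice type Person and alice type Agent; several of these are derived twice within a round, and the set operations absorb the repeats.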
The experiments showed that the proposed approach scales with respect to both data size and the number of nodes. Our approach successfully inferred around six million axioms in 11 hours using only five nodes, and it achieved near-linear scalability in terms of speedup. We have successfully integrated the proposed approach into the SANSA framework, which supports its sustainability and usability.
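
Speedup is usually measured against a baseline cluster size n0 as S(n) = T(n0) / T(n), and parallel efficiency as E(n) = S(n) / (n / n0); near-linear speedup means that E(n) stays close to 1 as nodes are added. The helper below, a hypothetical `speedup_table` function, is a small sketch that computes both quantities from measured wall-clock runtimes. The runtimes in its usage block are made-up placeholder values, not the measurements reported in this paper.

```python
from typing import Dict, Tuple


def speedup_table(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map node count -> (speedup, efficiency) relative to the smallest cluster measured."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    table: Dict[int, Tuple[float, float]] = {}
    for nodes, seconds in sorted(runtimes.items()):
        speedup = base_time / seconds
        # Linear (ideal) speedup would equal nodes / base_nodes.
        efficiency = speedup / (nodes / base_nodes)
        table[nodes] = (speedup, efficiency)
    return table


if __name__ == "__main__":
    # Placeholder runtimes in seconds, for illustration only.
    for n, (s, e) in speedup_table({1: 100.0, 2: 50.0, 4: 40.0}).items():
        print(f"{n} node(s): speedup {s:.2f}, efficiency {e:.2f}")
```

Using the smallest measured cluster as the baseline matters when a dataset does not fit on a single node, which is common for large-scale reasoning workloads.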
In future work, we plan several improvements, including code optimization (e.g., using different persistence strategies) and an optimal execution strategy built on statistics of the input OWL dataset, computed with the OWLStats approach (Mohamed et al., 2020) from the SANSA framework. Moreover, we aim to support additional reasoning profiles, such as OWL EL and OWL RL.
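
As a rough illustration of what a statistics-driven execution strategy could look like (a hypothetical sketch, not the OWLStats API and not the planned design), the function below counts how often each predicate occurs in the data, skips rules whose premise predicates are absent, and orders the remaining rules by the total number of triples matching their premise predicates. The `RULES` table, which lists only the fixed predicates of three RDFS rules, and the function name `plan_rules` are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical rule table: rule name -> predicates its premises match on.
RULES: Dict[str, List[str]] = {
    "rdfs5": ["rdfs:subPropertyOf"],           # subPropertyOf transitivity
    "rdfs9": ["rdf:type", "rdfs:subClassOf"],  # type inheritance via subClassOf
    "rdfs11": ["rdfs:subClassOf"],             # subClassOf transitivity
}


def plan_rules(triples: Iterable[Triple], rules: Dict[str, List[str]]) -> List[str]:
    """Return the applicable rules, cheapest (smallest premise input) first."""
    counts = Counter(p for _, p, _ in triples)
    plan = []
    for name, premises in rules.items():
        # A rule cannot fire on this data if any premise predicate is absent.
        if all(counts[p] > 0 for p in premises):
            cost = sum(counts[p] for p in premises)
            plan.append((cost, name))
    return [name for _, name in sorted(plan)]
```

Because rules such as rdfs9 keep producing new rdf:type triples, the counts would have to be refreshed after every reasoning round for this kind of pruning to remain sound.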
ACKNOWLEDGEMENTS
This work has been supported by the following EU Horizon 2020 projects: LAMBDA project (GA no. 809965), CLEOPATRA project (GA no. 812997), and PLATOON project (GA no. 872592).
REFERENCES
Al-Ajlan, A. (2015). The comparison between forward and
backward chaining. International Journal of Machine
Learning and Computing, 5(2):106.
Gu, R., Wang, S., Wang, F., Yuan, C., and Huang, Y. (2015). Cichlid: Efficient large scale RDFS/OWL reasoning with Spark. In 2015 IEEE International Parallel and Distributed Processing Symposium, pages 700–709, Hyderabad, India. IEEE.
Guo, Y., Pan, Z., and Heflin, J. (2005). LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3):158–182.
Hayes, P. (2004). RDF Semantics. https://www.w3.org/TR/rdf-mt/#RDFRules.
Heino, N. and Pan, J. Z. (2012). RDFS reasoning on massively parallel hardware. In International Semantic Web Conference, pages 133–148, Boston, USA. Springer.
Kim, J.-M. and Park, Y.-T. (2015). Scalable OWL-Horst ontology reasoning using Spark. In 2015 International Conference on Big Data and Smart Computing (BIGCOMP), pages 79–86, Jeju, South Korea. IEEE.
Liu, Y. and McBrien, P. (2017). SPOWL: Spark-based OWL 2 reasoning materialisation. In Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, pages 1–10, Chicago, USA. ACM.
Liu, Z., Feng, Z., Zhang, X., Wang, X., and Rao, G. (2016). RORS: Enhanced rule-based OWL reasoning on Spark. In Asia-Pacific Web Conference, pages 444–448, Suzhou, China. Springer.
Mohamed, H., Fathalla, S., Lehmann, J., and Jabeen, H. (2020). OWLStats: Distributed computation of OWL dataset statistics. In IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), in press. IEEE.
Sharma, T., Tiwari, N., and Kelkar, D. (2012). Study of difference between forward and backward reasoning. International Journal of Emerging Technology and Advanced Engineering, 2(10):271–273.
Ter Horst, H. J. (2005). Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Journal of Web Semantics, 3(2-3):79–115.
Urbani, J., Kotoulas, S., Maassen, J., Van Harmelen, F., and Bal, H. (2010). OWL reasoning with WebPIE: Calculating the closure of 100 billion triples. In Extended Semantic Web Conference, pages 213–227, Greece. Springer.
Urbani, J., Kotoulas, S., Maassen, J., Van Harmelen, F., and Bal, H. (2012). WebPIE: A web-scale parallel inference engine using MapReduce. Journal of Web Semantics, 10:59–75.
Urbani, J., Kotoulas, S., Oren, E., and Van Harmelen, F. (2009). Scalable distributed reasoning using MapReduce. In International Semantic Web Conference, pages 634–649, Chantilly, VA, USA. Springer.
KEOD 2021 - 13th International Conference on Knowledge Engineering and Ontology Development