
Our parallel computation is significantly faster than the sequential version across all dataset instances. Our analysis shows that the average absolute acceleration achieved with MapReduce is 50.8 times (Table 2), while the absolute acceleration obtained with Spark reaches 94.3 times (Table 3). This substantial speedup reflects the effectiveness of our parallel computation approach.
Table 2: Acceleration of Parallel SubTree Kernel computation on different datasets using MapReduce.

        D1     D2     D3     D4     Average
A_abs   57.2   62.9   42.6   40.7   50.8
Table 3: Acceleration of Parallel SubTree Kernel computation on different datasets using Spark.

        D1     D2     D3     D4     Average
A_abs   61.4   95.0   86.8   134.3  94.3
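As a quick sanity check, the reported averages follow directly from the per-dataset values in Tables 2 and 3. The snippet below is a standalone illustration, not part of our implementation:

```python
# Recompute the average absolute accelerations from Tables 2 and 3
# (illustrative check only; values are copied from the tables).
mapreduce = [57.2, 62.9, 42.6, 40.7]  # A_abs per dataset D1..D4 (Table 2)
spark = [61.4, 95.0, 86.8, 134.3]     # A_abs per dataset D1..D4 (Table 3)

avg_mr = sum(mapreduce) / len(mapreduce)  # ~50.85, reported as 50.8
avg_sp = sum(spark) / len(spark)          # ~94.375, reported as 94.3
print(avg_mr, avg_sp)
```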
5 CONCLUSION
The prefix tree automaton constitutes a common basis for the computation of different tree kernels: the SubTree, RootedTree, and SubSequenceTree kernels (Ouali-Sebti, 2015). In this paper, we have presented a parallel algorithm that efficiently computes this common structure (the RWTA automaton), and we have used it to compute the SubTree kernel with the MapReduce and Spark frameworks.
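The MapReduce formulation can be pictured as mappers emitting one weight contribution per subtree shared by a pair of trees, and reducers summing those contributions per pair. The following minimal sketch simulates that dataflow in plain, single-process Python with hypothetical helper names; it illustrates the map/reduce structure, not our cluster implementation:

```python
from collections import Counter

def map_phase(tree_pairs):
    # "Mapper": for each tree pair, emit (pair_id, partial_weight),
    # one contribution per subtree common to both trees.
    for pair_id, (subtrees_a, subtrees_b) in tree_pairs:
        for s in set(subtrees_a) & set(subtrees_b):
            yield pair_id, subtrees_a[s] * subtrees_b[s]

def reduce_phase(emitted):
    # "Reducer": sum the partial weights per pair; the SubTree kernel
    # value is exactly this sum of weights.
    totals = {}
    for pair_id, w in emitted:
        totals[pair_id] = totals.get(pair_id, 0) + w
    return totals

# Toy input: subtree multisets encoded as Counters keyed by a
# string serialization of each subtree.
t1 = Counter({"a": 2, "b(a,a)": 1, "b": 1})
t2 = Counter({"a": 3, "b": 1, "c": 1})
pairs = [("t1-t2", (t1, t2))]
print(reduce_phase(map_phase(pairs)))  # shared: "a" (2*3) and "b" (1*1) -> 7
```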
Our parallel implementation of the SubTree kernel computation has been tested on synthetic datasets with different parameters. The results show that our parallel computation is far faster than the sequential version on all dataset instances. Although this work has demonstrated the efficiency of the parallel implementation compared to the sequential algorithms, three main directions for future work are envisaged. Firstly, we plan to devise algorithms that generalise the computation to other kernels such as the RootedTree and SubSequenceTree kernels, among others. Some of them will deploy tree automata intersection in addition to the associated weight computation. In fact, while the SubTree kernel is a simple summation of weights, the SubSequenceTree kernel needs more investigation into the weight computations over the resulting RWTA intersection.
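To illustrate the "simple summation of weights" remark, the SubTree kernel can be sketched as counting each tree's complete subtrees and summing the products of matching counts. This is a standalone illustration over explicit subtree enumeration, not the RWTA-based algorithm:

```python
from collections import Counter

def subtrees(tree, bag):
    # Trees are tuples (label, child, child, ...). Serialize each complete
    # subtree to a string key and count its occurrences in `bag`.
    label, *children = tree
    if children:
        key = label + "(" + ",".join(subtrees(c, bag) for c in children) + ")"
    else:
        key = label
    bag[key] += 1
    return key

def subtree_kernel(t1, t2):
    b1, b2 = Counter(), Counter()
    subtrees(t1, b1)
    subtrees(t2, b2)
    # Simple summation: product of per-tree counts over shared subtrees.
    return sum(b1[s] * b2[s] for s in b1.keys() & b2.keys())

t1 = ("f", ("a",), ("g", ("a",), ("b",)))
t2 = ("g", ("a",), ("b",))
print(subtree_kernel(t1, t2))  # shared: "a" (2*1), "b" (1*1), "g(a,b)" (1*1) -> 4
```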
Secondly, larger datasets have to be generated and tested to confirm the output-sensitive behaviour of our solutions. Finally, one can investigate different cluster architectures in order to give more insights and recommendations on cluster parameter tuning.
REFERENCES
Alian, M. and Awajan, A. (2023). Syntactic-semantic simi-
larity based on dependency tree kernel. Arabian Jour-
nal for Science and Engineering, pages 1–12.
Chali, Y. and Hasan, S. A. (2015). Towards topic-
to-question generation. Computational Linguistics,
41(1):1–20.
Collins, M. and Duffy, N. (2001). Convolution kernels for
natural language. In Dietterich, T., Becker, S., and
Ghahramani, Z., editors, Advances in Neural Informa-
tion Processing Systems, volume 14. MIT Press.
Collins, M. and Duffy, N. (2002). New ranking algo-
rithms for parsing and tagging: Kernels over discrete
structures, and the voted perceptron. In Annual Meet-
ing of the Association for Computational Linguistics.
Dayalan, M. (2004). MapReduce: simplified data processing
on large clusters. In CACM.
Ésik, Z. and Kuich, W. (2002). Formal tree series. BRICS
Report Series, (21).
Fu, D., Xu, Y., Yu, H., and Yang, B. (2017). WASTK: A
weighted abstract syntax tree kernel method for source
code plagiarism detection. Scientific Programming,
2017.
Gordon, M. and Ross-Murphy, S. B. (1975). The structure
and properties of molecular trees and networks. Pure
and Applied Chemistry, 43(1-2):1–26.
Haussler, D. (1999). Convolution kernels on discrete
structures. Technical report, University of California
at Santa Cruz.
Mignot, L., Ouardi, F., and Ziadi, D. (2023). New linear-
time algorithm for subtree kernel computation based
on root-weighted tree automata.
Maneth, S., Mihaylov, N., and Sakr, S. (2008). XML tree
structure compression. In 2008 19th International
Workshop on Database and Expert Systems Applica-
tions, pages 243–247.
Mignot, L., Sebti, N. O., and Ziadi, D. (2015). Root-
weighted tree automata and their applications to tree
kernels. CoRR, abs/1501.03895.
Nasar, Z., Jaffry, S. W., and Malik, M. K. (2021). Named
entity recognition and relation extraction: State-of-
the-art. ACM Comput. Surv., 54(1).
Ouali-Sebti, N. (2015). Noyaux rationnels et automates
d’arbres.
Shatnawi, M. and Belkhouche, B. (2012). Parse trees of
Arabic sentences using the Natural Language Toolkit.
Thom, J. D. (2018). Combining tree kernels and text em-
beddings for plagiarism detection. PhD thesis, Stel-
lenbosch: Stellenbosch University.
Vishwanathan, S. V. N. and Smola, A. (2002). Fast kernels
for string and tree matching. In NIPS.
Warikoo, N., Chang, Y.-C., and Hsu, W.-L. (2018). LPTK:
a linguistic pattern-aware dependency tree kernel ap-
proach for the BioCreative VI ChemProt task. Database,
2018.
ICPRAM 2024 - 13th International Conference on Pattern Recognition Applications and Methods