Table 6: The number of cases (#cases) in which $\tau_{\mathrm{TAI}} < \min\{\tau_{\mathrm{TOP}}, \tau_{\mathrm{BOT}}\}$, their ratio (%) among all the pairs (#pairs), and the maximum difference (max.).
data           #pairs       #cases      %       max.
N-glycans      131,841      59,921      45.45   9
dblp 0.1%      13,279,281   0           0.00    0
SwissProt      23,143,806   5,933,179   25.64   2
TPC-H◦         28           0           0       0
Auction−       33,411       0           0       0
Nasa−◦         528          94          17.80   1
Protein−◦      13,258,675   637,773     4.81    2
University−◦   325          0           0       0
for caterpillars with the algorithms designed by Yamamoto et al. (2014) for standard trees. Table 7 shows the running time of computing $\tau_{\mathrm{TOP}}$ and $\tau_{\mathrm{LCA}}$ with those algorithms, which we denote by $\tau^{T}_{\mathrm{TOP}}$ and $\tau^{T}_{\mathrm{LCA}}$. Here, “–” denotes a timeout after more than 10,000 seconds.
Table 7: The running time (sec.) of computing $\tau_{\mathrm{TOP}}$ and $\tau_{\mathrm{LCA}}$ by using the algorithms in this paper and the algorithms $\tau^{T}_{\mathrm{TOP}}$ and $\tau^{T}_{\mathrm{LCA}}$ in (Yamamoto et al., 2014).
data           $\tau_{\mathrm{TOP}}$   $\tau_{\mathrm{LCA}}$   $\tau^{T}_{\mathrm{TOP}}$   $\tau^{T}_{\mathrm{LCA}}$
N-glycans      1.23                    2,804.82                11.77                       25.64
dblp 0.1%      343.70                  1,505.05                –                           –
SwissProt      1,594.42                9,819.62                –                           –
TPC-H◦         0.64×10^−3              1.77×10^−3              3.77×10^−3                  7.45×10^−3
Auction−       0.23                    0.87                    1.20                        2.12
Nasa−◦         0.34×10^−2              4.91×10^−2              5.64×10^−2                  10.68×10^−2
Protein−◦      118.20                  433.22                  628.79                      1,156.32
University−◦   0.40×10^−3              2.84×10^−3              2.93×10^−3                  2.19×10^−3
Table 7 shows that the algorithm for computing $\tau_{\mathrm{TOP}}$ in this paper is much faster than $\tau^{T}_{\mathrm{TOP}}$ on every data set. Also, except for N-glycans and University−◦, the algorithm for computing $\tau_{\mathrm{LCA}}$ in this paper is faster than $\tau^{T}_{\mathrm{LCA}}$.
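The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original experiments: it copies the running times from Table 7 and reports the speedup of the algorithms in this paper over those of Yamamoto et al. (2014). The names `TABLE7`, `speedup`, and `slower_lca_datasets` are hypothetical, the superscript and subscript marks on the data names are dropped for readability, and `None` stands for the timeout entry “–”.

```python
# Running times (sec.) copied from Table 7, in the column order
# (tau_TOP, tau_LCA, tau^T_TOP, tau^T_LCA).
# None marks a timeout (more than 10,000 seconds).
TABLE7 = {
    "N-glycans":  (1.23, 2804.82, 11.77, 25.64),
    "dblp":       (343.70, 1505.05, None, None),
    "SwissProt":  (1594.42, 9819.62, None, None),
    "TPC-H":      (0.64e-3, 1.77e-3, 3.77e-3, 7.45e-3),
    "Auction":    (0.23, 0.87, 1.20, 2.12),
    "Nasa":       (0.34e-2, 4.91e-2, 5.64e-2, 10.68e-2),
    "Protein":    (118.20, 433.22, 628.79, 1156.32),
    "University": (0.40e-3, 2.84e-3, 2.93e-3, 2.19e-3),
}


def speedup(ours, baseline):
    """Return baseline / ours, or None when the baseline timed out."""
    if baseline is None:
        return None
    return baseline / ours


def slower_lca_datasets(table):
    """Data sets on which tau_LCA in this paper is slower than tau^T_LCA."""
    return [
        name
        for name, (_, lca, _, t_lca) in table.items()
        if t_lca is not None and lca > t_lca
    ]


if __name__ == "__main__":
    for name, (top, lca, t_top, t_lca) in TABLE7.items():
        print(name, speedup(top, t_top), speedup(lca, t_lca))
    print("tau_LCA slower on:", slower_lca_datasets(TABLE7))
```

On the rows where the baseline did not time out, the speedup for $\tau_{\mathrm{TOP}}$ is greater than 1 throughout, while the speedup for $\tau_{\mathrm{LCA}}$ drops below 1 only for N-glycans and University−◦.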
5 CONCLUSION
In this paper, we have designed algorithms for computing $\tau_{\mathrm{TOP}}$ and $\tau_{\mathrm{BOT}}$ for caterpillars in $O(n)$ time and $\tau_{\mathrm{LCA}}$ in $O(n^2)$ time. We have also given experimental results of computing $\tau_{\mathrm{TOP}}$, $\tau_{\mathrm{LCA}}$ and $\tau_{\mathrm{BOT}}$ for caterpillars in real data. The results show that $\min\{\tau_{\mathrm{TOP}}, \tau_{\mathrm{BOT}}\}$ provides a fast approximation of $\tau_{\mathrm{TAI}}$ for caterpillars. Moreover, the algorithms in this paper are faster in most cases than the previous algorithms for trees (Yamamoto et al., 2014).
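To make the approximation concrete, the statistics reported in Table 6 can be computed as in the following minimal sketch. It assumes that the three distances have already been computed for every pair of caterpillars and that "max." means the largest value of $\min\{\tau_{\mathrm{TOP}}, \tau_{\mathrm{BOT}}\} - \tau_{\mathrm{TAI}}$ over all pairs. The function name `summarize` and the toy distance values are hypothetical and are not taken from the experiments.

```python
def summarize(distances):
    """Summarize how well min{tau_TOP, tau_BOT} approximates tau_TAI.

    distances: iterable of (tau_tai, tau_top, tau_bot), one triple per pair.
    Returns (#pairs, #cases, ratio in %, maximum difference), where a case
    is a pair with tau_tai < min(tau_top, tau_bot).
    """
    n_pairs = 0
    n_cases = 0
    max_diff = 0
    for tau_tai, tau_top, tau_bot in distances:
        approx = min(tau_top, tau_bot)
        n_pairs += 1
        if tau_tai < approx:
            n_cases += 1
            max_diff = max(max_diff, approx - tau_tai)
    ratio = round(100.0 * n_cases / n_pairs, 2) if n_pairs else 0.0
    return n_pairs, n_cases, ratio, max_diff


if __name__ == "__main__":
    # Toy triples (tau_TAI, tau_TOP, tau_BOT); illustrative values only.
    toy = [(3, 3, 4), (2, 4, 3), (5, 5, 5), (1, 3, 2)]
    print(summarize(toy))
```

The same ratio formula reproduces the percentages in Table 6; for example, 59,921 cases out of 131,841 pairs for N-glycans gives 45.45%.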
Since the algorithm for computing $\tau_{\mathrm{LCA}}$ for caterpillars is slow on N-glycans, it is future work to improve the implementation, in particular so that it can be applied to a larger number of caterpillars such as all-glycans in KEGG and CSLOGS$^5$. It is also future work to investigate the other variations of the edit distance for caterpillars presented in (Yoshino and Hirata, 2017).
REFERENCES
Akutsu, T., Fukagawa, D., Halldórsson, M. M., Takasu, A., and Tanaka, K. (2013). Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees. Theoret. Comput. Sci., 470:10–22.
Chawathe, S. S. (1999). Comparing hierarchical data in external memory. In Proc. VLDB’99, pages 90–101.
Deza, M. M. and Deza, E. (2016). Encyclopedia of distances (4th ed.). Springer.
Gallian, J. A. (2007). A dynamic survey of graph labeling. Electron. J. Combin., 14:DS6.
Hirata, K., Yamamoto, Y., and Kuboyama, T. (2011). Improved MAX SNP-hard results for finding an edit distance between unordered trees. In Proc. CPM’11 (LNCS 6661), pages 402–415.
Kuboyama, T. (2007). Matching and learning in trees. Ph.D. thesis, University of Tokyo.
Muraka, K., Yoshino, T., and Hirata, K. (2018). Computing edit distance between rooted labeled caterpillars. In Proc. FedCSIS’18, pages 245–252.
Selkow, S. M. (1977). The tree-to-tree editing problem. Inform. Process. Lett., 6:184–186.
Tai, K.-C. (1979). The tree-to-tree correction problem. J. ACM, 26:422–433.
Ukita, Y., Yoshino, T., and Hirata, K. (2021). Caterpillar alignment distance for rooted labeled caterpillars: Distance based on alignments required to be caterpillars. In Recent Advances in Computational Optimization, pages 111–134.
Valiente, G. (2001). An efficient bottom-up distance between trees. In Proc. SPIRE’01, pages 212–219.
Yamamoto, Y., Hirata, K., and Kuboyama, T. (2014). Tractable and intractable variations of unordered tree edit distance. Internat. J. Found. Comput. Sci., 25:307–329.
Yoshino, T. and Hirata, K. (2017). Tai mapping hierarchy for rooted labeled trees through common subforest. Theory of Comput. Sys., 60:769–787.
Zhang, K. and Jiang, T. (1994). Some MAX SNP-hard results concerning unordered labeled trees. Inform. Process. Lett., 49:249–254.
Zhang, K., Wang, J., and Shasha, D. (1996). On the editing distance between undirected acyclic graphs. Internat. J. Found. Comput. Sci., 7:43–58.
5 http://www.cs.rpi.edu/~zaki/www-new/pmwiki.php/Software/Software