ping obtained by repeating recursively, after selecting
vertices (as leaves in heavy caterpillars) to bridge the
Tai mapping between the heavy caterpillars, to com-
pute the edit distance (the Tai mapping) between the
heavy caterpillars of the complete subtree rooted by
the selected vertices.
Then, in this paper, we show that the heavy caterpillar
distances $\tau_{HC}$ and $\tau^{c}_{HC}$ provide upper bounds of
$\tau_{TAI}$, that is, $\tau_{TAI} \le \tau^{c}_{HC} \le \tau_{HC}$. For the maximum
height $h$ and the maximum number $\lambda$ of leaves in two given
trees, we can compute $\tau_{HC}$ in $O(h^{2}\lambda^{3})$ time under the
general cost function and in $O(h^{2}\lambda)$ time under the unit
cost function, and $\tau^{c}_{HC}(T_1, T_2)$ in $O(h^{2}\lambda^{4})$ time under
the general cost function and in $O(h^{2}\lambda^{2})$ time under the
unit cost function. Furthermore, we show that $\tau_{HC}$ and
$\tau^{c}_{HC}$ are incomparable with $\tau_{ILST}$, $\tau_{ALN}$ and $\tau_{SG}$.
Hence, the heavy caterpillar distances $\tau_{HC}$ and $\tau^{c}_{HC}$
provide further tractable variations of the edit distance
$\tau_{TAI}$ that are incomparable with the isolated-subtree
distance $\tau_{ILST}$.
2 PRELIMINARIES
A tree T is a connected graph (V,E) without cycles,
where V is the set of vertices and E is the set of edges.
We denote V and E by V(T) and E(T). The size of
T is |V| and is denoted by |T|. We sometimes write
v ∈ T for v ∈ V(T). We denote the empty tree (∅, ∅) by
∅. A rooted tree is a tree with one node r chosen as its
root. We denote the root of a rooted tree T by r(T).
Let T be a rooted tree such that r = r(T) and
u, v, w ∈ T. We denote by $UP_r(v)$ the unique path from
r to v, that is, the tree $(V', E')$ such that
$V' = \{v_1, \ldots, v_k\}$, $v_1 = r$, $v_k = v$ and
$(v_i, v_{i+1}) \in E'$ for every $i$ ($1 \le i \le k - 1$).
The parent of v (≠ r), which we denote by par(v),
is its adjacent node on $UP_r(v)$, and the ancestors of
v (≠ r) are the nodes on $UP_r(v) - \{v\}$. We denote the
set of all ancestors of v by anc(v). We say that u is a
child of v if v is the parent of u, and that u is a
descendant of v if v is an ancestor of u. We denote the
set of children of v by ch(v), and write u ≤ v if v is an
ancestor of u or u = v. We call a node with no children
a leaf and denote the set of all the leaves in T by lv(T).
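The notions above can be made concrete with a small sketch. It assumes that a rooted tree is stored as a child-to-parent map in which the root maps to `None`; this representation, the example tree and the function names (chosen to mirror $UP_r(v)$, anc, ch and lv) are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical example tree stored as a child -> parent map; the root maps to None.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}


def up_path(parent, v):
    """Nodes on the unique path from the root to v, listed root first."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path[::-1]


def anc(parent, v):
    """Ancestors of v: the nodes on the root-to-v path except v itself."""
    return set(up_path(parent, v)) - {v}


def ch(parent, v):
    """Children of v: the nodes whose parent is v."""
    return {u for u, p in parent.items() if p == v}


def lv(parent):
    """Leaves: the nodes that are not the parent of any node."""
    return set(parent) - set(parent.values())
```

For this example tree, `up_path(parent, "c")` lists the nodes r, a, c in that order, and the root has an empty set of ancestors.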
A rooted path P is a rooted tree
$(\{v_1, \ldots, v_n\}, \{(v_i, v_{i+1}) \mid 1 \le i \le n - 1\})$ such
that $r(P) = v_1$. We call the node $v_n$ (the leaf of P) an
endpoint of P and denote it by e(P).
The degree of v, denoted by d(v), is the number of
children of v, and the degree of T, denoted by d(T), is
max{d(v) | v ∈ T}. The height of v, denoted by h(v),
is $\max\{|UP_v(w)| \mid w \in lv(T[v])\}$, and the height of T,
denoted by h(T), is max{h(v) | v ∈ T}.
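As a minimal sketch of these two measures, the following code assumes a tree stored as a node-to-children map (an illustrative representation, not the paper's). It reads the size of a path as its number of nodes, following the definition of |T| above, so that a leaf has height 1. The example tree and the names `degree_of_tree` and `height_of_tree` are hypothetical.

```python
# Hypothetical example tree stored as a node -> list-of-children map.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": [], "c": ["e"], "d": [], "e": []}


def d(children, v):
    """Degree of v: the number of children of v."""
    return len(children[v])


def h(children, v):
    """Height of v: number of nodes on a longest path from v down to a leaf of T[v]."""
    return 1 + max((h(children, u) for u in children[v]), default=0)


def degree_of_tree(children):
    """d(T): the maximum degree over all nodes."""
    return max(d(children, v) for v in children)


def height_of_tree(children):
    """h(T): the maximum height over all nodes, attained at the root."""
    return max(h(children, v) for v in children)
```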
We use the ancestor orders < and ≤, that is, u < v
if v is an ancestor of u and u ≤ v if u < v or u = v.
We say that w is the least common ancestor of u and
v, denoted by u ⊔ v, if u ≤ w, v ≤ w and there exists
no node w′ ∈ T such that w′ < w, u ≤ w′ and v ≤ w′.
Let T be a rooted tree (V, E) and v a node in T.
A complete subtree of T at v, denoted by T[v], is a
rooted tree T′ = (V′, E′) such that r(T′) = v,
V′ = {u ∈ V | u ≤ v} and E′ = {(u, w) ∈ E | u, w ∈ V′}.
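The ancestor order, the least common ancestor and the complete subtree can be sketched as follows, again assuming a child-to-parent map with the root mapped to `None`. The representation, the example tree and the names `leq`, `lca` and `complete_subtree` are illustrative assumptions; the simple walk towards the root is chosen for readability, not efficiency.

```python
# Hypothetical example tree stored as a child -> parent map; the root maps to None.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"}


def leq(parent, u, v):
    """u <= v in the ancestor order: v is u itself or an ancestor of u."""
    while u is not None:
        if u == v:
            return True
        u = parent[u]
    return False


def lca(parent, u, v):
    """Least common ancestor of u and v: walk up from u until v <= w holds."""
    w = u
    while not leq(parent, v, w):
        w = parent[w]
    return w


def complete_subtree(parent, v):
    """T[v] as a pair (nodes, edges); each edge is written (parent, child)."""
    nodes = {u for u in parent if leq(parent, u, v)}
    edges = {(parent[u], u) for u in nodes if u != v}
    return nodes, edges
```

The loop in `lca` terminates because every node of the tree satisfies v ≤ r for the root r.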
We say that u is to the left of v in T if pre(u) ≤
pre(v) for the preorder number pre in T and post(u) ≤
post(v) for the postorder number post in T. We say
that a rooted tree is ordered if a left-to-right order
among siblings is given; unordered otherwise. We say
that a rooted tree is labeled if each node is assigned a
symbol from a fixed finite alphabet Σ. For a node v,
we denote the label of v by l(v), and sometimes identify
v with l(v). In this paper, we simply call a rooted
labeled unordered tree a tree.
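The left-of relation can be illustrated on an ordered tree by computing both numberings in one depth-first traversal. The sketch below assumes a node-to-children map whose child lists are given in left-to-right order; the example tree and the names `numbering` and `left_of` are hypothetical.

```python
# Hypothetical ordered tree: each child list is in left-to-right order.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}


def numbering(children, root):
    """Return the preorder and postorder numbers (starting at 1) of every node."""
    pre, post = {}, {}

    def visit(v):
        pre[v] = len(pre) + 1       # numbered when first reached
        for u in children[v]:
            visit(u)
        post[v] = len(post) + 1     # numbered after all descendants

    visit(root)
    return pre, post


def left_of(pre, post, u, v):
    """u is to the left of v if pre(u) <= pre(v) and post(u) <= post(v)."""
    return pre[u] <= pre[v] and post[u] <= post[v]


pre, post = numbering(children, "r")
```

An ancestor and its descendant are never related in this way unless they coincide, since the ancestor comes first in preorder but last in postorder.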
Furthermore, we call a set of trees a forest. In
particular, we denote the forest obtained by deleting
v in T[v] by T(v).
Definition 1 (Caterpillar (cf., (Gallian, 2007))). We
say that a tree is a caterpillar if it is transformed to a
rooted path after removing all the leaves in it. For a
caterpillar C, we call the remaining rooted path the
backbone of C and denote it by bb(C).
It is obvious that r(C) = r(bb(C)) and V(C) =
bb(C) ∪ lv(C) for a caterpillar C, that is, every node
in a caterpillar is either a leaf or an element of the
backbone.
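Definition 1 can be checked directly: after the leaves are removed, the remaining nodes form a rooted path exactly when no internal node has more than one internal child. The sketch below assumes a node-to-children map; the function name `backbone` and both example trees are hypothetical. For a single-node tree it returns an empty list, and whether such a tree should count as a caterpillar is a convention this sketch leaves open.

```python
def backbone(children, root):
    """Return bb(C) as a root-first list if the tree is a caterpillar, else None."""
    path, v = [], root
    while children[v]:                                   # v is an internal node
        path.append(v)
        internal = [u for u in children[v] if children[u]]
        if len(internal) > 1:
            return None          # the leafless remainder branches: not a path
        if not internal:
            break                # v is the endpoint of the backbone
        v = internal[0]
    return path


# Hypothetical examples: a caterpillar and a tree that is not one.
caterpillar = {"r": ["a", "x"], "a": ["b", "y", "z"], "b": ["w"],
               "x": [], "y": [], "z": [], "w": []}
not_caterpillar = {"r": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
```

In the first example every node outside the returned backbone is a leaf, in line with the observation above that a caterpillar consists of its backbone and its leaves.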
Next, we introduce a tree edit distance and a Tai
mapping.
Definition 2 (Edit operations (Tai, 1979)). The edit
operations of a tree T are defined as follows; see
Figure 1.
1. Substitution: Change the label of the node v in T.
2. Deletion: Delete a node v in T with parent v′,
making the children of v become the children of
v′. The children are inserted in the place of v as
a subset of the children of v′. In particular, if v is
the root in T, then the result of applying the deletion
is a forest consisting of the children of the root.
3. Insertion: The complement of deletion. Insert a
node v as a child of v′ in T, making v the parent of
a subset of the children of v′.
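The three operations of Definition 2 can be sketched on an unordered tree as follows. The sketch assumes a child-to-parent map for the structure and a separate node-to-label map; these representations, the function names and the example tree are illustrative assumptions. Each function returns a new map and leaves its argument unchanged.

```python
def substitute(label, v, new_label):
    """Substitution: change the label of node v."""
    new = dict(label)
    new[v] = new_label
    return new


def delete_node(parent, v):
    """Deletion: remove v; its children become children of v's parent.

    If v is the root, its children map to None, so the result is a forest.
    """
    p = parent[v]
    return {u: (p if q == v else q) for u, q in parent.items() if u != v}


def insert_node(parent, v, v_prime, adopted):
    """Insertion: add v as a child of v_prime; v adopts a subset of v_prime's children."""
    assert v not in parent and all(parent[u] == v_prime for u in adopted)
    new = {u: (v if u in adopted else q) for u, q in parent.items()}
    new[v] = v_prime
    return new


# Hypothetical example tree stored as a child -> parent map; the root maps to None.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}
```

Deleting node a and then inserting it again under r with the adopted set {c, d} restores the original tree, which illustrates that insertion is the complement of deletion.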
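Let ε ∉ Σ denote a special blank symbol and define
$\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$. Then, we represent each edit operation
by $(l_1 \mapsto l_2)$, where $(l_1, l_2) \in (\Sigma_\varepsilon \times \Sigma_\varepsilon - \{(\varepsilon, \varepsilon)\})$. The
operation is a substitution if $l_1 \ne \varepsilon$ and $l_2 \ne \varepsilon$, a
deletion if $l_2 = \varepsilon$, and an insertion if $l_1 = \varepsilon$. For nodes v
and w, we also denote $(l(v) \mapsto l(w))$ by $(v \mapsto w)$. We
define a cost function $\gamma : (\Sigma_\varepsilon \times \Sigma_\varepsilon \setminus \{(\varepsilon, \varepsilon)\}) \mapsto \mathbf{R}^{+}$ on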