Furthermore, if we adopt the unit cost function, then we can compute $d_V(C_1, C_2)$, $d^*_V(C_1, C_2)$, $d_H(C_1, C_2)$ and $d^*_H(C_1, C_2)$ in $O(h^2)$ time, $O(h^2 + \lambda)$ time, $O(\lambda)$ time and $O(\lambda + h)$ time, respectively. Hence, we can compute the vertical distances in quadratic time and the horizontal distances in linear time with respect to the number of nodes.
Finally, we give experimental results to evaluate the running time and the approximation for caterpillars in real data.
2 PRELIMINARIES
A tree T is a connected graph (V,E) without cycles,
where V is the set of vertices and E is the set of edges.
We denote V and E by V(T) and E(T), respectively. The size of T is |V| and is denoted by |T|. We sometimes write v ∈ T for v ∈ V(T). We denote the empty tree $(\emptyset, \emptyset)$ by $\emptyset$. A rooted tree is a tree with one node r chosen as its
root. We denote the root of a rooted tree T by r(T).
Let T be a rooted tree such that r = r(T) and u, v, w ∈ T. We denote the unique path from r to v, that is, the tree $(V', E')$ such that $V' = \{v_1, \ldots, v_k\}$, $v_1 = r$, $v_k = v$ and $(v_i, v_{i+1}) \in E'$ for every $i$ ($1 \le i \le k - 1$), by $UP_r(v)$.
The parent of $v$ ($\neq r$), which we denote by $par(v)$, is its adjacent node on $UP_r(v)$, and the ancestors of $v$ ($\neq r$) are the nodes on $UP_r(v) - \{v\}$. We say that u is a child of v if v is the parent of u, and that u is a descendant of v if v is an ancestor of u. We denote the set of children of v by ch(v), and we write u ≤ v if v is an ancestor of u. We call a node with no children a leaf and denote the set of all the leaves in T by lv(T).
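The notions above can be illustrated with a minimal Python sketch. The child-to-parent dictionary and the function names `up_path`, `ancestors`, `children` and `leaves` are hypothetical choices made for this illustration, not notation from the paper.

```python
# Hypothetical representation: a rooted tree stored as a child -> parent dictionary.

def up_path(parent, r, v):
    """Nodes on the unique path UP_r(v), listed from the root r down to v."""
    path = [v]
    while path[-1] != r:
        path.append(parent[path[-1]])
    return path[::-1]

def ancestors(parent, r, v):
    """Ancestors of v: the nodes on UP_r(v) other than v itself."""
    return up_path(parent, r, v)[:-1]

def children(parent, v):
    """ch(v): all nodes whose parent is v (sorted only for stable output)."""
    return sorted(u for u, p in parent.items() if p == v)

def leaves(parent, r):
    """lv(T): all nodes with no children."""
    nodes = set(parent) | {r}
    return sorted(n for n in nodes if not children(parent, n))

# Example tree: r has children a and b; a has children c and d.
parent = {"a": "r", "b": "r", "c": "a", "d": "a"}
print(up_path(parent, "r", "c"))
print(leaves(parent, "r"))
```

In this example, par(c) = a, the ancestors of c are r and a, and the leaves are b, c and d.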
A rooted path P is a rooted tree $(\{v_1, \ldots, v_n\}, \{(v_i, v_{i+1}) \mid 1 \le i \le n - 1\})$ such that $r(P) = v_1$. We call the node $v_n$ (the leaf of P) an endpoint of P and denote it by $e(P)$.
The degree of v, denoted by $d(v)$, is the number of children of v, and the degree of T, denoted by $d(T)$, is $\max\{d(v) \mid v \in T\}$. The height of v, denoted by $h(v)$, is $\max\{|UP_v(w)| \mid w \in lv(T[v])\}$, and the height of T, denoted by $h(T)$, is $\max\{h(v) \mid v \in T\}$.
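As a small sketch of these two quantities, the following Python code computes degrees and heights on a tree stored as a parent-to-children dictionary. The representation and function names are hypothetical, and the code assumes that T[v] denotes the subtree rooted at v. Since |UP_v(w)| counts nodes, a leaf has height 1 under this reading.

```python
# Hypothetical representation: a dictionary mapping each internal node to its children.

def nodes(ch, root):
    """All nodes reachable from the root."""
    stack, seen = [root], []
    while stack:
        v = stack.pop()
        seen.append(v)
        stack.extend(ch.get(v, []))
    return seen

def degree(ch, v):
    """d(v): the number of children of v."""
    return len(ch.get(v, []))

def tree_degree(ch, root):
    """d(T) = max{ d(v) | v in T }."""
    return max(degree(ch, v) for v in nodes(ch, root))

def height(ch, v):
    """h(v): number of nodes on a longest path from v down to a leaf
    (assumes T[v] is the subtree rooted at v)."""
    kids = ch.get(v, [])
    return 1 if not kids else 1 + max(height(ch, u) for u in kids)

def tree_height(ch, root):
    """h(T) = max{ h(v) | v in T }, which is attained at the root."""
    return max(height(ch, v) for v in nodes(ch, root))

# Example: r has children a and b; a has children c, d and e.
ch = {"r": ["a", "b"], "a": ["c", "d", "e"]}
print(tree_degree(ch, "r"), tree_height(ch, "r"))
```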
We say that u is to the left of v in T if pre(u) ≤
pre(v) for the preorder number pre in T and post(u) ≤
post(v) for the postorder number post in T. We say
that a rooted tree is ordered if a left-to-right order
among siblings is given; unordered otherwise. We say
that a rooted tree is labeled if each node is assigned a
symbol from a fixed finite alphabet Σ. For a node v, we denote the label of v by l(v), and sometimes identify v with l(v). In this paper, we simply call a rooted labeled unordered tree a tree.
Definition 1 (Caterpillar (cf., (Gallian, 2007))). We say that a tree is a caterpillar if it is transformed into a rooted path after removing all the leaves in it. For a caterpillar C, we call the remaining rooted path the backbone of C and denote it by bb(C).
It is obvious that r(C) = r(bb(C)) and V(C) =
bb(C) ∪ lv(C) for a caterpillar C, that is, every node
in a caterpillar is either a leaf or an element of the
backbone.
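Definition 1 can be checked directly, as in the sketch below. The parent-to-children dictionary and the function name `backbone` are hypothetical. The check relies on the observation that removing all leaves leaves exactly the internal nodes, so the remainder is a rooted path if and only if no internal node has more than one internal child. Returning an empty list for a single-node tree is an assumption of this sketch, since the definition does not address that case.

```python
# Hypothetical representation: a dictionary mapping each internal node to its children.

def backbone(ch, root):
    """Return bb(C) as a list from the root to its endpoint,
    or None if the tree is not a caterpillar."""
    path, v = [], root
    while ch.get(v):                          # v is internal, so it survives leaf removal
        path.append(v)
        internal = [u for u in ch[v] if ch.get(u)]
        if len(internal) > 1:                 # the remainder would branch here
            return None
        if not internal:                      # v is the endpoint of the backbone
            break
        v = internal[0]
    return path                               # [] for a single-node tree (assumption)

# A caterpillar: backbone r - a - b with leaves x, y, z, w attached.
caterpillar = {"r": ["a", "x"], "a": ["b", "y", "z"], "b": ["w"]}
# Not a caterpillar: r has two internal children a and b.
other = {"r": ["a", "b"], "a": ["c"], "b": ["d"]}

print(backbone(caterpillar, "r"))
print(backbone(other, "r"))
```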
Next, we introduce a tree edit distance and a Tai
mapping.
Definition 2 (Edit operations (Tai, 1979)). The edit operations of a tree T are defined as follows (see Figure 1).
1. Substitution: Change the label of the node v in T.
2. Deletion: Delete a node v in T with parent $v'$, making the children of v become the children of $v'$. The children are inserted in the place of v as a subset of the children of $v'$. In particular, if v is the root in T, then the result of applying the deletion is a forest consisting of the children of the root.
3. Insertion: The complement of deletion. Insert a node v as a child of $v'$ in T, making v the parent of a subset of the children of $v'$.
[Figure 1 panels: Substitution ($v \mapsto w$), Deletion ($v \mapsto \varepsilon$), Insertion ($\varepsilon \mapsto v$).]
Figure 1: Edit operations for trees.
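The three edit operations of Definition 2 can be sketched on an unordered tree as follows. The dictionaries `ch`, `parent` and `labels` and the function names are hypothetical, and the sketch covers only the deletion of a non-root node. It also illustrates that insertion is the complement of deletion: deleting a node and re-inserting it over the same children restores the tree.

```python
import copy

# Hypothetical representation of an unordered tree:
# ch maps every node to its set of children, parent maps every non-root node to its parent.

def substitute(labels, v, new_label):
    """Substitution: change the label of v."""
    labels[v] = new_label

def delete(ch, parent, v):
    """Deletion of a non-root node v: its children become children of its parent."""
    p = parent.pop(v)
    kids = ch.pop(v)
    ch[p] = (ch[p] - {v}) | kids
    for u in kids:
        parent[u] = p

def insert(ch, parent, v, p, adopted):
    """Insertion: add v as a child of p and make v the parent of `adopted`,
    which must be a subset of the current children of p."""
    adopted = set(adopted)
    if not adopted <= ch[p]:
        raise ValueError("adopted nodes must be children of p")
    ch[p] = (ch[p] - adopted) | {v}
    ch[v] = adopted
    parent[v] = p
    for u in adopted:
        parent[u] = v

# Example tree: r has children a and b; a has children c and d.
ch = {"r": {"a", "b"}, "a": {"c", "d"}, "b": set(), "c": set(), "d": set()}
parent = {"a": "r", "b": "r", "c": "a", "d": "a"}
labels = {v: v for v in ch}

original = copy.deepcopy(ch)
delete(ch, parent, "a")                      # c and d are now children of r
after_delete = copy.deepcopy(ch)
insert(ch, parent, "a", "r", {"c", "d"})     # undoes the deletion
substitute(labels, "b", "x")
```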
Let $\varepsilon \notin \Sigma$ denote a special blank symbol and define $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$. Then, we represent each edit operation by $(l_1 \mapsto l_2)$, where $(l_1, l_2) \in (\Sigma_\varepsilon \times \Sigma_\varepsilon - \{(\varepsilon, \varepsilon)\})$. The operation is a substitution if $l_1 \neq \varepsilon$ and $l_2 \neq \varepsilon$, a deletion if $l_2 = \varepsilon$, and an insertion if $l_1 = \varepsilon$. For nodes v and w, we also denote $(l(v) \mapsto l(w))$ by $(v \mapsto w)$. We define a cost function $\gamma : (\Sigma_\varepsilon \times \Sigma_\varepsilon \setminus \{(\varepsilon, \varepsilon)\}) \to \mathbb{R}^+$ on pairs of labels. We often constrain a cost function $\gamma$ to be a metric, that is, $\gamma(l_1, l_2) \ge 0$, $\gamma(l_1, l_2) = 0$ iff $l_1 = l_2$, $\gamma(l_1, l_2) = \gamma(l_2, l_1)$ and $\gamma(l_1, l_3) \le \gamma(l_1, l_2) + \gamma(l_2, l_3)$. In particular, we call the cost function $\gamma$ such that $\gamma(l_1, l_2) = 1$ if $l_1 \neq l_2$ a unit cost function.
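The following sketch classifies an edit operation and checks the metric conditions for the unit cost function on a small alphabet. All names are hypothetical, and `EPS = None` merely stands in for the blank symbol ε. The value 0 for equal labels is not stated explicitly for the unit cost function; the sketch assumes it because the metric conditions require it. Triples that would involve the undefined pair (ε, ε) are skipped in the triangle inequality check.

```python
from itertools import product

EPS = None  # hypothetical stand-in for the blank symbol ε

def op_kind(l1, l2):
    """Classify the edit operation (l1 -> l2)."""
    if l1 is EPS and l2 is EPS:
        raise ValueError("(ε, ε) is not an edit operation")
    if l2 is EPS:
        return "deletion"
    if l1 is EPS:
        return "insertion"
    return "substitution"

def unit_cost(l1, l2):
    """Unit cost: 1 if the labels differ, 0 if they are equal (assumed)."""
    if l1 is EPS and l2 is EPS:
        raise ValueError("(ε, ε) is not an edit operation")
    return 0 if l1 == l2 else 1

def is_metric(gamma, sigma):
    """Check the four metric conditions on Σ_ε × Σ_ε minus (ε, ε)."""
    syms = list(sigma) + [EPS]
    defined = lambda a, b: not (a is EPS and b is EPS)
    for a, b in product(syms, repeat=2):
        if not defined(a, b):
            continue
        g = gamma(a, b)
        if g < 0 or (g == 0) != (a == b) or g != gamma(b, a):
            return False
    for a, b, c in product(syms, repeat=3):
        if defined(a, b) and defined(b, c) and defined(a, c):
            if gamma(a, c) > gamma(a, b) + gamma(b, c):
                return False
    return True

def skewed(a, b):
    """A deliberately asymmetric cost, used only to show a failing check."""
    return 2 if (a, b) == ("a", "b") else unit_cost(a, b)

print(is_metric(unit_cost, {"a", "b", "c"}))
print(is_metric(skewed, {"a", "b", "c"}))
```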