
Function tree-inclusion(T, S) 
Input: T - target tree; S - pattern tree. 
Output: 1 if T includes S; 0 if T doesn’t include S. 
begin 
1.  if |T| < |S| then {if S is a forest: <S_1, ..., S_l>
2.      then S := <S_1, ..., S_i> for some i such that
            |<S_1, ..., S_i>| ≤ |T| < |<S_1, ..., S_{i+1}>|
3.      else return 0;}
4.  let r_1 and r_2 be the roots of T and S, respectively;
5.  (*If S is a forest, construct a virtual root r_2 for it,
      which matches any label.*)
6.  let T_1, ..., T_k be the subtrees of r_1;
7.  let S_1, ..., S_l be the subtrees of r_2;
8.  if label(r_1) = label(r_2)
9.  then {if r_1 is a leaf then {if r_2 is not a virtual root
            then return 1 else return 0;}
10.     if r_2 is not a virtual root then mark(r_1) := 1;
11.     temp := <S_1, ..., S_l>; S_0 := φ;
12.     i := 1; j := 0; x := 0; (*i is used to scan T_1, ..., T_k; and
            j is used to scan S_1, ..., S_l.*)
13.     while (i ≤ k ∧ temp ≠ φ) do
14.     {x := tree-inclusion(T_i, temp);
15.      if x > 0 then temp := temp/<S_{j+1}, ..., S_{j+x}>;
16.      else
            {let v be T_i’s root; let u be S_{j+1}’s root;
17.          if v and u have the same label and mark(v) = 0
18.          then {x := tree-inclusion(T_i, S_{j+1});
                   temp := temp/<S_{j+x}>;}
                  (*In the case that j = 0 and x = 0, S_{j+x} = S_0 = φ.*)
19.          else mark(v) := 0}
                  (*mark(v) is used only once in this case. Afterwards,
                    it will be set to 0 for the subsequent computation.*)
20.      i := i + 1; j := j + x;}
21.     if temp ≠ φ then {if r_2 is a virtual root then return j
22.                       else return 0;}
23.     else {if r_2 is a virtual root then return l
24.           else return 1;}}
25. else {for i = 1 to k do
26.       {x := tree-inclusion(T_i, S);
27.        if x = number-of-trees(S) then return 1;}
           (*number-of-trees(S) is the number of trees in S. A
             tree can be considered a forest containing only that tree.*)
28.   return  0;} 
end 
In Algorithm tree-inclusion(T,  S), line 1 checks 
whether |T| < |S|. If it is the case, the algorithm 
returns 0 if S is a tree. If S is a forest, we will check 
T against the first i subtrees such that |<S_1, ..., S_i>|
≤ |T| < |<S_1, ..., S_{i+1}>| (see line 2). In addition, when
we check T against a forest <S_1, ..., S_l>, a virtual root
for it is constructed, which matches any label. Thus,
we will actually check the subtrees of T’s root,
T_1, ..., T_k, against S_1, ..., S_l to see whether they
include <S_1, ..., S_l> (see line 5). This is performed in
a while-loop over the T_i’s. In each step, a
recursive call tree-inclusion(T_i, <S_{l_i}, ..., S_l>) (i = 1,
..., j for some j) is carried out, which returns an
integer x, indicating that T_i includes <S_{l_i}, ...,
S_{l_i+x−1}> (see line 14). If x = 0, i.e., the subtrees of
T_i’s root do not include any subtree in S_{l_i}, ..., S_l, we
need to check whether T_i includes S_{l_i}, since when we
check T_i against S_{l_i}, ..., S_l, what we have really
done is to check the subtrees of T_i’s root, not T_i itself
(see lines 16 - 19). If S is a tree, the algorithm returns
1 if it is included; otherwise, 0 (see lines 22 and 24).
Finally, we note that if the root of T does not match 
the root of S, the algorithm tries to find the first T_i
that contains the whole S (see lines 25 - 28).
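The size test of lines 1 - 3 can be illustrated with a small sketch. It is not part of the original algorithm: the Node class and the names size and truncate_forest are hypothetical, and a forest is assumed to be a plain list of trees. The sketch only shows how the prefix <S_1, ..., S_i> with |<S_1, ..., S_i>| ≤ |T| < |<S_1, ..., S_{i+1}>| might be selected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Ordered labeled tree node (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """|T|: the number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def truncate_forest(t: Node, forest: List[Node]) -> List[Node]:
    """Return the longest prefix <S_1, ..., S_i> whose total size is <= |T|.

    If the whole forest fits, it is returned unchanged. If even S_1 is
    larger than T, the prefix is empty; the pseudocode does not spell
    out that case, so returning [] is an assumption of this sketch.
    """
    budget = size(t)
    total = 0
    prefix: List[Node] = []
    for s in forest:
        total += size(s)
        if total > budget:
            break
        prefix.append(s)
    return prefix


# T has 3 nodes; the forest trees have sizes 2, 1 and 3.
T = Node("a", [Node("b"), Node("c")])
S = [Node("x", [Node("y")]), Node("z"), Node("w", [Node("u"), Node("v")])]
kept = truncate_forest(T, S)
print([s.label for s in kept])
```

Here the first two trees are kept, since 2 + 1 ≤ 3 < 2 + 1 + 3.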
In addition, we should pay attention to how mark(v) 
is used (see lines 10, 17, and 19). Each time v
is checked against a node (not a virtual node) in S,
mark(v) is set to 1. It is used to avoid the call
tree-inclusion(T[v], S_{j+1}) after tree-inclusion(T[v],
<S_{j+1}, ..., S_l>) returns if |S_{j+1}| ≤ |T[v]| < |<S_{j+1}, S_{j+2}>|
(see line 17), where T[v] represents the subtree (in T)
rooted at v. This is because in this case
tree-inclusion(T[v], S_{j+1}) must have been invoked during
the execution of tree-inclusion(T[v], <S_{j+1}, ..., S_l>)
and v has definitely been checked against S_{j+1}’s root
in this process, which is recorded by setting mark(v)
to 1 and used to avoid a second check. However,
it is used only in this case. After that, it should be set
to 0 again for the rest of the computation. This
arrangement is correct because during the execution
of tree-inclusion(T[v], <S_{j+1}, ..., S_l>), if |S_{j+1}| ≤ |T[v]|
< |<S_{j+1}, S_{j+2}>|, v itself will be checked against S_{j+1}’s
root. If |T[v]| ≥ |<S_{j+1}, ..., S_{j+i}>| for some i > 1, we
will check the subtrees of v against <S_{j+1}, ..., S_{j+i}>
and v is not really checked. In addition, in the rest
of the execution of tree-inclusion(T[v], <S_{j+1}, ...,
S_l>), v is not checked. So, upon the return of
tree-inclusion(T[v], <S_{j+1}, ..., S_l>), we check the value of
mark(v) to see whether tree-inclusion(T[v], S_{j+1}) has
been invoked. After this check,
mark(v) should be set to 0 again for the subsequent
computation.
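The case distinction behind mark(v) depends only on sizes, so it can be sketched without the trees themselves. The function names below are hypothetical, and pattern_sizes is assumed to list |S_{j+1}|, ..., |S_l| in order. The sketch reports whether v itself would be compared with S_{j+1}’s root during the forest call, i.e., whether |S_{j+1}| ≤ |T[v]| < |<S_{j+1}, S_{j+2}>|.

```python
from typing import List


def prefix_length(size_tv: int, pattern_sizes: List[int]) -> int:
    """Number of leading pattern trees whose total size fits into |T[v]|."""
    total = 0
    count = 0
    for s in pattern_sizes:
        total += s
        if total > size_tv:
            break
        count += 1
    return count


def root_is_checked(size_tv: int, pattern_sizes: List[int]) -> bool:
    """True when exactly one pattern tree fits, so the forest call
    degenerates to checking v against the root of S_{j+1} and mark(v)
    would be set to 1. When S_{j+1} is the last remaining tree there is
    no S_{j+2}; treating that case the same way is an assumption here.
    """
    return prefix_length(size_tv, pattern_sizes) == 1


print(root_is_checked(4, [3, 2, 5]))  # 3 <= 4 < 3 + 2
print(root_is_checked(5, [3, 2, 5]))  # 3 + 2 <= 5, only subtrees of v are checked
```

In the second call two pattern trees fit, so only the subtrees of v are checked and mark(v) stays 0, which is what allows the separate call in line 18.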
Finally, we can show that the time complexity of the
algorithm is bounded by O(|T|⋅height(S)). This is
because, although a node in T may be checked more
than once, each check is against a different node in S,
and all those nodes in S lie on the same path. It is also
easy to see that the algorithm needs no extra space.
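To make the bound concrete, the following sketch computes |T|⋅height(S) for a small pair of trees. The Node class and function names are hypothetical. Height is assumed to be counted in nodes on the longest root-to-leaf path, so a single node has height 1; the text does not fix this convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Ordered labeled tree node (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """|T|: the number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def height(t: Node) -> int:
    """Number of nodes on the longest root-to-leaf path (assumed convention)."""
    return 1 + max((height(c) for c in t.children), default=0)


def check_bound(target: Node, pattern: Node) -> int:
    """Upper bound |T| * height(S) on the number of node checks."""
    return size(target) * height(pattern)


T = Node("a", [Node("b", [Node("c")]), Node("d")])  # |T| = 4
S = Node("a", [Node("c")])                          # height(S) = 2
print(check_bound(T, S))
```

Each of the 4 nodes of T is compared with at most 2 nodes of S, which gives at most 8 checks for this pair.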
In the following, we apply the algorithm to the trees 
shown in Fig. 5 and trace the computation step by
step for a better understanding.
Example 2. Consider two ordered, labeled trees T
and S shown in Fig. 5, where each node in T is
identified with t_i, such as t_0, t_1, t_11, and so on; and
ON THE TREE INCLUSION AND QUERY EVALUATION IN DOCUMENT DATABASES