Function tree-inclusion(T, S)
Input: T - target tree; S - pattern tree.
Output: 1 if T includes S; 0 if T doesn’t include S.
begin
1.  if |T| < |S| then {if S is a forest: <S_1, ..., S_l>
2.      then S := <S_1, ..., S_i> for some i such that
             |<S_1, ..., S_i>| ≤ |T| < |<S_1, ..., S_{i+1}>|
3.      else return 0;}
4.  let r_1 and r_2 be the roots of T and S, respectively;
5.  (*If S is a forest, construct a virtual root r_2 for it,
      which matches any label.*)
6.  let T_1, ..., T_k be the subtrees of r_1;
7.  let S_1, ..., S_l be the subtrees of r_2;
8.  if label(r_1) = label(r_2)
9.  then {if r_1 is a leaf then {if r_2 is not a virtual root
                                 then return 1 else return 0;}
10.       if r_2 is not a virtual root then mark(r_1) := 1;
11.       temp := <S_1, ..., S_l>; S_0 := φ;
12.       i := 1; j := 0; x := 0;
          (*i is used to scan T_1, ..., T_k; and
            j is used to scan S_1, ..., S_l.*)
13.       while (i ≤ k ∧ temp ≠ φ) do
14.       {x := tree-inclusion(T_i, temp);
15.        if x > 0 then temp := temp/<S_{j+1}, ..., S_{j+x}>;
16.        else {let v be the T_i’s root; let u be the S_{j+1}’s root;
17.              if v and u have the same label and mark(v) = 0
18.              then {x := tree-inclusion(T_i, S_{j+1});
                       temp := temp/<S_{j+x}>;}
                 (*In the case that j = 0 and x = 0, S_{j+x} = S_0 = φ.*)
19.              else mark(v) := 0}
                 (*mark(v) is used only once in this case. Afterwards,
                   it will be set to 0 for the subsequent computation.*)
20.        i := i + 1; j := j + x;}
21.       if temp ≠ φ then {if r_2 is a virtual root then return j
22.                         else return 0;}
23.       else {if r_2 is a virtual root then return l
24.             else return 1;}}
25. else {for i = 1 to k do
26.       {x := tree-inclusion(T_i, S);
27.        if x = number-of-trees(S) then return 1;}
          (*number-of-trees(S) is the number of the trees in S. A
            tree can be considered as a forest containing only that tree.*)
28.       return 0;}
end
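The pseudocode above can be transliterated into Python roughly as follows. This is only a sketch under stated reading assumptions, not a verified implementation of the authors' method. The `Node` class, the `size` helper, and the use of a Python list for a forest are illustrative choices that do not appear in the original. A forest that contains (or is truncated to) exactly one tree is assumed to be treated as a plain tree with a real root, and `temp` is represented implicitly as the slice `subs[j:]`.

```python
class Node:
    """Ordered, labeled tree node; `mark` mirrors mark(v) in the pseudocode."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.mark = 0


def size(t):
    """|T|: number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_inclusion(T, S):
    """T is a Node; S is a Node (tree) or a list of Nodes (forest).
    Returns 1/0 for a tree, or the number of leading trees matched for a forest."""
    forest = list(S) if isinstance(S, list) else [S]
    t_size = size(T)
    # Lines 1-3: if T is too small, keep the longest prefix of the forest that fits.
    if t_size < sum(size(s) for s in forest):
        if len(forest) == 1:
            return 0
        kept, total = [], 0
        for s in forest:
            if total + size(s) > t_size:
                break
            kept.append(s)
            total += size(s)
        forest = kept
    if not forest:
        return 0
    # Assumption: only a forest with two or more trees gets a virtual root.
    virtual = len(forest) > 1
    subs = forest if virtual else forest[0].children      # lines 5 and 7
    if virtual or T.label == forest[0].label:              # line 8
        if not T.children:                                  # line 9
            return 0 if virtual else 1
        if not virtual:
            T.mark = 1                                      # line 10
        j = 0
        for child in T.children:                            # lines 13-20
            if j >= len(subs):                              # temp is empty
                break
            x = tree_inclusion(child, subs[j:])             # line 14
            if x == 0:
                u = subs[j]                                 # root of S_{j+1}
                if child.label == u.label and child.mark == 0:   # line 17
                    x = tree_inclusion(child, u)                  # line 18
                else:
                    child.mark = 0                                # line 19
            j += x
        if virtual:                                         # lines 21 and 23
            return j
        return 1 if j == len(subs) else 0                   # lines 22 and 24
    for child in T.children:                                # lines 25-28
        if tree_inclusion(child, forest[0]) == 1:
            return 1
    return 0


# Usage: S = a(b, c) is included in T = a(x(b), c) by deleting node x.
T = Node("a", [Node("x", [Node("b")]), Node("c")])
S = Node("a", [Node("b"), Node("c")])
print(tree_inclusion(T, S))  # 1
```

Because the marks are stored on the nodes of T, the sketch is best run on freshly built trees for each query.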
In Algorithm tree-inclusion(T, S), line 1 checks
whether |T| < |S|. If it is the case, the algorithm
returns 0 if S is a tree. If S is a forest, we will check
T against the first i subtrees such that
|<S_1, ..., S_i>| ≤ |T| < |<S_1, ..., S_{i+1}>| (see line 2).
In addition, when we check T against a forest
<S_1, ..., S_l>, a virtual root for it is constructed,
which matches any label. Thus, we will actually check
the subtrees of T’s root, T_1, ..., T_k, against
S_1, ..., S_l to see whether they include
<S_1, ..., S_l> (see line 5). This is performed in
a while-loop over the T_i’s. In each step, a
recursive call tree-inclusion(T_i, <S_{l_i}, ..., S_l>)
(i = 1, ..., j for some j) is carried out, where S_{l_i}
is the first subtree of S not yet matched when T_i is
reached (S_{j+1} in the pseudocode). The call returns an
integer x, indicating that T_i includes
<S_{l_i}, ..., S_{l_i+x−1}> (see line 14). If x = 0, i.e.,
the subtrees of T_i’s root do not include any subtree in
S_{l_i}, ..., S_l, we need to check whether T_i includes
S_{l_i}, since when we check T_i against S_{l_i}, ..., S_l,
what we have really done is to check the subtrees of
T_i’s root, not T_i itself (see lines 16 - 19). If S is a
tree, the algorithm returns 1 if it is included;
otherwise, 0 (see lines 22 and 24).
Finally, we note that if the root of T does not match
the root of S, the algorithm tries to find the first T_i
that contains the whole S (see lines 25 - 28).
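To cross-check the 0/1 answers described above on small inputs, a brute-force checker can be handy. The sketch below is not the algorithm of this paper. It assumes the usual definition of ordered tree inclusion (S can be obtained from T by deleting nodes, where the children of a deleted node take its place in order) and uses a simple memoized recursion on the rightmost trees of two forests. The tuple encoding and the names `tree`, `forest_included`, and `includes` are hypothetical.

```python
from functools import lru_cache


def tree(label, *children):
    """Encode a tree as (label, tuple_of_children); a forest is a tuple of trees."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_included(target, pattern):
    """True if forest `pattern` can be obtained from forest `target` by deleting nodes."""
    if not pattern:
        return True
    if not target:
        return False
    v_label, v_children = target[-1]
    u_label, u_children = pattern[-1]
    # Option 1: delete the root of the rightmost target tree;
    # its children take its place in the forest.
    if forest_included(target[:-1] + v_children, pattern):
        return True
    # Option 2: map the rightmost pattern root onto the rightmost target root.
    return (v_label == u_label
            and forest_included(v_children, u_children)
            and forest_included(target[:-1], pattern[:-1]))


def includes(T, S):
    """True if tree T includes tree S under the assumed definition."""
    return forest_included((T,), (S,))


# Usage: a(b, c) is included in a(x(b), c) but not in a(c, b).
S = tree("a", tree("b"), tree("c"))
print(includes(tree("a", tree("x", tree("b")), tree("c")), S))  # True
print(includes(tree("a", tree("c"), tree("b")), S))             # False
```

The recursion is exponential in the worst case without memoization and is meant only for validating small examples.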
In addition, we should pay attention to how mark(v)
is used (see lines 10, 17, and 19). Each time v
is checked against a node (not a virtual node) in S,
mark(v) is set to 1. It is used to avoid the call
tree-inclusion(T[v], S_{j+1}) after
tree-inclusion(T[v], <S_{j+1}, ..., S_l>) returns if
|S_{j+1}| ≤ |T[v]| < |<S_{j+1}, S_{j+2}>| (see line 17),
where T[v] represents the subtree (in T) rooted at v.
This is because in this case tree-inclusion(T[v], S_{j+1})
must have been invoked during the execution of
tree-inclusion(T[v], <S_{j+1}, ..., S_l>), and v has
definitely been checked against S_{j+1}’s root in this
process, which is recorded by setting mark(v) to 1 and
used to avoid a second check. However, it is used only
in this case. After that, it should be set to 0 again
for the rest of the computation. This arrangement is
correct because during the execution of
tree-inclusion(T[v], <S_{j+1}, ..., S_l>), if
|S_{j+1}| ≤ |T[v]| < |<S_{j+1}, S_{j+2}>|, v itself will
be checked against S_{j+1}’s root. If
|T[v]| ≥ |<S_{j+1}, ..., S_{j+i}>| for some i > 1, we
will check the subtrees of v against
<S_{j+1}, ..., S_{j+i}> and v is not really checked. In
addition, in the rest of the execution of
tree-inclusion(T[v], <S_{j+1}, ..., S_l>), v is not
checked. So, upon the return of
tree-inclusion(T[v], <S_{j+1}, ..., S_l>), we check the
value of mark(v) to see whether
tree-inclusion(T[v], S_{j+1}) has been invoked.
Obviously, after this check, mark(v) should be set to 0
again for the subsequent computation.
Finally, we can show that the time complexity of the
algorithm is bounded by O(|T|⋅height(S)). This is
because, although a node in T may be checked more
than once, each time it is checked against a different
node in S, and all those nodes in S are on the same path.
It is also easy to see that the algorithm needs no extra space.
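As a small illustration of the quantities in this bound, the following sketch computes |T| and height(S) for two toy trees. The tuple encoding and the helper names are hypothetical, and height is assumed here to count the nodes on the longest root-to-leaf path (so a single node has height 1); the original does not state its convention.

```python
def tree(label, *children):
    """Encode a tree as (label, tuple_of_children)."""
    return (label, children)


def size(t):
    """|T|: number of nodes in t."""
    return 1 + sum(size(c) for c in t[1])


def height(t):
    """Number of nodes on the longest root-to-leaf path (assumed convention)."""
    return 1 + max((height(c) for c in t[1]), default=0)


T = tree("a", tree("b", tree("c")), tree("d"))   # |T| = 4
S = tree("a", tree("b"))                         # height(S) = 2
bound = size(T) * height(S)
print(bound)  # 8, the value of |T| * height(S) for these toy trees
```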
In the following, we apply the algorithm to the trees
shown in Fig. 5 and trace the computation step by
step for a better understanding.
Example 2. Consider two ordered, labeled trees T
and S shown in Fig. 5, where each node in T is
identified with t_i, such as t_0, t_1, t_11, and so on; and