As we will discuss, this type of document
requires more expressive data manipulation, and so
we propose a more general query tree where a leaf
node may be single- or set-valued, and an internal
node may have operators other than "and" or "or"
associated with it. We define the general synthesized
query tree as below.
Definition 2: A general synthesized query tree
(GSQT) is a tree in which each leaf node v is associated
with a query Q(v) that returns a value or a set of
values, and each internal node v is labelled with a
tag T(v) and a function f. Each node is assigned
a value V(v), as follows:
a) for a leaf node, its value V(v) is equal to the
return value of Q(v), i.e., $V(v) = Q(v)$, and
b) for an internal node with children $v_1, \ldots, v_n$,
$V(v) = f(V(v_1), V(v_2), \ldots, V(v_n))$.
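Definition 2 can be read as a small recursive data structure. The following is a minimal sketch, not the authors' implementation: the names `Node` and `value` and the sample course numbers are hypothetical, and each leaf query is modelled as a zero-argument callable standing in for a real query against a data store.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    """One GSQT node (hypothetical representation)."""
    tag: str = ""                               # T(v), used on internal nodes
    query: Optional[Callable[[], Any]] = None   # Q(v), set on leaf nodes only
    func: Optional[Callable[..., Any]] = None   # f, set on internal nodes only
    children: List["Node"] = field(default_factory=list)


def value(v: Node) -> Any:
    """Compute V(v): Q(v) for a leaf, f applied to the children's values otherwise."""
    if not v.children:
        return v.query()
    return v.func(*(value(c) for c in v.children))


# Hypothetical two-leaf tree: does the set of passed courses cover the required set?
passed = Node(query=lambda: {"GEOG-1101", "GEOG-1201", "STAT-1501"})
required = Node(query=lambda: {"GEOG-1101", "GEOG-1201"})
root = Node(tag="Required", func=lambda p, r: r <= p, children=[passed, required])
print(value(root))  # True for this sample data
```

Because `value` recurses into the children before applying the node's function, every leaf query runs before any internal node is computed.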
Figure 6 illustrates the same requirements as before,
but here the set of required
courses is obtained using a query submitted against
some data store. Here, we assume that external data
can be obtained from any available or required data
store.
Now, given that the required courses for the 3-
Year BSc (Geography) are kept elsewhere, to
determine if a student has successfully passed all
courses, the process of evaluating the requirement
has to be carried out differently from before. To
evaluate the requirement, the graduation officer must
run two queries and combine their results as we
explain next.
First, a list of courses successfully passed by the
student is obtained. Let us name this result
SuccessResult and assume this result is a relation
with two attributes: student number and course
number. Since we are considering a single student,
the same student number will appear in each tuple.
The other list obtained is a list of required courses.
Let us name this result RequiredList and assume this
result is a relation with one attribute: course number.
Note these two relations have one common attribute:
course number. The graduation officer needs to
determine if the set of courses successfully passed
includes the set of required courses. To do this, the
relational algebra division operator (Elmasri and
Navathe, 2003) can be applied:
SuccessResult[studentNum, courseNum] ÷ RequiredList[courseNum].
The result of this operation is a relation of one
attribute: student number. A student number appears
in the result if the course numbers paired with it in
SuccessResult form a superset of RequiredList. In our example, if
the student has successfully taken each required
course, then the result of division is a relation of one
tuple having the student number of that student. If
the student has not taken all of the required courses
then our result is a relation of zero tuples, that is, an empty
relation. The division operator is difficult to explain.
It is even more difficult, and error-prone, to express in the
standard relational language SQL, since it is
not directly supported in that language. For this
reason, the document designer may prefer a different
approach where division is directly supported. We
note that the division can be expressed simply, as
shown in Figure 6.
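To make the contrast concrete, the sketch below computes the same division twice: once directly over sets, and once in SQL using the common double NOT EXISTS formulation. It is an illustration under assumptions, not the authors' code: the function names `divide` and `divide_sql` and the sample course numbers are hypothetical, and an in-memory SQLite database stands in for the data store. The relation and attribute names follow the text.

```python
import sqlite3


def divide(success_result, required_list):
    """SuccessResult ÷ RequiredList over in-memory tuples."""
    required = set(required_list)
    passed = {}
    for student, course in success_result:
        passed.setdefault(student, set()).add(course)
    # Keep a student only if the passed courses form a superset of the required ones.
    return {s for s, courses in passed.items() if required <= courses}


def divide_sql(success_result, required_list):
    """The same division in SQL, which has no division operator of its own."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SuccessResult (studentNum TEXT, courseNum TEXT)")
    con.execute("CREATE TABLE RequiredList (courseNum TEXT)")
    con.executemany("INSERT INTO SuccessResult VALUES (?, ?)", success_result)
    con.executemany("INSERT INTO RequiredList VALUES (?)",
                    [(c,) for c in required_list])
    # "There is no required course that this student has not passed."
    rows = con.execute("""
        SELECT DISTINCT s.studentNum
        FROM SuccessResult s
        WHERE NOT EXISTS (
            SELECT 1 FROM RequiredList r
            WHERE NOT EXISTS (
                SELECT 1 FROM SuccessResult t
                WHERE t.studentNum = s.studentNum
                  AND t.courseNum = r.courseNum))
    """).fetchall()
    con.close()
    return {row[0] for row in rows}


# Hypothetical data for a single student s1.
success = [("s1", "GEOG-1101"), ("s1", "GEOG-1201"), ("s1", "STAT-1501")]
required = ["GEOG-1101", "GEOG-1201"]
print(divide(success, required), divide_sql(success, required))
```

The set-based version states the superset test in one line, whereas the SQL version has to encode it as a doubly negated existence check, which is where mistakes tend to creep in.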
In Figure 7, we illustrate a subtree rooted at
Major in the GSQT for our running example, for
which various functions are required to manipulate
the values obtained from descendant nodes in the
GSQT. For instance, associated with $v_8$, we have a
division operation while for $v_6$, the operation is the
projection.
In the Figure, the functions f( ) and g( ) are
defined as follows:
f(x, y): if x ∈ y, returns true; otherwise, false.
g(x): if 30 ≤ x ≤ 48, returns true; otherwise, false.
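Both functions translate directly into predicates. The sketch below is only an illustration: what x and y stand for in Figure 7 is not stated in this passage, so the arguments in the usage lines are hypothetical.

```python
def f(x, y):
    """Return True if x is a member of the collection y, otherwise False."""
    return x in y


def g(x):
    """Return True if x lies in the closed interval [30, 48], otherwise False."""
    return 30 <= x <= 48


# Hypothetical arguments, chosen only to exercise both branches.
print(f("GEOG-1101", {"GEOG-1101", "GEOG-1201"}), g(27))
```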
As with the other operations, they take the values
from the corresponding child nodes as the
parameters. We also note that each leaf node in the
tree is associated with a query, which provides the
initial values for computation. Therefore, the
evaluation of V(v) for any node is performed
bottom-up. For instance, the value of $v_8$, $V(v_8)$, is
calculated by dividing the result of $Q(v_{10})$ by
the result of $Q(v_{11})$ (i.e., $Q(v_{10}) \div Q(v_{11})$; both
come from its children); $V(v_3)$ is obtained by
computing $g(V(v_6))$, and so on.
The GSQT is similar to the concept of query
trees used for constructing query execution plans in
relational database systems (Elmasri and Navathe,
2003). We note, however, that our documents contain a
number of queries and, for the purpose of evaluating
sub-rules separately, each sub-rule must
be self-contained, with its query requirement
expressed independently of other rules.
[Figure 6: Division operation. Leaf queries: "select all courses taken by student s1" and "select all required courses" (from external source); root: Q3 = Q1 divide Q2.]