ditions, it is possible to specify whether to use a
boolean semantics or to perform a ranked retrieval.
In contrast, basic conditions are always treated
as mandatory: in order to be retrieved, an element
must be reachable by exactly following the specified
path expression, and all the conditions on values must
be satisfied.
In an effort to provide a uniform treatment of
basic and full-text conditions, the key idea is to consider
the searched path expression and the specified
conditions on values as desirable properties for
an element to be returned, instead of considering
them as mandatory constraints. Therefore, an element
should be returned even if it does not perfectly
respect the basic conditions, and a score value should
indicate how well such conditions are satisfied.
1.2 A Motivating Example
Consider the XML document shown graphically in
Figure 1. Suppose a user writes an XQuery expression
containing the for clause for $a in
doc("bib.xml")/bib/book/author. The user probably
needs to find all book authors, including those who
just co-authored a book. The for clause, however,
will find only those authors that are the single authors
of at least one book. If the for clause has an approximated
behavior, it could also return a subtree reachable
by following a relaxed version of the original
path expression, for example /bib/book//author,
therefore including co-authors in the result.
This relaxed query would find all the book authors,
but not the paper authors. It could be the
case that such authors are also of interest to the
user. The user need could be satisfied by further
relaxing the query, i.e. the path expression
/bib/book//author could be transformed into the
path expression /bib//author.
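The two structural relaxations above can be tried out with a minimal Python sketch. The document below is a hypothetical stand-in for bib.xml (the actual content of Figure 1 may differ, in particular the assumed authors wrapper around co-authors), and the helper name authors is introduced only for illustration. Paths are written relative to the bib root element, as required by the standard-library ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; the real Figure 1 document may differ.
BIB = """
<bib>
  <book>
    <title>T1</title>
    <author>Ann</author>
  </book>
  <book>
    <title>T2</title>
    <authors>
      <author>Bob</author>
      <author>Cid</author>
    </authors>
  </book>
  <paper>
    <title>P1</title>
    <author>Dan</author>
  </paper>
</bib>
"""

root = ET.fromstring(BIB)  # root is the <bib> element


def authors(path):
    """Return the text of every element selected by `path` under <bib>."""
    return [a.text for a in root.findall(path)]


exact = authors("./book/author")       # corresponds to /bib/book/author
relaxed_1 = authors("./book//author")  # corresponds to /bib/book//author
relaxed_2 = authors(".//author")       # corresponds to /bib//author

print(exact)      # ['Ann']
print(relaxed_1)  # ['Ann', 'Bob', 'Cid']
print(relaxed_2)  # ['Ann', 'Bob', 'Cid', 'Dan']
```

Each relaxation step returns a superset of the previous result, which is what makes it reasonable to rank exact matches above relaxed ones.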
Suppose now the user writes an expression
containing the clause for $a in
doc("bib.xml")/bib/book/title. Such a query finds
all the book titles, but ignores paper titles, which
could also be interesting for the user. In fact, some
semantic relationship exists between the words
book and paper: using some lexical database (e.g.
Princeton University, 2007) we can find that both
these words are hyponyms of the word publication.
Considering such a relationship, a different kind of
relaxation could treat the path /bib/paper/title
as an approximated version of /bib/book/title and
therefore also include the paper titles in the result.
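A minimal sketch of this semantic relaxation follows. The HYPERNYM table is a hypothetical, hand-written stand-in for a lookup in a lexical database, and the function names related_tags and relax_step are assumptions made for illustration, not operators defined in this paper.

```python
# Hypothetical hypernym table standing in for a lexical database lookup.
HYPERNYM = {"book": "publication", "paper": "publication"}


def related_tags(tag):
    """Return the tags that share a hypernym with `tag`, including `tag`."""
    parent = HYPERNYM.get(tag)
    if parent is None:
        return [tag]
    return sorted(t for t, h in HYPERNYM.items() if h == parent)


def relax_step(path, index):
    """Replace the step at position `index` with each related tag."""
    steps = path.strip("/").split("/")
    return [
        "/" + "/".join(steps[:index] + [tag] + steps[index + 1:])
        for tag in related_tags(steps[index])
    ]


relaxed_paths = relax_step("/bib/book/title", 1)
print(relaxed_paths)  # ['/bib/book/title', '/bib/paper/title']
```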
Let us now consider a for clause including a filter
predicate based on the full-text operator ftcontains,
like the following:
for $a in doc("bib.xml")//paper
[//section/title ftcontains "INEX"]
We are looking for papers that include the word INEX
in a section title. The paper shown in Figure 1 is
not returned, because the titles of its sections
do not include the searched word. However, the word
is included in the content of the first section of the
paper, therefore the paper is probably of interest to
the user. A possible relaxation could transform the
previous query by removing the last step in the path
expression, thus obtaining:
for $a in doc("bib.xml")//paper
[//section ftcontains "INEX"]
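The effect of dropping the last step can be sketched in Python as follows. The document is hypothetical, the contains helper is only a crude whole-word stand-in for ftcontains, and the predicate path is assumed to be evaluated relative to each paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical paper whose section body, but not its section title,
# mentions INEX.
DOC = """
<bib>
  <paper>
    <title>P1</title>
    <section>
      <title>Introduction</title>
      <p>We evaluate on the INEX collection.</p>
    </section>
  </paper>
</bib>
"""

root = ET.fromstring(DOC)


def contains(elem, word):
    """Crude stand-in for ftcontains: whole-word match on all nested text."""
    return word in "".join(elem.itertext()).split()


def papers_matching(inner_path, word):
    """Papers with at least one element under `inner_path` containing `word`."""
    return [
        p for p in root.iter("paper")
        if any(contains(e, word) for e in p.findall(inner_path))
    ]


strict = papers_matching(".//section/title", "INEX")  # original predicate
relaxed = papers_matching(".//section", "INEX")       # last step removed

print(len(strict), len(relaxed))  # 0 1
```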
As a final example, consider the partial query
for $b in doc("bib.xml")/bib/book
where $b/price < 39
The user wants to find books with a price lower than
39. However, it could be the case that very few books
satisfy such a constraint (in the document of Fig-
ure 1, no book satisfies the constraint); consequently,
the user could also be interested in books having a
price of 39, or even in books having a price not much
greater than 39. A relaxed version of the where clause
could return such books, by substituting the condi-
tion $b/price < 39 with an approximated version
of it, obtained by changing the comparison operator
($b/price ≤ 39) or even increasing the threshold
price ($b/price < 45).
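The two predicate relaxations can be compared on a small, hypothetical price list (the prices in Figure 1 are not reproduced here). The helper titles is an illustrative name, and the threshold 45 is simply the one used in the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical prices; chosen so that no book is strictly cheaper than 39.
BOOKS = """
<bib>
  <book><title>A</title><price>39</price></book>
  <book><title>B</title><price>42</price></book>
  <book><title>C</title><price>60</price></book>
</bib>
"""

root = ET.fromstring(BOOKS)


def titles(predicate):
    """Titles of the books whose numeric price satisfies `predicate`."""
    return [
        b.findtext("title")
        for b in root.findall("./book")
        if predicate(float(b.findtext("price")))
    ]


strict = titles(lambda p: p < 39)             # $b/price < 39
relaxed_operator = titles(lambda p: p <= 39)  # $b/price <= 39
relaxed_threshold = titles(lambda p: p < 45)  # $b/price < 45

print(strict)             # []
print(relaxed_operator)   # ['A']
print(relaxed_threshold)  # ['A', 'B']
```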
1.3 Our Contribution
The purpose of this paper is to formally define the
notion of query relaxation that has been informally
presented. Section 2 represents the core of the paper;
here we introduce the various relaxation operators,
each of which performs one of the following tasks: 1) given
a path expression, define a set of relaxed path expressions;
2) given a predicate on an element value, define
a set of relaxed predicates.
With respect to (Amer-Yahia et al., 2004), the pa-
per that mainly influenced us, our work has the advan-
tage of considering a wider spectrum of relaxations.
Moreover, we incorporate the notion of approxima-
tion into a general algebraic framework suitable for
representing queries over XML.
The relaxation operators are then used in Section
3 to define a set of approximated algebraic operators.
These operators are variants of some of those previously
defined for AFTX. AFTX, which is briefly
reviewed in the same section, is an algebra working
on forests of trees, i.e. ordered lists of trees. For a
deeper treatment of AFTX data model, algebraic op-
AN APPROXIMATION-AWARE ALGEBRA FOR XML FULL-TEXT QUERIES