PREFERENCE RULES IN DATABASE QUERYING
Sergio Greco, Cristian Molinaro and Francesco Parisi
DEIS, University of Calabria, 87036 Rende, Italy
Keywords:
Deductive databases, prioritized queries, preferences.
Abstract:
The paper proposes the use of preferences for querying databases. In expressing queries it is natural to express
preferences among tuples belonging to the answer. This can be done in commercial DBMS, for instance, by
ordering the tuples in the result. The paper presents a different proposal, based on similar approaches deeply
investigated in the artificial intelligence field, where preferences are used to restrict the result of queries posed
over databases. In our proposal a query over a database $\mathcal{DB}$ is a triple $\langle q, \mathcal{P}, \Phi \rangle$, where $q$ denotes the output
relation, $\mathcal{P}$ a Datalog program (or an SQL query) used to compute the result, and $\Phi$ a set of preference
rules used to introduce preferences on the computed tuples. Tuples which are "dominated" by
other tuples do not belong to the result and cannot be used to infer other tuples. A new stratified semantics is
presented where the program $\mathcal{P}$ is partitioned into strata and the preference rules associated with each stratum
of $\mathcal{P}$ are divided into layers; the result of a query is computed one stratum at a time,
applying the preference rules one layer at a time. We show that our technique is sound and that the complexity
of computing queries with preference rules is still polynomial.
1 INTRODUCTION
The growing volume of available information poses new challenges to the database and artificial intelligence communities. Recent research has investigated new techniques for accessing large volumes of data, such as user-centered access to information, information filtering and extraction, and policies to reduce the data presented to users. An interesting direction, deeply studied in the artificial intelligence and nonmonotonic reasoning fields, consists in the use of preferences to express priorities among alternative scenarios.
The paper presents a logical framework wherein preferences are used to restrict the result of queries posed over a database. This is an important aspect in querying large databases such as those used by search engines. In this context, the result of a query contains only tuples which are not dominated by other tuples, and dominated tuples cannot be used to infer new information. The novelty of the presented approach is that preferences are stratified and applied one stratum at a time. A second innovative aspect of this proposal is that preferences on both base and derived atoms are considered, as well as general (recursive) queries which can be expressed by means of stratified Datalog.
Example 1 Consider a database $\mathcal{DB} = \{\text{fish}, \text{beef}\}$ and a program $\mathcal{P}$ consisting of the two rules:
$\text{red-wine} \leftarrow \text{beef}$
$\text{white-wine} \leftarrow \text{fish}$
Assume now that we have a query defined by the rules in $\mathcal{P}$ and the preference
$\rho_1 = \text{red-wine} \succ \text{white-wine} \leftarrow \text{beef}$
stating that, if there is beef, we prefer red-wine to white-wine. The set of preferred atoms contains the base atoms fish and beef and the derived atom red-wine (the atom white-wine is not preferred).
Assume now that we also have the preference $\rho_2 = \text{fish} \succ \text{beef}$, stating that we prefer fish to beef. In this case, the preference rule $\rho_2$ is considered first and the preference rule $\rho_1$ next. However, $\rho_1$ cannot be applied, as beef is not in the preferred set of atoms. Consequently, the set of preferred atoms with respect to the preference rules $\rho_2$ and $\rho_1$ is $\{\text{fish}, \text{white-wine}\}$.
Greco S., Molinaro C. and Parisi F. (2007).
PREFERENCE RULES IN DATABASE QUERYING.
In Proceedings of the Ninth International Conference on Enterprise Information Systems - DISI, pages 119-124
DOI: 10.5220/0002389901190124
Copyright © SciTePress
Contributions. In this paper we study the use of preferences in querying databases. We consider general (stratified) Datalog queries and general preferences: the head of a preference rule may contain atoms belonging to different relations, and the body consists of a conjunction of literals. A semantics where both query and preferences are partitioned into strata is defined. Under such a semantics, the query is computed one stratum at a time and, for each stratum of the query, the preferences are applied one stratum at a time.
Related Work. The increased interest in preferen-
ces in logic programs is reflected by an extensive
number of proposals and systems for preference han-
dling. Most of the approaches propose an extension
of logic programming by adding preference informa-
tion. The most common form of preference consists
in specifying a strict partial order on rules (Delgrande
et al., 2003; Gelfond and Son, 1997; Sakama and
Inoue, 2000; Zhang and Foo, 1997), whereas more
sophisticated forms of preferences also allow priori-
ties to be specified between conjunctive (disjunctive)
knowledge with preconditions (Brewka et al., 2003;
Sakama and Inoue, 2000) and numerical penalties for
suboptimal options (Brewka, 2004).
Considering the use of preferences in querying databases, an extension of relational calculus expressing preferences for tuples in terms of logical conditions has been proposed in (Lacroix and Lavency, 1987). Preferences requiring non-deterministic choice among atoms which minimize or maximize the value of some attribute have been proposed in (Greco and Zaniolo, 2002). An extension of Datalog with preference relations, subsuming the approach proposed in (Lacroix and Lavency, 1987), has been proposed in (Kostler et al., 1995), whereas an extension of SQL including preferences has been proposed in (Kießling, 2002; Kießling and Kostler, 2002). In the latter proposal several built-in operators and a formal definition of their combinations (i.e. intersection, union, Pareto composition, etc.) have been considered. Borzsonyi et al. proposed the skyline operator (Borzsonyi et al., 2001) to filter out a set of “interesting” points (i.e. points not dominated by any other point) from a potentially large set of points. An extension of SQL with a skyline operator has also been proposed. A framework for specifying preferences using logical formulas and its embedding into relational algebra has been introduced in (Chomicki, 2003). That paper also introduces the winnow operator, which generalizes the skyline operator. The implementation of winnow and ranking is also studied in (Torlone and Ciaccia, 2002). Algorithms for computing skyline operators are also studied in (Kossmann et al., 2002; Papadias et al., 2003; Chomicki et al., 2003). In (Agrawal and Wimmers, 2002) the use of quantitative preferences (scoring functions) in queries is proposed.
In this work, in contrast with previous proposals, gen-
eral preferences and a different (stratified) semantics,
which we believe to be more intuitive, are considered.
2 BACKGROUND
Familiarity with disjunctive logic programs and di-
sjunctive deductive databases is assumed (Ullman,
1988).
Datalog Programs. A term is either a constant or a variable. An atom is of the form $p(t_1, \dots, t_h)$, where $p$ is a predicate symbol of arity $h$ and $t_1, \dots, t_h$ are terms. A literal is either an atom $A$ or its negation $\text{not } A$. A (Datalog) rule $r$ is a clause of the form
$A \leftarrow B_1, \dots, B_m, \text{not } B_{m+1}, \dots, \text{not } B_n, \varphi \qquad n \geq 0$
where $A, B_1, \dots, B_n$ are atoms, whereas $\varphi$ is a conjunction of built-in atoms of the form $u \, \theta \, v$, where $u$ and $v$ are terms and $\theta$ is a comparison predicate. $A$ is the head of $r$ (denoted by $Head(r)$), whereas the conjunction $B_1, \dots, B_m, \text{not } B_{m+1}, \dots, \text{not } B_n, \varphi$ is the body of $r$ (denoted by $Body(r)$). It is assumed that each rule is safe, i.e. a variable appearing in the head or in a negative literal also appears in a positive body literal.
A (Datalog) program is a finite set of rules. A not-free program is called positive. The Herbrand Universe $U_{\mathcal{P}}$ of a program $\mathcal{P}$ is the set of all constants appearing in $\mathcal{P}$, and its Herbrand Base $B_{\mathcal{P}}$ is the set of all ground atoms constructed from the predicates appearing in $\mathcal{P}$ and the constants from $U_{\mathcal{P}}$. A term (resp. an atom, a literal, a rule or a program) is ground if no variable occurs in it. A rule $r'$ is a ground instance of a rule $r$ if $r'$ is obtained from $r$ by replacing every variable in $r$ with some constant in $U_{\mathcal{P}}$; $ground(\mathcal{P})$ denotes the set of all ground instances of the rules in $\mathcal{P}$.
An interpretation $M$ for a Datalog program $\mathcal{P}$ is any subset of $B_{\mathcal{P}}$; $M$ is a model of $\mathcal{P}$ if it satisfies all rules in $ground(\mathcal{P})$. The (model-theoretic) semantics for positive $\mathcal{P}$ assigns to $\mathcal{P}$ the set of its minimal models $\mathcal{MM}(\mathcal{P})$, where a model $M$ for $\mathcal{P}$ is minimal if no proper subset of $M$ is a model for $\mathcal{P}$. For any interpretation $M$, $\mathcal{P}^M$ is the ground positive program derived from $ground(\mathcal{P})$ by 1) removing all rules that contain a negative literal $\text{not } A$ in the body with $A \in M$, and 2) removing all negative literals from the remaining rules. An interpretation $M$ is a stable model of $\mathcal{P}$ if and only if $M \in \mathcal{MM}(\mathcal{P}^M)$ (Gelfond and Lifschitz, 1988). For general $\mathcal{P}$, the stable model semantics assigns to $\mathcal{P}$ the set $\mathcal{SM}(\mathcal{P})$ of its stable models. It is well-known that stable models are minimal models (i.e. $\mathcal{SM}(\mathcal{P}) \subseteq \mathcal{MM}(\mathcal{P})$) and that for negation-free programs minimal and stable model semantics coincide (i.e. $\mathcal{SM}(\mathcal{P}) = \mathcal{MM}(\mathcal{P})$).
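The stable-model test described above can be sketched for ground programs without disjunction, where the reduct is a positive program whose unique minimal model is its least fixpoint. The rule encoding as `(head, positive_body, negative_body)` triples and the function names `reduct`, `least_model` and `is_stable` are assumptions made for this illustration only.

```python
def reduct(rules, interp):
    """Build the positive program P^M: drop every rule with a literal
    'not A' where A is in interp, then drop the remaining negative literals."""
    return [(head, pos) for head, pos, neg in rules
            if not any(a in interp for a in neg)]


def least_model(positive_rules):
    """Least fixpoint of a ground positive program (its unique minimal model)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(b in model for b in pos):
                model.add(head)
                changed = True
    return model


def is_stable(rules, interp):
    """interp is a stable model iff it equals the minimal model of the reduct."""
    return least_model(reduct(rules, interp)) == set(interp)


# Hypothetical two-rule program: p <- not q.  q <- not p.
program = [("p", [], ["q"]), ("q", [], ["p"])]
print(is_stable(program, {"p"}), is_stable(program, {"p", "q"}))
```

For the two-rule program in the sketch, `{p}` and `{q}` pass the test, while the empty set and `{p, q}` do not.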
Given a Datalog program $\mathcal{P}$, $G_g(\mathcal{P}) = (V_g, E_g)$ denotes the dependency graph associated with $ground(\mathcal{P})$, where $V_g$ consists of all ground atoms appearing in $ground(\mathcal{P})$, whereas there is an arc from $B$ to $A$ in $E_g$ if there is a rule $r$ in $ground(\mathcal{P})$ such that $Head(r) = A$ and $B \in Body(r)$; the arc is said to be marked negatively if $B$ appears negated in the body of $r$. The dependency graph $G(\mathcal{P}) = (V, E)$ associated with $\mathcal{P}$ is built by considering the ground program derived from $\mathcal{P}$ by eliminating all terms (i.e. every atom $p(t)$ is replaced by $p$). A ground atom $p(t)$ depends on a ground atom $q(u)$ if there is a path in $G_g(\mathcal{P})$ from $q(u)$ to $p(t)$. Analogously, a predicate symbol $p$ depends on a predicate symbol $q$ if there is a path in $G(\mathcal{P})$ from $q$ to $p$. The dependency is negated if there is an arc marked negatively in the path.
A partition $\pi_0, \dots, \pi_k$ of the set of all predicate symbols of a Datalog program $\mathcal{P}$, where each $\pi_i$ is called a stratum, is a stratification of $\mathcal{P}$ if for each rule $r$ in $\mathcal{P}$ the predicates that appear only positively in the body of $r$ are in strata lower than or equal to the stratum of the predicate in the head of $r$, and the predicates that appear negatively are in strata lower than the stratum of the predicate in the head of $r$. The stratification of the predicates defines a stratification of the rules of $\mathcal{P}$ into strata $\langle \mathcal{P}_1, \dots, \mathcal{P}_k \rangle$ where a stratum $\mathcal{P}_i$ contains rules which define predicates in $\pi_i$. A Datalog program is called stratified if it has a stratification. Stratified (normal) programs have a unique stable model which coincides with the stratified model, obtained by computing the fixpoints of every stratum in their order.
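As an illustration of the stratification condition, the following sketch assigns stratum numbers to predicate symbols by iterating the two constraints (positive body predicates in a lower or equal stratum, negated ones in a strictly lower stratum) until nothing changes. It works at the predicate level only; the encoding of rules as `(head, positive_body, negative_body)` triples of predicate names, the function name `stratify` and the sample predicates are assumptions of this example, and the result is one admissible stratification, not necessarily the standard one used later in the paper.

```python
def stratify(rules):
    """Return a dict predicate -> stratum number, or None if the program
    has no stratification (some predicate depends negatively on itself)."""
    preds = set()
    for head, pos, neg in rules:
        preds.add(head)
        preds.update(pos)
        preds.update(neg)
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # positive body predicates: lower or equal stratum;
            # negated body predicates: strictly lower stratum
            need = max([stratum[b] for b in pos]
                       + [stratum[b] + 1 for b in neg]
                       + [stratum[head]])
            if need > stratum[head]:
                if need >= len(preds):
                    return None  # the constraints can never be satisfied
                stratum[head] = need
                changed = True
    return stratum


# Hypothetical program: reach is recursive, unreach negates reach.
rules = [
    ("reach", ["edge"], []),
    ("reach", ["reach", "edge"], []),
    ("unreach", ["node"], ["reach"]),
]
print(stratify(rules))
```

Base predicates and predicates with no negative dependencies end up in stratum 0 under this assignment, and a program such as `p <- not q`, `q <- not p` is rejected.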
Queries. Predicate symbols are partitioned into two distinct sets: base predicates and derived predicates. Base predicates correspond to database relations defined over a given domain and they do not appear in the head of any rule, whereas derived predicates are defined by means of rules. Given a set of ground atoms $\mathcal{DB}$, a predicate symbol $p$ and a stratified program $\mathcal{P}$, $\mathcal{DB}[p]$ denotes the set of $p$-tuples in $\mathcal{DB}$, while $\mathcal{P}_{\mathcal{DB}}$ denotes the program derived from the union of $\mathcal{P}$ with the facts in $\mathcal{DB}$, i.e. $\mathcal{P}_{\mathcal{DB}} = \mathcal{P} \cup \mathcal{DB}$. The semantics of $\mathcal{P}_{\mathcal{DB}}$ is given by the stratified model (which coincides with the unique stable model) of $\mathcal{P}_{\mathcal{DB}}$. The answer to a query $Q = (g, \mathcal{P})$ over a database $\mathcal{DB}$, denoted by $Q(\mathcal{DB})$, is given by $M[g]$ where $M = \mathcal{SM}(\mathcal{P}_{\mathcal{DB}})$. In the following we also denote with $\mathcal{P}(\mathcal{DB}) = \mathcal{SM}(\mathcal{P}_{\mathcal{DB}})$ the application of $\mathcal{P}$ to $\mathcal{DB}$; therefore $Q(\mathcal{DB}) = \mathcal{P}(\mathcal{DB})[g]$.
3 PREFERENCE RULES AND
QUERIES
This section presents a framework for expressing preferences in the evaluation of queries posed on a given database. The framework is based on the introduction of preference rules, whose syntax is inspired by the management of priorities in the artificial intelligence field, logic programming and database querying (Brewka et al., 2003; Delgrande et al., 2003; Gelfond and Son, 1997; Sakama and Inoue, 2000; Zhang and Foo, 1997).
3.1 Syntax
A prioritized program consists of a set of standard rules (Datalog program) and a set of preference rules. As rules expressing preferences eliminate tuples which are derived by means of standard rules, we first introduce a standard stratification of the Datalog program to fix the order in which standard rules are applied. Preference rules are associated with each subprogram (stratum) and applied after the subprogram has been evaluated. Let us start by introducing the concept of standard stratification.
Definition 1 The standard stratification of a stratified program $\mathcal{P}$ consists of $k$ strata $\langle \mathcal{P}_1, \dots, \mathcal{P}_k \rangle$ where $k$ is the minimal value such that for each $\mathcal{P}_i$ and for each pair of predicates $p$ and $q$ defined in $\mathcal{P}_i$ either they are mutually recursive or they are independent (i.e. $p$ does not depend on $q$ and $q$ does not depend on $p$).
In the following, given an atom $p(t)$, $str(p(t))$ denotes the stratum of the predicate symbol $p$ (or equivalently of the subprogram in which $p$ is defined) in the standard stratification.
Definition 2 A preference rule $\rho$ is of the form:
$A \succ C \leftarrow B_1, \dots, B_m, \text{not } B_{m+1}, \dots, \text{not } B_n, \varphi \qquad (1)$
where $A, C, B_1, \dots, B_n$ are atoms, and $\varphi$ is a conjunction of built-in atoms.
Also in this case we assume that rules are safe. In the above definition $A \succ C$ is called the head of the preference rule (denoted as $Head(\rho)$), whereas the conjunction $B_1, \dots, B_m, \text{not } B_{m+1}, \dots, \text{not } B_n, \varphi$ is called the body (denoted as $Body(\rho)$). Moreover, we denote with $Head_1(\rho)$ and $Head_2(\rho)$ the first and the second atom in the head of $\rho$, respectively (i.e. $Head_1(\rho) = A$ and $Head_2(\rho) = C$).
The intuitive meaning of a ground preference rule $\rho$ is that if the body of $\rho$ is true, then the atom $A$ is preferable to $C$ (we also say that the atom $C$ is dominated by the atom $A$). This means that in the evaluation of a prioritized program $\langle \mathcal{P}, \Phi \rangle$ the model defining its semantics cannot contain the atom $C$ if it contains the atom $A$ and the body of the preference rule is true.
Let $\Phi$ be a preference program, i.e. a set of preference rules. The transitive closure of $ground(\Phi)$ is
$\Phi_g = ground(\Phi) \cup \{ (A \succ C \leftarrow body_1, body_2) \mid (A \succ B \leftarrow body_1) \in \Phi_g \wedge (B \succ C \leftarrow body_2) \in \Phi_g \}$.
Analogously, we define $\Phi^*$ as the closure of the set of ground preference rules derived from $\Phi$ by replacing every atom $p(t)$ with $p$ and deleting built-in atoms.
Definition 3 A (ground) preference program $\Phi_g$ is layered if it is possible to partition it into $n$ layers $\langle \Phi_g[1], \dots, \Phi_g[n] \rangle$ as follows:
For each ground atom $A$ such that there is no ground rule $\rho \in \Phi_g$ such that $Head_2(\rho) = A$, $layer(A) = 0$;
For every ground atom $C$ such that there is a rule $\rho$ of the form (1) (i.e. such that $Head_2(\rho) = C$), $layer(C) > \max\{layer(B_1), \dots, layer(B_n), 0\}$ and $layer(C) \geq layer(A)$;
The layer of a preference rule $\rho \in \Phi_g$, denoted as $layer(\rho)$, is equal to $layer(Head_2(\rho))$;
$\Phi_g[i]$ consists of all preference rules associated with the layer $i$.
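Example 2 Consider the set of preference rules $\Phi$:
$\rho_1$: $\text{fish} \succ \text{beef} \leftarrow$
$\rho_2$: $\text{red-wine} \succ \text{white-wine} \leftarrow \text{beef}$
$\rho_3$: $\text{white-wine} \succ \text{red-wine} \leftarrow \text{fish}$
The transitive closure $\Phi^*$ consists of the rules $\rho_1, \rho_2, \rho_3$ plus the following rules
$\rho_4$: $\text{red-wine} \succ \text{red-wine} \leftarrow \text{beef}, \text{fish}$
$\rho_5$: $\text{white-wine} \succ \text{white-wine} \leftarrow \text{fish}, \text{beef}$
$\Phi^*$ is partitioned into the two layers $\Phi^*[1] = \{\rho_1\}$ and $\Phi^*[2] = \{\rho_2, \rho_3, \rho_4, \rho_5\}$.
As will become clear in the next subsection, preference rules of the form $A \succ A \leftarrow body$ are useless and can be deleted. Therefore, in the above example $\Phi^*[2] = \{\rho_2, \rho_3\}$.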
Example 3 Consider the set of preference rules $\Phi$:
$\rho_1$: $\text{fish} \succ \text{beef} \leftarrow \text{white-wine}$
$\rho_2$: $\text{red-wine} \succ \text{white-wine} \leftarrow \text{beef}$
According to $\rho_1$ the layer of beef must be greater than the layer of white-wine, whereas according to $\rho_2$ the layer of white-wine must be greater than the layer of beef. Thus, the set of preference rules is not layered.
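The layering conditions of Definition 3 can be checked mechanically on a set of ground (here propositional) preference rules. The sketch below computes the smallest layer numbers satisfying the two constraints, or reports that none exist; the encoding of a preference rule as a triple `(A, C, body_atoms)` and the function name `assign_layers` are assumptions of this illustration, and the least assignment is only one of the admissible layerings.

```python
def assign_layers(pref_rules):
    """Return a dict atom -> layer for rules (A, C, body_atoms), enforcing
    layer(C) > max(layer(B_1..B_n), 0) and layer(C) >= layer(A).
    Returns None when the rules are not layered."""
    atoms = set()
    for a, c, body in pref_rules:
        atoms.update([a, c])
        atoms.update(body)
    layer = {x: 0 for x in atoms}  # atoms never dominated stay at layer 0
    changed = True
    while changed:
        changed = False
        for a, c, body in pref_rules:
            need = max([layer[b] + 1 for b in body] + [layer[a], 1])
            if need > layer[c]:
                if need > len(atoms):
                    return None  # the constraints are cyclic
                layer[c] = need
                changed = True
    return layer


# Preference rules of Example 2 (layered) and Example 3 (not layered).
example2 = [
    ("fish", "beef", []),
    ("red-wine", "white-wine", ["beef"]),
    ("white-wine", "red-wine", ["fish"]),
]
example3 = [
    ("fish", "beef", ["white-wine"]),
    ("red-wine", "white-wine", ["beef"]),
]
print(assign_layers(example2))
print(assign_layers(example3))
```

On the rules of Example 2 the sketch places beef in layer 1 and both wines in layer 2, in agreement with the two layers given there, and it rejects the rules of Example 3.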
Observe that in the above definition, in order to compute the closure of the ground instantiation of $\Phi$, we need to know the database $\mathcal{DB}$ containing all constants in the database domain. Therefore, checking whether $\Phi_g$ can be partitioned into layers cannot be done at compile-time. It is possible to define sufficient conditions which guarantee that the set of preference rules can be partitioned into layers by considering the (ground) program $\Phi^*$ instead of the program $\Phi_g$. This means that if $\Phi^*$ can be partitioned into layers, the set $\Phi_g$ can be partitioned into layers as well, although the layers of $\Phi_g$ may be different from the layers of $\Phi^*$ (the layers of $\Phi_g$ define a “refinement” of the layers of $\Phi^*$).
Definition 4 A prioritized query is of the form $\langle q, \mathcal{P}, \Phi \rangle$ where $q$ is a predicate symbol denoting the output relation, $\mathcal{P}$ is a (stratified) Datalog program and $\Phi$ is a set of preference rules.
As said before, the intuitive meaning of a prioritized query $\langle q, \mathcal{P}, \Phi \rangle$ over a database $\mathcal{DB}$ is that the atoms derived from $\mathcal{P}$ and $\mathcal{DB}$ must satisfy the preference conditions defined in $\Phi$.
Definition 5 A prioritized query $Q = \langle q, \mathcal{P}, \Phi \rangle$ is said to be well formed if $\Phi_g$ is layered and for every ground atom $C$ such that there is a rule $\rho$ of the form (1) (i.e. such that $Head_2(\rho) = C$) it holds that
1. $str(C) \geq \max\{str(A), str(B_1), \dots, str(B_n)\}$, and
2. $A, B_1, \dots, B_m$ do not depend on $C$ in $\mathcal{P}$.
In the following we assume that our queries are well formed. Sufficient conditions can be defined on the basis of the dependency graph $G(\mathcal{P})$.
3.2 Semantics
First we analyze the case where $\Phi$ defines preferences on base atoms only, and next we consider the case where $\Phi$ expresses preferences on base and derived atoms, i.e. also on atoms defined in $\mathcal{P}$.
3.2.1 Preferences On Base Atoms
We assume here a query $Q = \langle q, \mathcal{P}, \Phi \rangle$ in which the preference rules in $\Phi$ express preferences only among base atoms. As said before, $\Phi_g$ can be partitioned into $n$ layers $\widehat{\Phi}_g = \langle \Phi_g[1], \dots, \Phi_g[n] \rangle$.
Definition 6 Let $\mathcal{DB}$ be a set of ground atoms, $\Phi$ a set of preference rules such that $\widehat{\Phi}_g = \langle \Phi_g[1], \dots, \Phi_g[n] \rangle$, and $t, u$ two atoms in $\mathcal{DB}$. We say that $t$ is preferable to $u$ with respect to $\Phi_g[i]$ (denoted as $t \succ_{\Phi[i]} u$) if
$\exists \, (t \succ u \leftarrow body_1) \in \Phi_g[i]$ s.t. $\mathcal{DB} \models body_1$, and
$\nexists \, (u \succ t \leftarrow body_2) \in \Phi_g[i]$ s.t. $\mathcal{DB} \models body_2$.
The set of tuples in $\mathcal{DB}$ which are preferred with respect to $\Phi_g[i]$ is
$\Phi_g[i](\mathcal{DB}) = \{ t \mid t \in \mathcal{DB} \wedge \nexists \, u \in \mathcal{DB} \text{ s.t. } u \succ_{\Phi[i]} t \}$.
Observe that $\Phi_g$ could contain preference rules of the form $A \succ A \leftarrow body$. Such preferences are useless as they are not used to infer preferences among ground atoms and can be deleted from $\Phi_g$.
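Example 4 Consider the database $\mathcal{DB} = \{\text{fish}, \text{beef}, \text{red-wine}, \text{white-wine}, \text{pie}, \text{ice-cream}\}$ and the following preference rules $\Phi$:
$\rho_1$: $\text{pie} \succ \text{ice-cream} \leftarrow$
$\rho_2$: $\text{red-wine} \succ \text{white-wine} \leftarrow \text{fish}$
$\rho_3$: $\text{white-wine} \succ \text{red-wine} \leftarrow \text{beef}$
The set $\Phi_g$ consists, without considering useless rules, of a unique layer $\Phi_g[1] = \{\rho_1, \rho_2, \rho_3\}$. The application of $\Phi_g[1]$ to $\mathcal{DB}$ gives the set $\Phi_g[1](\mathcal{DB}) = \{\text{fish}, \text{beef}, \text{red-wine}, \text{white-wine}, \text{pie}\}$.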
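The application of a single layer to a set of atoms can be sketched as follows for ground, propositional preference rules. The encoding of a rule as `(A, C, positive_body, negative_body)` and the names `body_holds` and `apply_layer` are assumptions of this illustration; the dominance test follows Definition 6, so two atoms that are each preferred to the other by fired rules both survive.

```python
def body_holds(pos, neg, atoms):
    """A body holds if its positive atoms are present and its negated ones are absent."""
    return all(b in atoms for b in pos) and not any(b in atoms for b in neg)


def apply_layer(layer_rules, db):
    """Return the atoms of db that are not dominated under one layer of rules."""
    # pairs (A, C) for which some rule A > C <- body has a body true in db
    fired = {(a, c) for a, c, pos, neg in layer_rules if body_holds(pos, neg, db)}
    # C is dominated if A is in db, A > C fired, and no rule C > A fired
    dominated = {c for a, c in fired if a in db and (c, a) not in fired}
    return set(db) - dominated


# Database and rules of Example 4.
db = {"fish", "beef", "red-wine", "white-wine", "pie", "ice-cream"}
layer1 = [
    ("pie", "ice-cream", [], []),
    ("red-wine", "white-wine", ["fish"], []),
    ("white-wine", "red-wine", ["beef"], []),
]
print(sorted(apply_layer(layer1, db)))
```

On the data of Example 4 only ice-cream is discarded: the two wine rules both fire, so neither wine is preferable to the other.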
Definition 7 Let $\mathcal{DB}$ be a database and $Q = \langle q, \mathcal{P}, \Phi \rangle$ be a query such that $\Phi$ expresses preferences only on base atoms and the set of ground preference rules $\Phi_g$ is layered into $\widehat{\Phi}_g = \langle \Phi_g[1], \dots, \Phi_g[n] \rangle$. Then the set of preferred tuples with respect to $\widehat{\Phi}_g$ is
$M = \mathcal{P}(\widehat{\Phi}_g(\mathcal{DB})) = \mathcal{P}(\Phi_g[n](\Phi_g[n-1](\cdots(\Phi_g[1](\mathcal{DB}))\cdots)))$.
The answer to the query $Q$ is given by $M[q]$.
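Example 5 Consider the database $\mathcal{DB} = \{\text{fish}, \text{beef}, \text{red-wine}, \text{white-wine}, \text{pie}\}$ and the preference rules $\Phi$ of Example 2. $\Phi_g$ is equal to $\Phi$ and it is layered into
$\widehat{\Phi}_g = \langle \Phi_g[1], \Phi_g[2] \rangle = \langle \{\rho_1\}, \{\rho_2, \rho_3\} \rangle$
The application of $\Phi_g[1]$ to $\mathcal{DB}$ gives the set $M_1 = \Phi_g[1](\mathcal{DB}) = \{\text{fish}, \text{red-wine}, \text{white-wine}, \text{pie}\}$.
The application of $\Phi_g[2]$ to $M_1$ gives the set $M_2 = \Phi_g[2](M_1) = \{\text{fish}, \text{white-wine}, \text{pie}\}$.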
3.2.2 General Preferences
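We consider now general prioritized queries $Q = \langle q, \mathcal{P}, \Phi \rangle$ where $\mathcal{P}$ is a stratified Datalog program and $\Phi$ expresses preferences also on derived atoms.
Let $\langle q, \mathcal{P}, \Phi \rangle$ be a prioritized query and $\mathcal{DB}$ a database. Let $\langle \mathcal{P}_1, \dots, \mathcal{P}_k \rangle$ be the standard stratification of $ground(\mathcal{P})$ and let $\mathcal{P}_0 = \{ A \leftarrow \; \mid A \in \mathcal{DB} \}$. Then, $\Phi_g[\mathcal{P}_i]$, for $i \in [0..k]$, denotes the following set of preference rules in $\Phi_g$:
$\Phi_g[\mathcal{P}_i] = \{ A \succ C \leftarrow body \mid (C \leftarrow body') \in \mathcal{P}_i \}$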
Definition 8 Let $\mathcal{DB}$ be a database and let $Q = \langle q, \mathcal{P}, \Phi \rangle$ be a prioritized query and $\langle \mathcal{P}_1, \dots, \mathcal{P}_k \rangle$ the standard stratification of $\mathcal{P}$. The application of $\mathcal{P}$ and $\Phi$ to $\mathcal{DB}$ is defined as follows: $M_0 = \widehat{\Phi}_g[\mathcal{P}_0](\mathcal{DB})$ and, for each $i$ in $[1..k]$, $M_i = \widehat{\Phi}_g[\mathcal{P}_i](\mathcal{P}_i(M_{i-1}))$.
The answer to the query $Q$ over the database $\mathcal{DB}$, denoted as $Q(\mathcal{DB})$, is given by $M_k[q]$.
Our proposal is sound, i.e. for each ground preference rule $A \succ C \leftarrow body$ in $\Phi_g$, if $M_k \models (body \wedge A)$ then $M_k \not\models C$. Moreover, it can be shown that $Q(\mathcal{DB})$ can be computed in polynomial time.
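The stratum-by-stratum evaluation of Definition 8 can be sketched for propositional programs as below. Several simplifications are assumptions of this illustration and not part of the paper: the standard stratification is supplied by the caller as a list of strata, the preference rules are taken as already ground and closed, layers are the least ones satisfying Definition 3, and rule bodies are evaluated against the current set of atoms. All function names and encodings are hypothetical.

```python
def holds(pos, neg, model):
    """True if every positive atom is in model and no negated atom is."""
    return all(b in model for b in pos) and not any(b in model for b in neg)


def fixpoint(rules, model):
    """Least fixpoint of one stratum; rules are (head, pos, neg) triples."""
    model = set(model)
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in model and holds(pos, neg, model):
                model.add(head)
                changed = True
    return model


def assign_layers(prefs):
    """Least layer assignment for rules (A, C, pos, neg), or None if not layered."""
    atoms = {x for a, c, pos, neg in prefs for x in (a, c, *pos, *neg)}
    layer = dict.fromkeys(atoms, 0)
    changed = True
    while changed:
        changed = False
        for a, c, pos, neg in prefs:
            need = max([layer[b] + 1 for b in (*pos, *neg)] + [layer[a], 1])
            if need > layer[c]:
                if need > len(atoms):
                    return None
                layer[c] = need
                changed = True
    return layer


def apply_layer(rules, model):
    """Drop every atom dominated by a fired rule with no fired converse rule."""
    fired = {(a, c) for a, c, pos, neg in rules if holds(pos, neg, model)}
    dominated = {c for a, c in fired if a in model and (c, a) not in fired}
    return model - dominated


def evaluate(db, strata, prefs):
    """Compute M_k: derive one stratum at a time, then filter layer by layer."""
    layer = assign_layers(prefs)
    if layer is None:
        raise ValueError("preference rules are not layered")
    model = set(db)
    # atoms defined at level 0 are the database facts, then each stratum's heads
    defined = [set(db)] + [{head for head, _, _ in s} for s in strata]
    for i, heads in enumerate(defined):
        if i > 0:
            model = fixpoint(strata[i - 1], model)
        local = [r for r in prefs if r[1] in heads]
        for n in sorted({layer[r[1]] for r in local}):
            model = apply_layer([r for r in local if layer[r[1]] == n], model)
    return model


# Example 1: one stratum with two rules, preferences rho_1 and rho_2.
program = [[("red-wine", ["beef"], []), ("white-wine", ["fish"], [])]]
rho1 = ("red-wine", "white-wine", ["beef"], [])
rho2 = ("fish", "beef", [], [])
print(sorted(evaluate({"fish", "beef"}, program, [rho1])))
print(sorted(evaluate({"fish", "beef"}, program, [rho1, rho2])))
```

With only $\rho_1$ the sketch keeps fish, beef and red-wine; adding $\rho_2$ removes beef before the stratum is evaluated, so red-wine is never derived and the result is fish and white-wine, as in Example 1.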
4 CONCLUSIONS
This paper has introduced prioritized queries, a form of queries well-suited for expressing preferences among tuples either belonging to the source database or derived by means of the program specified in the query. It has been shown that prioritized queries are well-suited to express queries wherein we are interested only in preferred tuples. A stratified semantics for computing prioritized queries has been presented where the program $\mathcal{P}$ is partitioned into strata and the preference rules associated with each stratum of $\mathcal{P}$ are divided into layers; a query is evaluated by computing one stratum at a time and by applying the preference rules one layer at a time. The computational complexity of evaluating prioritized queries remains polynomial.
REFERENCES
Agrawal, R., and Wimmers, E. L. (2002). A framework
for expressing and combining preferences. Proc. SIG-
MOD, pp. 297-306.
Borzsonyi S., Kossmann D., Stocker K. (2001). The skyline
operator, Proc. ICDE, 421-430.
Brewka, G. (2004). Complex Preferences for Answer Set
Optimization, KR, 213-223.
Brewka G., Niemela I., Truszczynski M. (2003). Answer
Set Optimization. IJCAI, 867-872.
Chomicki, J. (2003). Preference Formulas in Relational
Queries. ACM TODS, 28(4), 1-40.
Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. (2003).
Skyline with presorting. Proc. ICDE.
Delgrande, J. P., Schaub, T., Tompits, H. (2003). A Framework for Compiling Preferences in Logic Programs. TPLP, 3(2), 129-187.
Gelfond, M., Son, T.C. (1997). Reasoning with prioritized
defaults. LPKR, 164-223.
Gelfond, M., Lifschitz, V. (1988). The Stable Model Se-
mantics for Logic Programming, ICLP.
Greco S., Zaniolo C. (2002). Greedy by Choice, Proc.
PODS.
Kießling, W. (2002). Foundations of preferences in database systems, Proc. VLDB.
Kießling, W., Kostler, G. (2002). Preference SQL - Design, Implementation, Experience, VLDB.
Kossmann, D., Ramsak, F., and Rost, S. (2002). Shoot-
ing stars in the sky: An online algorithm for skyline
queries. Proc. VLDB.
Kostler, G., Kießling, W., Thone, H., Guntzer, U. (1995). Fixpoint iteration with subsumption in deductive databases. JIIS, 4, 123-148.
Lacroix, M., Lavency, P. (1987). Preferences: Putting More Knowledge Into Queries. VLDB, 217-225.
Papadias, D., Tao, Y., Fu, G., and Seeger, B. (2003). An
optimal and progressive algorithm for skyline queries,
Proc. SIGMOD, pp. 467-478.
Sakama, C., Inoue, K. (2000). Prioritized logic programming and its application to commonsense reasoning. Artificial Intelligence, 123, 185-222.
Torlone, R., Ciaccia, P. (2002). Finding the Best when it’s a Matter of Preference, Proc. SEBD, pp. 347-360.
Ullman, J. D. (1988). Principles of Database and Knowledge-Base Systems, Vol. 1, Computer Science Press.
Zhang, Y., Foo, N. (1997). Answer sets for prioritized logic
programs. ILPS, 69-83.