3 PREFERENCES AND THEIR
IMPLEMENTATION IN R
In this section we present the theoretical foundations of preferences according to (Kießling, 2002; Kießling, 2005) side by side with their implementation in R-Pref. Due to space restrictions, we refer to the documentation and the fully available source code on the web for further details about R-Pref (Roocks, 2013). The following code samples are restricted to the essential parts; some technical details are omitted. The code examples show that the R implementation closely follows the specification.
Definition 1 (Preference). A preference $P = (A, <_P)$, where $A$ is a set of attributes, is a strict partial order on the domain of $A$. Thus $<_P$ is irreflexive and transitive. Thereby $x <_P y$ is interpreted as “I like $y$ more than $x$”.
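The two properties required by Definition 1 can be checked mechanically on a small finite domain. The following is only an illustrative sketch: the helper name `is_strict_partial_order` is hypothetical and not part of R-Pref, and the relation is assumed to be given as a vectorized R function `less(x, y)` that returns TRUE where x <_P y.

```r
# Hypothetical helper (not part of R-Pref): checks irreflexivity and
# transitivity of a relation on a finite domain.
is_strict_partial_order <- function(dom, less) {
  m <- outer(dom, dom, less)      # m[i, j] is TRUE iff dom[i] <_P dom[j]
  irreflexive <- !any(diag(m))    # no element is worse than itself
  n <- m + 0                      # numeric copy for matrix multiplication
  two_step <- (n %*% n) > 0       # x <_P y and y <_P z for some y
  transitive <- !any(two_step & !m)
  irreflexive && transitive
}

# "Smaller values are better": x <_P y iff x > y
ok <- is_strict_partial_order(1:5, function(x, y) x > y)
# "Different values" is irreflexive but not transitive
bad <- is_strict_partial_order(1:5, function(x, y) x != y)
print(c(ok, bad))
```

The first relation passes both checks, while the second fails transitivity and is therefore not a preference in the sense of Definition 1.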
In R-Pref a preference is an object of the reference class preference having (amongst others) the fields col (a character vector representing $A$) and a compare function cmp (representing $<_P$).
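As a rough illustration of this structure, the sketch below models a preference as a plain R list with the two fields named in the text. This is a simplification: the actual R-Pref class is a reference class with further fields, the constructor `make_pref` is hypothetical, and the argument convention of cmp (first argument better than second) is an assumption inferred from how cmp is used in the selection loop.

```r
# Hypothetical stand-in for an R-Pref preference object: a list with the
# fields col and cmp. The real class has more fields.
make_pref <- function(col, cmp) list(col = col, cmp = cmp)

# Example preference "a lower price is better".
# Assumed convention: cmp(a, b) is TRUE where a is strictly better than b.
low_price <- make_pref("price", function(a, b) a$price < b$price)

cars <- data.frame(id = c("a", "b", "c"), price = c(10, 20, 15))

# Which tuples are strictly better than tuple "b" (price 20)?
better_than_b <- low_price$cmp(cars, cars[2, , drop = FALSE])
print(better_than_b)
```

Keeping the compare function vectorized over its first argument allows one call to compare a whole table against a single tuple.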
The result of a preference is computed by the preference selection, also called winnow by (Chomicki, 2003).
Definition 2 (Preference Selection). The BMO-set of a preference $P = (A, <_P)$ on an input database relation $R$ contains all tuples that are not dominated w.r.t. the preference. It is computed by the preference selection operator $\sigma$ and finds all best matching tuples $t$ for $P$, where $t.A$ is the projection to the attribute set $A$.
$$\sigma[P](R) := \{\, t \in R \mid \nexists\, t' \in R : t.A <_P t'.A \,\}$$
In the following the projection will mostly be omitted, i.e., we write just $t <_P t'$ for $t.A <_P t'.A$.
In R-Pref this is performed by the sigma function.
For a preference pref and a dataset tbl the R code
implementing the BMO-set definition is essentially:
    ind = logical(nrow(tbl))  # one flag per tuple
    for (i in 1:nrow(tbl))
      ind[i] = !any(pref$cmp(tbl, tbl[i, ]))
    res = tbl[ind, ]
Therein !any corresponds to $\nexists$ and the call of cmp represents $<_P$. Of course, this is not an efficient algorithm, as it compares every tuple with every other tuple, but it shows that the implementation is a close representation of its formal foundations.
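The loop can be tried out on a toy dataset. The following is a minimal sketch under stated assumptions: `sigma_naive` is a hypothetical name (not the R-Pref sigma function), the preference is modelled as a plain list, and cmp(a, b) is assumed to return TRUE where a tuple of a is strictly better than b.

```r
# Naive BMO-set computation, following the loop shown in the text.
sigma_naive <- function(pref, tbl) {
  ind <- logical(nrow(tbl))
  for (i in seq_len(nrow(tbl))) {
    # Tuple i is kept iff no tuple in tbl is strictly better than it
    ind[i] <- !any(pref$cmp(tbl, tbl[i, , drop = FALSE]))
  }
  tbl[ind, , drop = FALSE]
}

# Hypothetical example preference: a lower price is better
pref <- list(col = "price", cmp = function(a, b) a$price < b$price)

cars <- data.frame(id = c("a", "b", "c"), price = c(10, 20, 10))
res <- sigma_naive(pref, cars)
print(res)
```

Tuples "a" and "c" both have the minimal price, so neither dominates the other and both belong to the BMO-set. This shows that the result of a preference selection is in general a set of tuples rather than a single winner.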
3.1 Base Preference Constructors
To specify a preference, a variety of intuitive base preference constructors together with some complex preference constructors have been defined. In the following, we present some selected preference constructors. More preference constructors as well as their formal definitions can be found in (Kießling, 2002; Kießling, 2005; Kießling et al., 2011).
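Definition 3 (SCORE$_d$ Preference). Assume a scoring function $f : \mathrm{dom}(A) \to \mathbb{R}_0^+$ and some $d \in \mathbb{R}_0^+$. Then $P$ is called a SCORE$_d$ preference, iff for $x, y \in \mathrm{dom}(A)$:
$$x <_P y \iff f_d(x) > f_d(y)$$
where $f_d : \mathrm{dom}(A) \to \mathbb{R}_0^+$ is defined as:
$$f_d(v) := \begin{cases} f(v) & \text{if } d = 0 \\ \left\lceil \dfrac{f(v)}{d} \right\rceil & \text{if } d > 0 \end{cases}$$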
In R-Pref this is realized with the score(column,
scr_fnc, dval) function in a few code lines.
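To make the role of the d-parameter concrete, here is a hedged sketch of how such a constructor could look. It is not the R-Pref source: the name `score_sketch`, the list representation of a preference, and the convention that cmp(a, b) is TRUE where a is strictly better than b are all assumptions made for illustration.

```r
# Hypothetical sketch of a SCORE_d constructor (not the R-Pref source).
score_sketch <- function(column, scr_fnc, dval = 0) {
  # f_d from Definition 3: raw score for d = 0, ceiling(score / d) for d > 0
  f_d <- function(vals) {
    s <- scr_fnc(vals)
    if (dval > 0) ceiling(s / dval) else s
  }
  # Lower scores are better: x <_P y iff f_d(x) > f_d(y)
  list(col = column,
       cmp = function(a, b) f_d(a[[column]]) < f_d(b[[column]]))
}

tbl <- data.frame(id = 1:3, price = c(11, 14, 20))
ref <- tbl[2, , drop = FALSE]            # tuple with price 14

exact  <- score_sketch("price", identity, dval = 0)
coarse <- score_sketch("price", identity, dval = 5)

better_exact  <- exact$cmp(tbl, ref)     # price 11 beats price 14
better_coarse <- coarse$cmp(tbl, ref)    # 11 and 14 both map to level 3
print(better_exact)
print(better_coarse)
```

With d = 0 the price 11 is strictly better than 14, whereas with d = 5 both prices fall into the same level and are treated as equally good.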
An important sub-constructor of SCORE$_d$ is the BETWEEN$_d$(A, [low, up]) preference expressing the wish for a value between a lower and an upper bound. Its scoring function equals
$$f(v) = \max\{\mathit{low} - v,\ 0,\ v - \mathit{up}\}$$
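A short worked example may help to read this scoring function. The sketch below evaluates it for the interval [3, 5] on the values 1 to 7; the helper name `between_score` and the chosen numbers are purely illustrative.

```r
# Scoring function of BETWEEN, evaluated element-wise on a vector
between_score <- function(vals, low, up) pmax(low - vals, 0, vals - up)

vals <- 1:7
s <- between_score(vals, low = 3, up = 5)
print(s)    # 2 1 0 0 0 1 2

# With d = 2, the distances are grouped into levels via ceiling(s / d)
s_d <- ceiling(s / 2)
print(s_d)  # 1 1 0 0 0 1 1
```

Values inside the interval get the best score 0, and the score grows with the distance to the nearest bound. With d = 2, values at distance 1 and 2 are no longer distinguished.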
In R-Pref the implementation is essentially:
    between = function(column, low, up, ...)
      score(column, function(vals)
        pmax(low - vals, 0, vals - up), ...)
Thereby “...” passes additional arguments like the d-parameter on to score. The R function pmax is the parallel maximum, which returns a numeric vector of element-wise maxima if vals is a vector. Sub-constructors of BETWEEN are, e.g., the AROUND$_d$(A, z) preference and the HIGHEST$_d$(A) preference. We just consider their implementation as this is very close to the definition:
    around = function(column, center, ...)
      between(column, center, center, ...)
    highest = function(column, ...)
      around(column, suprema[[column]], ...)
Thereby suprema is a variable containing the maximal values of the given dataset for every numerical column, determined initially in sigma. Next to the numerical preferences there are also preferences on categorical domains, e.g., the LAYERED preference.
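Returning briefly to the numerical constructors, the reduction of AROUND and HIGHEST to BETWEEN can be illustrated at the level of scoring functions. This is only a sketch: the helper names are hypothetical, and the maximum is computed directly from the input vector rather than taken from a precomputed suprema variable.

```r
# Scoring functions only; names are illustrative, not R-Pref functions
between_score <- function(vals, low, up) pmax(low - vals, 0, vals - up)

# AROUND is BETWEEN with a degenerate interval [center, center]
around_score <- function(vals, center) between_score(vals, center, center)

# HIGHEST is AROUND the maximal value (here taken from vals itself)
highest_score <- function(vals) around_score(vals, max(vals))

vals <- c(2, 5, 9)
a <- around_score(vals, 5)
h <- highest_score(vals)
print(a)  # 3 0 4
print(h)  # 7 4 0
```

The AROUND score is the absolute distance to the center, and the HIGHEST score is the distance to the maximum, so the largest value receives the best score 0.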
Definition 4 (LAYERED$_m$ Preference). Let $L = (L_1, \ldots, L_m)$ be an ordered list of $m$ sets forming a partition of $\mathrm{dom}(A)$ for an attribute $A$. The preference $P$ is a LAYERED$_m$(A, $(L_1, \ldots, L_m)$) preference if its scoring function equals
$$f(v) = i - 1 \iff v \in L_i .$$
For convenience, one of the $L_i$ may be named “OTHERS”, representing the set $\mathrm{dom}(A) \setminus \bigcup_{j \neq i} L_j$.
The essential part in the implementation of the
score-function for LAYERED is:
    res = rep(Inf, length(vals))
    for (i in 1:length(layers))
      res[vals %in% layers[[i]]] = i - 1
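Wrapped into a function, the fragment can be run on a small example. The wrapper name `layered_score` and the colour data are assumptions for illustration, and the handling of an “OTHERS” layer is not shown here.

```r
# Hypothetical wrapper around the loop from the text
layered_score <- function(vals, layers) {
  res <- rep(Inf, length(vals))          # default for values in no layer
  for (i in seq_along(layers))
    res[vals %in% layers[[i]]] <- i - 1  # layer i gets score i - 1
  res
}

# Two layers: red and blue are preferred over green
layers <- list(c("red", "blue"), "green")
colours <- c("green", "red", "black", "blue")
scores <- layered_score(colours, layers)
print(scores)  # 1 0 Inf 0
```

In this sketch a value that appears in none of the listed layers, such as "black", keeps the initial score Inf and is therefore worse than every listed value.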
DATA 2013 - 2nd International Conference on Data Management Technologies and Applications