products. Consider a reverse top-2 query for lcd1. By
calculation, lcd1 is among the top-2 products for
customers Bell and Carl. The query result therefore
consists of two vectors that describe Bell's and Carl's
weights, namely (0.5,0.5) and (0.2,0.8). If we issue a
reverse top-2 query for lcd3, the query returns only one
vector, which describes Adam's weights, namely (0.8,0.2).
Thus lcd1 has a larger group of potential buyers than
lcd3.
Table 1: Specifications for LCDs.
LCD     Screen Size   Refresh Rate
lcd1    20            120
lcd2    30            80
lcd3    60            65
lcd4    40            100
lcd5    50            70
Table 2: Customer Preferences.
Customer   Weight on Screen Size   Weight on Refresh Rate
Adam       0.8                     0.2
Bell       0.5                     0.5
Carl       0.2                     0.8
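The example can be checked with a short script. The following is a minimal sketch, not the paper's implementation: the function names `score`, `top_k` and `reverse_top_k` are our own, and it assumes that a larger weighted sum ranks higher, which is the ordering under which the stated results hold for the values in Tables 1 and 2.

```python
# Data from Table 1 (screen size, refresh rate) and Table 2 (customer weights).
products = {
    "lcd1": (20, 120),
    "lcd2": (30, 80),
    "lcd3": (60, 65),
    "lcd4": (40, 100),
    "lcd5": (50, 70),
}
customers = {
    "Adam": (0.8, 0.2),
    "Bell": (0.5, 0.5),
    "Carl": (0.2, 0.8),
}


def score(weights, attrs):
    """Weighted sum of the attribute values."""
    return sum(w * a for w, a in zip(weights, attrs))


def top_k(weights, k):
    """Names of the k best products for one customer.

    Assumption (ours): a larger weighted sum ranks higher.
    """
    ranked = sorted(
        products,
        key=lambda name: score(weights, products[name]),
        reverse=True,
    )
    return set(ranked[:k])


def reverse_top_k(product, k):
    """Customers (with their weights) whose top-k set contains the product."""
    return {
        name: weights
        for name, weights in customers.items()
        if product in top_k(weights, k)
    }


print(reverse_top_k("lcd1", 2))
print(reverse_top_k("lcd3", 2))
```

Under this assumption the first call selects Bell and Carl and the second selects only Adam, matching the example.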
Reverse top-k queries not only help producers (or
manufacturers) predict the popularity of a particular
product; they also help them design effective marketing
strategies, for instance, advertising lcd1 to Bell and
Carl and advertising lcd3 to Adam.
Reverse top-k queries are different from reverse
nearest neighbor (RNN) queries (Korn and Muthukrishnan,
2000) and reverse skyline queries (Dellis and Seeger,
2007). Generally speaking, reverse top-k queries provide
a more generic way to identify potentially interested
customers (and their preferences) for a given product.
Existing research findings in the fields of RNN and
reverse skyline queries cannot be applied to reverse
top-k queries.
The rest of the paper is organized as follows: Section 2
formally defines reverse top-k queries. Section 3
explains the original RTA algorithm and our IRTA
algorithm. Section 4 shows the experimental results
and concludes this paper.
2 PROBLEM STATEMENT
Consider an n-dimensional data space $D$. A dataset $S$
on $D$ has the cardinality $|S|$ and represents the set of
products. Each product $p \in S$ can be plotted on $D$ as
a point. The n coordinates of $p$ are $(p_1, p_2, \ldots, p_n)$,
where $p_i$ stands for the ith attribute of $p$. Without
loss of generality, we assume that each product $p$ has
non-negative, numerical attribute values. We further
assume smaller attribute values are preferred.
Top-k queries use a scoring function to determine
the rank of each product. The most commonly used
scoring function is a linear function that calculates
the weighted sum of attribute values. Each attribute
value $p_i$ has a corresponding weight $w_i$, which
indicates $p_i$'s relative importance to the rank. The
linear scoring function for $p$, denoted as $f_w(p)$, is
defined as: $f_w(p) = \sum_{i=1}^{n} w_i \times p_i$.
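As a hedged illustration of the scoring function, take the attribute values of lcd1 from Table 1 as $p = (20, 120)$ and Bell's weights from Table 2 as $w = (0.5, 0.5)$, so that $n = 2$. Pairing these two rows is our choice for the example:

$$
f_w(p) = \sum_{i=1}^{2} w_i \times p_i = 0.5 \times 20 + 0.5 \times 120 = 70
$$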
A linear top-k query takes three parameters and
can be denoted as $TOP_k(S, w)$, where $k$ is a positive
integer, $S$ is a dataset on an n-dimensional data space,
and $w$ is an n-dimensional vector that represents a
certain customer's preferences. Formally, a top-k query
is defined as follows:
Definition 1. Given a dataset $S$ on an n-dimensional
data space, a positive integer $k$ and an n-dimensional
vector $w$, a top-k query $TOP_k(S, w)$ returns a set of
points $P$ such that $P \subseteq S$, $|P| = k$, and
$\forall p_i, p_j : p_i \in P, p_j \notin P \Rightarrow f_w(p_i) \le f_w(p_j)$.
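Definition 1 can be sketched in a few lines of Python. This is an illustrative sketch under the paper's convention that smaller scores are preferred; the function names and the four sample points are hypothetical and not taken from the paper.

```python
import heapq


def score(w, p):
    """Linear scoring function: weighted sum of the attribute values."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(S, w, k):
    """Return k points of S with the smallest scores under w.

    Ties at the k-th position are broken by input order; Definition 1
    accepts any such choice.
    """
    return heapq.nsmallest(k, S, key=lambda p: score(w, p))


# Hypothetical two-dimensional products, for illustration only.
S = [(1.0, 9.0), (2.0, 2.0), (6.0, 1.0), (5.0, 5.0)]
w = (0.5, 0.5)

P = top_k(S, w, 2)
print(P)

# Check the condition of Definition 1: every returned point scores no
# worse than every point of S that was left out.
rest = [q for q in S if q not in P]
print(all(score(w, p) <= score(w, q) for p in P for q in rest))
```

`heapq.nsmallest` avoids sorting the whole dataset when k is small relative to the size of S; a full sort followed by a slice would return the same points.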
A reverse top-k query takes four parameters and
can be denoted by $RTOP_k(S, W, p)$, where $k$ is a
positive integer, $S$ and $W$ are two datasets on an
n-dimensional data space that represent the set of
products and the set of user preferences, respectively,
and $p$ is a certain product. A reverse top-k query is
formally defined as follows:
Definition 2. Given two datasets $S$ and $W$ on an
n-dimensional data space, a positive integer $k$ and a
product $p$, a reverse top-k query $RTOP_k(S, W, p)$
returns a set of weights $\bar{W} \subseteq W$ such that
$\forall \bar{w}_i \in \bar{W}$, $p \in TOP_k(S, \bar{w}_i)$.
3 THE RTA AND IRTA ALGORITHMS
3.1 The RTA Algorithm
A naive brute-force approach to answering a reverse
top-k query has to process a top-k query for each weight
vector $w \in W$ that represents user preferences. As
processing even a single top-k query involves non-trivial
calculations, evaluating $|W|$ top-k queries can be
prohibitively expensive. The brute-force approach is
impractical when $|S|$ or $|W|$ (or both) is large.
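The cost of the brute-force approach can be made concrete with a small sketch. It is only an illustration under the conventions of Section 2 (smaller scores preferred); the function names, the evaluation counter and the sample data are our own assumptions, and this is not the RTA algorithm.

```python
def score(w, p):
    """Linear scoring function: weighted sum of the attribute values."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(S, w, k):
    """The k points of S with the smallest scores under w."""
    return sorted(S, key=lambda p: score(w, p))[:k]


def reverse_top_k_brute_force(S, W, p, k):
    """Evaluate one full top-k query per weight vector in W.

    Returns the qualifying weight vectors and the number of top-k
    evaluations, which is always len(W) for this approach.
    """
    result = []
    evaluations = 0
    for w in W:
        evaluations += 1
        if p in top_k(S, w, k):
            result.append(w)
    return result, evaluations


# Hypothetical products and user preferences, for illustration only.
S = [(1.0, 9.0), (2.0, 2.0), (6.0, 1.0), (5.0, 5.0)]
W = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]

result, evaluations = reverse_top_k_brute_force(S, W, (1.0, 9.0), 2)
print(result, evaluations)
```

Each top-k evaluation scores every product in S, so the total work grows with the product of the two dataset sizes, which is why the approach does not scale.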
Vlachou et al. (2010) proposed the RTA (Reverse top-k
Threshold Algorithm) to reduce the number of top-k
evaluations. The algorithm makes use of the calculated
top-k results to determine if it is necessary to
evaluate top-k queries for
ICEIS 2011 - 13th International Conference on Enterprise Information Systems