nearest-neighbor (RkNN) query (Kang, 2007), (Korn,
2000), (Lin, 2003), (Stanoi, 2000), (Stanoi, 2001),
(Tao, 2004), (Wu, 2008), (Yang, 2001) has also
received significant research attention since it was
introduced in (Korn, 2000). An RkNN query
regarding a query point q finds all data points which
regard q as one of their corresponding k nearest
neighbors. Since q is close to such data points, q is
said to have high influence on these data points. The
RkNN answer set with respect to q is called the
influence set of q (Korn, 2000).
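The RkNN definition above can be illustrated with a brute-force sketch. This is not one of the cited algorithms, only a direct reading of the definition; the function name `rknn` and the rule that ties are resolved in favor of q are assumptions made for illustration.

```python
from math import dist

def rknn(q, data, k):
    """Brute-force influence set of q: the data points that would
    count q among their k nearest neighbors (hypothetical helper)."""
    influence_set = []
    for i, p in enumerate(data):
        d_q = dist(p, q)
        # Count the other data points that are strictly closer to p than q is.
        closer = sum(
            1 for j, o in enumerate(data)
            if j != i and dist(p, o) < d_q
        )
        # q is one of p's k nearest neighbors if fewer than k points beat it.
        if closer < k:
            influence_set.append(p)
    return influence_set

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(rknn((2, 0), points, 1))  # [(1, 0)]
```

The sketch takes quadratic time in the number of data points, which is why the cited work relies on index structures and pruning instead.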
In some applications, skyline queries may be
issued with a range constraint. Consider a scenario
as follows. There are some office buildings and
restaurants located in a city. Each restaurant has its
own scores in different criteria such as service or
average price. Many workers from the office
buildings have to find a restaurant for lunch. They
may issue a range query with a distance r to indicate
that only the restaurants within this distance will be
considered. Moreover, they will most likely choose
one of the skyline restaurants within this distance for
lunch. That is, a worker may issue a constrained
skyline query to find his/her target restaurants. For a
restaurant, we define its popularity by the number of
times it appears as an answer in the constrained
skyline queries issued by the workers. The
popularity of a restaurant can be computed by
reverse constrained skyline queries.
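As a minimal sketch of this popularity measure, the naive computation below runs one constrained skyline query per worker and counts how often each restaurant is returned. The names `dominates`, `constrained_skyline`, and `popularity` are hypothetical, and the sketch assumes that smaller attribute values are better and that a restaurant exactly at distance r counts as within range.

```python
from math import dist

def dominates(a, b):
    """a dominates b: no worse in every attribute, strictly better in one
    (assuming smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def constrained_skyline(customer, restaurants, r):
    """Skyline among the restaurants within distance r of the customer."""
    in_range = [(loc, attrs) for loc, attrs in restaurants
                if dist(customer, loc) <= r]
    return [(loc, attrs) for loc, attrs in in_range
            if not any(dominates(other, attrs) for _, other in in_range)]

def popularity(restaurants, customers, r):
    """Number of constrained skyline answers each restaurant appears in."""
    counts = {loc: 0 for loc, _ in restaurants}
    for c in customers:
        for loc, _ in constrained_skyline(c, restaurants, r):
            counts[loc] += 1
    return counts

# Each restaurant is (location, attributes); values are made up.
restaurants = [((0, 0), (3, 3)), ((1, 0), (2, 2)), ((4, 0), (5, 1))]
customers = [(0, 0), (3, 0)]
print(popularity(restaurants, customers, r=2))
# {(0, 0): 0, (1, 0): 2, (4, 0): 1}
```

This is only the direct counting approach; it issues one query per customer and applies no pruning.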
Now assume we want to open new restaurants in
the city at several candidate locations. We want to
determine the top-k candidates based on their popularity
so that good business can be expected. To
solve this novel top-k query, we
propose a basic method and an advanced method in this paper.
Three pruning strategies reduce
the number of competitors considered while computing the
number of potential customers for each candidate.
A fourth pruning strategy discards
customers that cannot be
potential customers of a target candidate. Built on
these four strategies, the advanced method
outperforms the basic method, substantially reducing
the computation time. The experimental results
demonstrate that the pruning strategies have strong
pruning power.
The remainder of the paper is organized as
follows. The formal problem definition and a basic
solution to this problem are given in Section 2. An
advanced solution and its index structures are
described in Section 3. The performance evaluation
of the proposed algorithm is reported in Section 4.
Finally, Section 5 concludes this work.
2 PRELIMINARIES
In this section, we formally define the problem to be
solved and also propose a basic solution for it.
2.1 Problem Formulation
Referring to the scenario described in Section 1, we
have two datasets: a set of office buildings
(customers) and a set of existing restaurants. In
addition, we have another dataset of candidates for
opening new restaurants. All of the datasets lie in a
two-dimensional space that represents their
locations. Moreover, the candidates
and the existing restaurants have n other
attributes representing the features of the restaurants,
such as service or average price.
Assume each customer looks for a restaurant within
a distance r of his/her location. This search area
is a circle centered at the location of
the customer with radius r, as shown in Figure 1,
where the triangle represents the customer. If a
restaurant is located within this search area and is
a skyline point among all restaurants in this area
with respect to the other n attributes, this restaurant gets
one point from the corresponding customer. For
example, five restaurants are located in the
search area shown in Figure 1. The values of the
two other attributes of these restaurants, representing
service ranking and food ranking, are (6, 3), (5, 4),
(4, 5), (7, 5), and (6, 6), respectively. Assuming that
smaller attribute values are better, (7, 5) is dominated
by (6, 3) and (6, 6) is dominated by (5, 4), so the
three restaurants with attributes (6, 3), (5, 4), and (4,
5) are the skyline restaurants in this search area.
Each of them gets one point from the corresponding
customer.
Figure 1: An illustration of the search area of a customer.
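The example above can be checked with a short sketch. The helper names `dominates` and `skyline` are hypothetical, and the five tuples are the (service ranking, food ranking) values of the restaurants assumed to lie inside the search area, with smaller values being better.

```python
def dominates(a, b):
    """a dominates b: no worse in every attribute, strictly better in one
    (smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

# (service ranking, food ranking) of the five restaurants in the search area
in_range = [(6, 3), (5, 4), (4, 5), (7, 5), (6, 6)]
print(skyline(in_range))  # [(6, 3), (5, 4), (4, 5)]
```

Here (7, 5) is dropped because (6, 3) is better in both attributes, and (6, 6) is dropped because (5, 4) is better in both.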
The problem of determining the top-k candidates
by reverse constrained skyline queries is formally
defined as follows. There are three sets of data
points in a two-dimensional space, representing
customers, competitors, and candidates. Moreover,
the competitors and candidates have the other n
DATA 2015 - 4th International Conference on Data Management Technologies and Applications