2 RELATED WORK
Many researchers have used Skyline queries to reduce the search space. The Skyline operator was first introduced by Börzsönyi et al., (2001), and many efficient Skyline algorithms have been proposed since then. Algorithms such as Block Nested Loop (BNL), Divide-and-Conquer (D&C), Sort Filter Skyline (SFS), Bitmap and Nearest Neighbor (NN) can process Skyline queries on datasets without indexes.
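The Python sketch below illustrates the idea behind SFS. It assumes that smaller values are preferable on every QoS dimension. As a simplifying assumption, it presorts points by the sum of their values, whereas the original SFS proposal uses an entropy-based presort; any monotone scoring function works. A point that dominates q always has a strictly smaller sum, so every potential dominator of q has already been examined when q is reached. Any point that survives the window check is therefore final.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every dimension, strictly better on one (minimisation)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sfs_skyline(points: List[Point]) -> List[Point]:
    """Sort Filter Skyline sketch: presort by a monotone score, then one filtering pass."""
    window: List[Point] = []
    # The sum is monotone w.r.t. dominance: if p dominates q then sum(p) < sum(q).
    for q in sorted(points, key=sum):
        # Every possible dominator of q precedes it, so only the window is checked.
        if not any(dominates(w, q) for w in window):
            window.append(q)
    return window
```

Unlike BNL, this single pass never has to evict a point from the window, which is the benefit the presort buys.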
These algorithms can eliminate low-quality web services from a large number of candidates and return a much smaller, higher-quality set to the user. Their obvious limitation, though, is that they compute the Skyline on only one combination of QoS parameters.
However, in practice, different users may be interested in different combinations of QoS parameters and may have different preferences. To tackle this issue, Yang et al., (2015) introduced the service SkyCube, which consists of applying the Skyline to all possible combinations of QoS parameters. Because the SkyCube is computed in advance, in an offline manner, it can speed up the response time of real-time web service selection. Unfortunately, since a SkyCube over $q$ QoS parameters contains $2^q - 1$ subspace Skylines, current SkyCube computation solutions suffer from poor dimension scalability. For this reason, researchers moved towards parallel Skyline approaches.
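The following Python sketch makes the SkyCube idea concrete by computing the Skyline of every non-empty subset of QoS dimensions. It is a naive precomputation that assumes all QoS values are minimised. It is not the algorithm of Yang et al. Its loop over all $2^q - 1$ subspaces makes the dimension-scalability problem visible.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates_on(p: Sequence[float], q: Sequence[float], dims: Tuple[int, ...]) -> bool:
    """Dominance restricted to the dimensions listed in dims (minimisation)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skycube(points: List[Point], q: int) -> Dict[Tuple[int, ...], List[Point]]:
    """Map every non-empty subspace (tuple of dimension indices) to its Skyline."""
    cube: Dict[Tuple[int, ...], List[Point]] = {}
    for size in range(1, q + 1):
        for dims in combinations(range(q), size):  # 2**q - 1 subspaces in total
            cube[dims] = [
                p for p in points
                if not any(dominates_on(o, p, dims) for o in points)
            ]
    return cube
```

A user interested only in the first two QoS attributes would then read `cube[(0, 1)]` directly instead of computing a Skyline at query time.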
All of the aforementioned publications point readers to the major difficulties of mashing up cloud services, namely the large number of services in the service pool to choose from, the diversity of services offering different QoS, and the growing number of attribute dimensions to handle.
Accordingly, some works have incorporated the MapReduce paradigm into Skyline algorithms to exploit distributed parallelism in cloud mashups. A good evaluation of several MapReduce (MR) approaches to Skyline processing was given by Zhang et al., (2015).
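The common structure behind these MapReduce approaches is sketched in Python below. A map phase computes a local Skyline per partition, and a reduce phase merges the local Skylines into the global one. This is a sequential simulation that assumes minimised QoS values and an arbitrary, caller-supplied partitioning, and it uses no MapReduce framework. The merge is correct because a point that no other point dominates globally is also undominated inside its own partition, so it always survives the map phase.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline(points: List[Point]) -> List[Point]:
    """Skyline of one set of points (each point checked against the whole set)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


def mapreduce_skyline(partitions: Dict[int, List[Point]]) -> List[Point]:
    # Map phase: partitions are independent, so a cluster could process them in parallel.
    local = {key: local_skyline(pts) for key, pts in partitions.items()}
    # Reduce phase: one reducer merges all local candidates into the global Skyline.
    candidates = [p for pts in local.values() for p in pts]
    return local_skyline(candidates)
```

The reduce step is still needed because a local Skyline point can be dominated by a point in another partition. For example, (2, 7) survives its own partition but is removed globally if another partition contains (2, 6).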
Grid partitioning for MapReduce was introduced by Zhang et al., (2011), and MR-angular partitioning of the Skyline space was studied by Vlachou et al., (2008). With such partitioning methods, a partition may sometimes contain data that do not satisfy a specific request.
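The difference between the two partitioning schemes shows up in how each one assigns a point to a partition. In the Python sketch below, grid partitioning uses a hypothetical uniform cell width on every dimension. Angular partitioning is shown only in two dimensions: it splits the angle range [0, π/2] of non-negative points into k equal sectors. Vlachou et al. use hyperspherical coordinates in higher dimensions. Both functions are illustrative assumptions, not the cited implementations.

```python
import math
from typing import Tuple


def grid_cell(point: Tuple[float, ...], cell_width: float) -> Tuple[int, ...]:
    """Grid partitioning: index of the cell containing the point on each dimension."""
    return tuple(int(v // cell_width) for v in point)


def angular_sector(point: Tuple[float, float], k: int) -> int:
    """Angular partitioning in 2-D: sector index of a point with non-negative values."""
    x, y = point
    angle = math.atan2(y, x)  # lies in [0, pi/2] when x >= 0 and y >= 0
    return min(int(angle / (math.pi / 2) * k), k - 1)  # clamp the boundary angle pi/2
```

When smaller values are preferable, Skyline points lie near the origin, so every angular sector tends to receive some of them. With a grid, cells far from the origin mostly hold dominated points. This is the usual motivation given for angle-based partitioning.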
As an important variant of Skyline, the k-representative Skyline is a useful tool when the full Skyline is large (Bency, 2014). Bai et al., (2016) introduced the distance-based representative Skyline (k-DRS) and solved the k-DRS problem efficiently with a two-step approach. Step 1 divides the full Skyline set into k clusters. In step 2, one point in each cluster is selected as the representative Skyline point. The k selected Skyline points constitute the k-DRS. Thus, applying the Skyline concept to the selection of web services according to their QoS reduces the search space in the local phase and, consequently, yields considerable time savings during the overall optimization process.
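A small Python sketch can illustrate the two-step procedure. The clustering method and the representative rule are illustrative assumptions, not the specific choices of Bai et al., (2016). Clustering uses a few rounds of k-means with deterministic seeds. Each cluster's representative is the member whose farthest cluster-mate is closest, in the spirit of a distance-based objective. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def assign(points: List[Point], centers: List[Point]) -> List[List[Point]]:
    """Put every point in the cluster of its nearest center."""
    clusters: List[List[Point]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(p)
    return clusters


def k_representative(skyline: List[Point], k: int, iters: int = 10) -> List[Point]:
    pts = sorted(skyline)
    k = min(k, len(pts))
    # Step 1: k-means style clustering, seeded with k points spread along the sorted skyline.
    centers = [pts[i * len(pts) // k] for i in range(k)]
    for _ in range(iters):
        clusters = assign(pts, centers)
        centers = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    clusters = assign(pts, centers)
    # Step 2: in each cluster, keep the member whose farthest cluster-mate is closest.
    return [
        min(cl, key=lambda r: max(math.dist(r, o) for o in cl))
        for cl in clusters if cl
    ]
```

Take the six Skyline points (1, 9), (2, 7), (3, 5), (5, 3), (7, 2), (9, 1) with k = 2. The sketch groups the first three points and the last three, and returns (2, 7) and (7, 2) as representatives.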
In this paper, we present a parallel Skyline service selection method designed to improve the efficiency of existing partitioning approaches and to provide a smarter, reduced search space for web service selection. The approach employs a cluster-based partitioning method combined with the representative Skyline for web service selection.
3 SKYLINE
Skyline is a mechanism that acts as a filter on the search space, selecting only the interesting points. Skyline can be defined as follows: given a set of points P in a space with q dimensions, the Skyline points are the points that are not dominated by any other point in the search space with respect to those dimensions.
Given a data space $D$ defined by a set of $q$ dimensions $\{d_1, \ldots, d_q\}$ and a dataset $P$ on $D$ with cardinality $n$, a point $p \in P$ can be represented as $p = \{p_1, \ldots, p_q\}$, where $p_i$ is a value on dimension $d_i$.
Without loss of generality, let us assume that the value $p_i$ on any dimension $d_i$ is greater than or equal to zero ($p_i \geq 0$) and that, for all dimensions, minimum values are preferable.
Definition 1 (Dominate). A point $p \in P$ is said to dominate another point $q \in P$, denoted as $p \prec q$, if (1) on every dimension $d_i \in D$, $p_i \leq q_i$; and (2) on at least one dimension $d_j \in D$, $p_j < q_j$.
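Definition 1 translates almost directly into code. The Python sketch below assumes, as stated above, that smaller values are preferable on every dimension. The QoS attributes in the usage lines (response time in milliseconds and cost) are hypothetical.

```python
from typing import Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p ≺ q: p_i <= q_i on every dimension and p_j < q_j on at least one."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    no_worse = all(pi <= qi for pi, qi in zip(p, q))  # condition (1)
    better = any(pi < qi for pi, qi in zip(p, q))     # condition (2)
    return no_worse and better


# (response time, cost) of three hypothetical services
s1, s2, s3 = (100, 2.0), (120, 2.0), (90, 3.0)
print(dominates(s1, s2))  # True: same cost, faster
print(dominates(s1, s3))  # False: s3 is faster, s1 is cheaper
print(dominates(s1, s1))  # False: condition (2) fails for identical points
```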
Definition 2 (Skyline Point). The Skyline is the set of points $SKY_P \subseteq P$ that are not dominated by any other point in $P$. The points in $SKY_P$ are called Skyline points (H. Köhler, J. Yang, and X. Zhou, 2011).
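Definition 2 can likewise be read as a direct, if quadratic, procedure: keep every point that no other point of P dominates. The Python sketch below applies it to a small set of hypothetical web services described by (response time, cost). It illustrates the definition and is not meant as an efficient Skyline algorithm.

```python
from typing import Dict, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(services: Dict[str, Tuple[float, ...]]) -> Dict[str, Tuple[float, ...]]:
    """SKY_P: the services not dominated by any other service (O(n^2) comparisons)."""
    return {
        name: qos
        for name, qos in services.items()
        if not any(dominates(other, qos) for other in services.values())
    }


# Hypothetical (response time in ms, cost) values for five candidate services.
services = {"s1": (100, 5), "s2": (200, 3), "s3": (150, 6), "s4": (300, 1), "s5": (250, 4)}
print(sorted(skyline(services)))  # ['s1', 's2', 's4']
```

Here s1 dominates s3 because s3 is both slower and more expensive, and s2 dominates s5 for the same reason. Only three services remain for the later selection phase.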
3.1 Parallel Skyline
Due to the computational complexity of Skyline processing on large datasets, parallel approaches have been studied