Authors:
Ankita Atrey 1; Gregory Van Seghbroeck 1; Higinio Mora 2; Filip De Turck 1 and Bruno Volckaert 1
Affiliations:
1 IDLAB-imec, Technologie Park, Ghent University, Ghent, Belgium; 2 University of Alicante, Alicante, Spain
Keyword(s):
Data Placement, Replica Placement, Geographically Distributed Clouds, Location-Based Services, Online Social Networks, Scalability, Overlapping Clustering.
Abstract:
The increased reliance of data management applications on cloud computing technologies has made research
into solutions to the data placement problem critically important. The objective
of the classical data placement problem is to optimally partition, while also allowing for replication, the set of
data-items into distributed data centers to minimize the overall network communication cost. Despite significant
advancement in data placement research, replica placement has seldom been studied in unison with data
placement. More specifically, most of the existing solutions employ a two-phase approach: 1) data placement,
followed by 2) replication. Replication should, however, be seen as an integral part of data placement, and
should be studied as a joint optimization problem with the latter. In this paper, we propose a unified paradigm
of data placement, called CPR, which combines data placement and replication of data-intensive services into
geographically distributed clouds as a joint optimization problem. Underlying CPR is an overlapping correlation
clustering algorithm capable of assigning a data-item to multiple data centers, thereby enabling us to
jointly solve data placement and replication. Experiments on a real-world trace-based online social network
dataset show that CPR is effective and scalable. Empirically, it is 35% more effective on the evaluated
metrics and up to 8 times faster in execution time than state-of-the-art techniques.
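
As an illustration of the objective described above, the following minimal Python sketch computes a network communication cost for a placement in which a data-item may be held by more than one data center. It is not the CPR algorithm. The function name `communication_cost`, the nearest-replica cost model, and the toy inputs are all assumptions made for this example only.

```python
from typing import Dict, List, Set, Tuple


def communication_cost(
    requests: List[Tuple[str, str]],         # (origin data center, data-item)
    placement: Dict[str, Set[str]],          # data-item -> data centers holding a copy
    distance: Dict[Tuple[str, str], float],  # (origin, target) -> cost of one access
) -> float:
    """Sum, over all requests, the cost of reaching the nearest replica.

    Assumed cost model (hypothetical): each request is served by the
    cheapest data center that holds a copy of the requested item.
    """
    total = 0.0
    for origin, item in requests:
        total += min(distance[(origin, dc)] for dc in placement[item])
    return total


# Toy, made-up inputs: two data centers and three requests.
distance = {
    ("EU", "EU"): 0.0, ("EU", "US"): 10.0,
    ("US", "EU"): 10.0, ("US", "US"): 0.0,
}
requests = [("EU", "a"), ("US", "a"), ("US", "b")]

# Placement without replication: every item lives in exactly one data center.
single_copy = {"a": {"EU"}, "b": {"US"}}
# Overlapping placement: item "a" is assigned to both data centers.
with_replica = {"a": {"EU", "US"}, "b": {"US"}}

cost_single = communication_cost(requests, single_copy, distance)
cost_replicated = communication_cost(requests, with_replica, distance)
```

In this toy setting, assigning item `a` to both data centers removes the remote access from `US`. This is the kind of effect an overlapping assignment is meant to capture when placement and replication are decided together.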