resources instead of a harsh contest for market share by clarifying the three stages of cloud federations (Celesti et al., 2010). They note that most CPs today are built around stage 1 of cloud computing, the most primitive stage, in which a provider relies only on the resources it owns. Stage 2 is an evolution of this model: each provider still holds tight to its own resources but also buys resources from other providers when it suits its needs. Stage 3, in contrast, creates a common pool of resources by treating all resources, both those owned by the CP and those rented from other CPs, as equivalent.
Rochwerger et al. are concerned with another aspect of cloud federations, the limited interoperability that CPs provide (Rochwerger et al., 2011). They, too, aim to implement a system that creates a common pool of resources across several CPs. They identify several concerns that must be addressed in such an effort, revolving around the optimization of resource usage and cost efficiency. Above all, they note that CPs have built their systems without interoperability in mind, which makes a middleware layer necessary to provide a level of abstraction over the deployment process.
Standardization efforts, on the other hand, try to tackle the interoperability issues of cloud federations. The Open Cloud Computing Interface (OCCI) is an open standard, maintained by a working group, that targets three basic aspects of cloud IaaS services: portability, interoperability and integration (Metsch, 2006). It aims to achieve this by providing a slim (about 15 commands) RESTful API for IaaS management, including resource and virtual machine management, built on HTTP and other already established standards.
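To give a flavour of this interface, the sketch below shows how a compute resource might be requested over an OCCI-style HTTP endpoint. The endpoint URL, port and attribute values are illustrative placeholders and not taken from the standard cited above.

```python
import requests

# Hypothetical OCCI endpoint of an IaaS provider (placeholder URL).
OCCI_ENDPOINT = "http://example-cloud.org:8787/compute/"

# OCCI renders requests as plain HTTP: the Category header names the
# resource kind and X-OCCI-Attribute carries its attributes.
headers = {
    "Content-Type": "text/occi",
    "Category": ('compute; '
                 'scheme="http://schemas.ogf.org/occi/infrastructure#"; '
                 'class="kind"'),
    "X-OCCI-Attribute": "occi.compute.cores=2, occi.compute.memory=4.0",
}

# Ask the provider to create a new virtual machine with these attributes.
response = requests.post(OCCI_ENDPOINT, headers=headers)
print(response.status_code, response.headers.get("Location"))
```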
Since its creation, OCCI has gathered considerable support from academia and already has implementations for an impressive number of cloud management systems, such as OpenStack, CloudStack, OpenNebula, jClouds, Eucalyptus, BigGrid, Okeanos, Morfeo Claudia and others (OCCI-WG, 2016).
4.2 Cloud Data Management
4.2.1 Major Cloud Provider Solutions
As discussed in a previous section, Amazon is the most popular cloud provider on the market. It offers two services for data management on its cloud infrastructure, one for relational databases and one for non-relational ones. In this paper we focus on the non-relational service, which is closer to our big data needs: Amazon Elastic MapReduce (EMR) (AWS, 2018). EMR uses Amazon's cloud infrastructure to provide big data management on many of the most popular data store systems on the market, including HDFS, Presto, Spark and others.
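As an illustration, an EMR cluster can be requested programmatically. The sketch below uses the boto3 SDK; the cluster name, release label, roles and instance types are placeholders rather than recommended settings.

```python
import boto3

# Connect to the EMR service in a placeholder region.
emr = boto3.client("emr", region_name="us-east-1")

# Launch a small cluster with Spark and Presto installed.
response = emr.run_job_flow(
    Name="example-bigdata-cluster",
    ReleaseLabel="emr-5.20.0",
    Applications=[{"Name": "Spark"}, {"Name": "Presto"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```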
Google, another major player in the cloud computing market, has created its own data management solution called Dremel (Melnik et al., 2010). The software exposed to users, however, is BigQuery, which is built on top of Dremel (Sato, 2012). Sato notes that Dremel complements classical MapReduce by improving seek time, making it possible to execute a read query over a 35.7 GB dataset in under 10 seconds. Moreover, all of this is done with standard SQL queries, so Google effectively created a powerful, scalable data management solution without introducing a new query language.
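As a rough illustration of this point, the sketch below queries BigQuery with ordinary SQL through the official Python client; the project, dataset, table and column names are placeholders.

```python
from google.cloud import bigquery

# Client picks up credentials and project from the environment.
client = bigquery.Client()

query = """
    SELECT name, COUNT(*) AS occurrences
    FROM `my_project.my_dataset.my_table`
    GROUP BY name
    ORDER BY occurrences DESC
    LIMIT 10
"""

# BigQuery executes the statement on its Dremel-based backend;
# the caller only sees standard SQL and an iterator of result rows.
for row in client.query(query):
    print(row["name"], row["occurrences"])
```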
The third major provider is Microsoft, with its Azure Cosmos DB (Shukla, 2017). Cosmos DB is a cloud database engine based on the atom-record-sequence (ARS) model. This allows it to function as an extremely scalable data management engine while supporting multiple popular data management systems such as DocumentDB SQL, MongoDB and Gremlin. It also offers straightforward API access from most programming languages, using simple JSON representations, so users can reach its functionality from their own customized clients.
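The sketch below illustrates this style of access with the Python SDK for Cosmos DB; the account URL, key, database and container names are placeholders.

```python
from azure.cosmos import CosmosClient

# Connect to a placeholder Cosmos DB account.
client = CosmosClient("https://my-account.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("items")

# Documents are plain JSON; Cosmos DB requires an "id" field.
container.upsert_item({"id": "1", "category": "sensor", "value": 42})

# SQL-like queries run directly over the stored JSON documents.
for item in container.query_items(
        query="SELECT * FROM c WHERE c.category = 'sensor'",
        enable_cross_partition_query=True):
    print(item["id"], item["value"])
```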
4.2.2 Polyglot Persistence
When we speak of traditional data management, we picture a database administrator managing a database on a high-end data server. This picture is becoming increasingly obsolete as cloud technology and parallel computing advance, both in technical capability and in low-cost solutions. Despite the advantages of the newer approach, including cost effectiveness and easier scalability, a new problem arises: that of polyglot data stores.
A cloud consists of many machines and many different data store systems, whether due to machine limitations or to the need for specialized tasks (Kolev et al., 2015). When these different data store systems are used together in an interconnected cloud, we encounter the polyglot persistence problem (Fowler, 2011): managing a group of different data stores, each speaking its own language, through a common interface.
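The minimal sketch below illustrates the common-interface idea: two in-memory stand-ins for different store types (key-value and document) are hidden behind one abstraction, so client code does not need to know which "language" each store speaks. It is only a conceptual illustration under these assumptions, not an implementation of any existing polyglot framework.

```python
from abc import ABC, abstractmethod

class DataStore(ABC):
    """Common interface that every backing store must expose."""
    @abstractmethod
    def put(self, key, value): ...
    @abstractmethod
    def get(self, key): ...

class KeyValueStore(DataStore):
    """Stands in for a key-value system such as Redis."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class DocumentStore(DataStore):
    """Stands in for a document system such as MongoDB."""
    def __init__(self):
        self._docs = []
    def put(self, key, value):
        self._docs.append({"_id": key, "body": value})
    def get(self, key):
        return next((d["body"] for d in self._docs if d["_id"] == key), None)

# Client code is unaware of which backend actually holds the data.
stores = {"cache": KeyValueStore(), "archive": DocumentStore()}
for store in stores.values():
    store.put("job-42", {"status": "done"})
    print(store.get("job-42"))
```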