project (Lynn et al., 2016) is described, and a demonstrative use case is given to illustrate the
applicability of the proposed solution. Finally, some
conclusions are drawn.
2 RELATED WORK
The rapid adoption of cloud computing in both the public
and private sectors is resulting in hyper-scale cloud
deployments. This trend poses challenges to cloud
management and cloud architecture design. Existing
cloud platforms, regardless of whether they make use
of virtualization, containerization, or bare-metal offerings,
all focus on the management of homogeneous
resources with respect to desirable non-functional
requirements, for example, scalability and elasticity.
Google Borg (Verma et al., 2015) is a platform for
managing large-scale bare-metal environments used
internally by Google. Borg manages tens of thousands
of servers simultaneously. The Borg architecture
consists of three main component types: Borg
masters, job schedulers, and Borglet agents. A typical
Borg instance consists of a single Borg master,
a single job scheduler, and multiple Borglet agents.
The Borg master is the central point for managing
and scheduling jobs and requests. The Borg master
and job scheduler are replicated in several copies
for high-availability purposes; however, only a single
Borg master and a single job scheduler are active at
any one time. This centralized management approach
requires the Borg masters and job schedulers (the original
and all the replicas) to be large enough to scale
out as required. The Borg job scheduler may have to
manage a very high volume of jobs simultaneously;
this has made Borg more suitable for long-running
services and batch jobs, since those job profiles
reduce the load on the job scheduler. The Fuxi
platform (Zhang et al., 2014) from Alibaba Inc.
uses a similar monolithic scheduling approach,
but with incremental communication and locality-tree
mechanisms that support rapid decision making.
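To illustrate the active/standby arrangement described above, the following is a minimal sketch of a single active master with passive replicas; the class names and the lock-based election are invented for illustration and do not correspond to Borg's actual (Paxos-based) implementation:

```python
import threading

class Election:
    """Toy leader election: the first replica to acquire the lock leads."""
    def __init__(self):
        self._lock = threading.Lock()
        self.leader = None

    def campaign(self, name):
        if self._lock.acquire(blocking=False):
            self.leader = name
        return self.leader

class MasterReplica:
    """One copy of a master; only the elected leader serves requests."""
    def __init__(self, name, election):
        self.name = name
        self.election = election

    def is_active(self):
        return self.election.leader == self.name

    def submit_job(self, job):
        if not self.is_active():
            raise RuntimeError(f"{self.name} is standby; redirect to leader")
        return f"job {job} accepted by {self.name}"

election = Election()
replicas = [MasterReplica(f"master-{i}", election) for i in range(3)]
for r in replicas:
    election.campaign(r.name)

# Exactly one replica is active at any one time; the others are hot standbys.
active = [r for r in replicas if r.is_active()]
```

The standbys hold the same state as the leader, which is what allows fail-over but also why every replica must be provisioned at full scale.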
More contemporary systems are becoming distributed
to address the scalability issue; nevertheless,
masters continue to retain their centralized management
approach. In contrast, job schedulers are becoming
ever more decentralized in their management
decisions. This decentralized approach sometimes
results in scheduling conflicts; however, the probability
of this happening is low. Examples of these systems
include Google Omega (Schwarzkopf et al.,
2013) and Microsoft Apollo (Boutin et al., 2014).
Google Omega employs multiple schedulers working
in parallel to speed up resource allocation and
job scheduling. Since there is no explicit communication
between these schedulers, it cannot be said
that this approach improves resource allocation and
job scheduling decisions; rather, it increases the number
of such decisions being made per unit time. Microsoft
Apollo (Boutin et al., 2014) employs a similar
scheduling framework, but it also incorporates
global knowledge that each scheduler can use
to make optimistic scheduling decisions. Apollo
enables each scheduler to reason about future resource
availability and implements a deferred correction
mechanism that optimistically defers any corrections
until after tasks are dispatched. Identified potential
conflicts may not be realized in some situations,
since the global knowledge is by definition imperfect.
Consequently, all other things being equal, by delaying
conflict resolution to the latest possible opportunity,
by which time the conflicts may have disappeared, Apollo may
perform better than Google Omega. Google Borg,
Google Omega and Microsoft Apollo work with bare
metal servers and schedule jobs onto physical nodes.
In contrast, Kubernetes, Mesos and OpenStack at-
tempt to improve resource utilization by introducing
containerization and virtualization.
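The optimistic shared-state scheduling used by Omega-style systems can be sketched as follows; this is a simplified illustration with invented names, using version-based conflict detection to stand in for the transactional commit the paper describes:

```python
class ClusterState:
    """Shared cell state; commits use optimistic concurrency control."""
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.version = 0  # bumped on every successful commit

    def snapshot(self):
        # Each scheduler deliberates on its own private copy of the state.
        return {"free_cpus": self.free_cpus, "version": self.version}

    def try_commit(self, claim_cpus, seen_version):
        # Reject the transaction if another scheduler committed first
        # or the claim no longer fits; the loser simply retries.
        if seen_version != self.version or claim_cpus > self.free_cpus:
            return False
        self.free_cpus -= claim_cpus
        self.version += 1
        return True

state = ClusterState(free_cpus=10)
snap_a = state.snapshot()
snap_b = state.snapshot()            # two schedulers read state in parallel
ok_a = state.try_commit(4, snap_a["version"])  # first commit wins
ok_b = state.try_commit(4, snap_b["version"])  # stale version: conflict
```

In this model nothing prevents two schedulers from picking the same resources; conflicts are simply detected at commit time, which is why parallelism raises decision throughput without improving decision quality.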
Kubernetes (Kubernetes, 2016; Burns et al.,
2016) is another Google technology and an evolution
of Google Omega. In the Kubernetes system,
schedulers cooperate in making scheduling decisions
and hence attempt to improve resource allocation.
This cooperation comes at the cost of sharing the entire
cluster's status information; although conflicting
scheduling decisions can thereby be avoided, this comes at the
cost of making the distributed scheduling
decisions dynamically. Kubernetes is designed to work exclusively
with containers as a resource management technology.
It improves service deployment and resource
management in a complex distributed container environment.
Apache Mesos (Hindman et al., 2011) is another
management platform, one which enables multiple different
scheduling frameworks to manage the same environment.
This is achieved by employing a coordinator
service that assigns control of resources to a single scheduler
during its decision-making process. This can
potentially lead to inefficient use of resources when
a request is lightweight and the available resources are
significantly larger.
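The offer-based coordination described above can be sketched as follows; the class and method names are invented for illustration and greatly simplify Mesos's actual two-level resource-offer protocol:

```python
class Coordinator:
    """Toy Mesos-style allocator: offers the whole pool of free
    resources to one scheduling framework at a time."""
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus

    def offer(self, framework):
        # While this framework holds the offer, no other framework
        # can claim these resources, even if it accepts only a little.
        accepted = framework.consider(self.free_cpus)
        self.free_cpus -= accepted
        return accepted

class Framework:
    def __init__(self, task_cpus):
        self.task_cpus = task_cpus

    def consider(self, offered_cpus):
        # Accept only what the pending task needs; decline otherwise.
        return self.task_cpus if self.task_cpus <= offered_cpus else 0

coord = Coordinator(free_cpus=64)
light = Framework(task_cpus=1)
used = coord.offer(light)  # a 1-CPU task was shown all 64 free CPUs
```

The inefficiency noted in the text is visible here: a lightweight request temporarily reserves the entire free pool while the offer is outstanding.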
OpenStack (Nova, 2016) is an open-source cloud
platform focusing on the management of a virtualiza-
tion environment. OpenStack uses a front-end API
server to receive requests and a centralized coordi-
nator service (nova-conductor) for coordinating var-
ious components (e.g., networking, image, storage,
and compute). The nova-conductor uses a scheduler
CLOSER 2017 - 7th International Conference on Cloud Computing and Services Science