Unified Cloud Orchestration Framework for Elastic High Performance
Computing in the Cloud
Lukasz Miroslaw, Michael Pantic and Henrik Nordborg
Institute for Energy Technology, HSR Hochschule Rapperswil, Rapperswil, Switzerland
Keywords:
Microsoft Azure, High Performance Computing, Cloud Computing, Cloud Orchestrator, Computational Fluid
Dynamics, Simulations.
Abstract:
The demand for computational power and storage in industry and academia is continuously increasing. One of the key drivers of this demand is the increased use of numerical simulations, such as Computational Fluid Dynamics, for product development. This type of simulation generates huge amounts of data and demands massively parallel computing power. Traditionally, this computational power is provided by clusters, which require large investments in hardware and maintenance. Cloud computing offers more flexibility at significantly lower cost, but the deployment of numerical applications is time-consuming, error-prone and requires a high level of expertise. The purpose of this paper is to demonstrate the SimplyHPC framework, which automates the deployment of a cluster in the cloud, deploys and executes large-scale parallel numerical simulations, and finally downloads the results and shuts down the cluster. Using this tool, we have been able to successfully run widely accepted solvers, namely PETSc, HPCG and ANSYS CFX, in a performant and scalable manner on Microsoft Azure. We show that cloud computing is comparable to on-premises clusters in terms of efficiency and scalability and should be considered an economically viable alternative.
1 INTRODUCTION
Numerical simulations have become an essential part
of modern R&D, as they allow companies to develop
smarter products faster. Ideally, computer simulations
can be used to test the performance of a product be-
fore the first prototype has been built, providing valu-
able information very early in the product develop-
ment cycle. Computer simulations also allow compa-
nies to test a large number of design variations and to
use numerical optimization algorithms to improve the
performance of products.
Computational fluid dynamics (or CFD for short)
represents one of the greatest challenges in terms of
computational requirements. The advances in paral-
lel computing facilitated the handling of the immense
amount of data generated during complex simula-
tions. Today, industrially relevant simulations can only
be efficiently executed on parallel compute clusters
with hundreds of CPU cores connected through a fast
high-performance network such as Infiniband. Un-
fortunately, such a computational infrastructure repre-
sents a significant investment and only makes sense if
the computational cluster is heavily utilized to amor-
tize the costs. This is generally not an issue for uni-
versities and research institutes, but represents a real
problem for profit-driven companies, which only occa-
sionally need access to significant computing power.
An underutilized compute cluster is simply too expen-
sive to maintain.
Most modern CFD solvers support parallel pro-
cessing through the Message Passing Interface (MPI).
Recently, support for multi-core processors, GPUs,
and the Intel Xeon Phi has begun to be implemented (Tomczak et al., 2013; Che et al., 2015). Generally
speaking, CFD simulations are very well suited for
parallel processing. They can be easily parallelized
using domain decomposition and the computational
performance is mainly limited by memory bandwidth.
Cloud computing has become ubiquitous due to its
flexibility and cost-efficiency. It has clear advan-
tages over traditional on-premises systems for insti-
tutions with limited budgets as no direct investments
are needed and machines can be rented on a daily,
hourly or even minute-by-minute basis. Users bene-
fit from the cloud through the possibility of launching
jobs of different sizes on dynamically sized systems
and the cloud operators achieve economies of scale by
building large data centers where the resources can be
shared between different workloads.
An increasing amount of computing power is now
hosted on cloud platforms such as Amazon Elastic Compute Cloud (EC2), Google Compute Engine or Mi-
crosoft Azure and more and more software and ser-
vices are being hosted in the cloud. Many independent software vendors have started to offer cloud services
to scale scientific applications for specific industries.
Companies like Rescale, Ciespace, Ubercloud, Sabalcore, Univa and Penguin Computing provide cloud ser-
vices for weather research, computational fluid dy-
namics, structural mechanics, quantitative finance,
combustion, electromagnetics and molecular dynam-
ics (Gentzsch, 2012). They offer access to VMs with
pre-installed software or a web portal where the sci-
entific applications, such as the ANSYS, Gromacs, OpenFOAM or Gerris (Popinet, 2003) solvers, can be executed.
While this may be a satisfactory solution for some, running custom or third-party software on the cloud infrastructure requires much more complicated
procedures. A scientific application that needs to be
deployed in the cloud usually consists of a command-line tool that requires complex deployment steps such
as installation, configuration as well as setting up sys-
tem specific services and network policies, compara-
ble to the effort of administering an on-premises clus-
ter. Access to the deployed applications from a lo-
cal machine is also a tedious procedure and requires a
significant amount of effort. This partly explains why
this scenario is often too cumbersome for a domain
engineer or a scientist.
A simple mechanism is needed to simplify the
whole process and lower the entry barrier for cloud-
based numerical simulations. Such mechanisms exist
for Amazon EC2 or Eucalyptus, but the second-biggest public cloud provider, Microsoft Azure, still lacks an easy-to-use framework for cloud orchestration, which is why we targeted this cloud provider.
In this paper, we present a unified framework,
named ”SimplyHPC”, that greatly simplifies the use
of a distributed application on Microsoft Azure. The
framework combines Azure specific HPC libraries,
deployment tools, machine and MPI configuration
into a common platform. Its functionality is exposed, among other ways, through PowerShell commandlets that ease the submission process while retaining the desired command-line environment and flexibility. The
framework provides tools with the ability to elasti-
cally deploy an arbitrary number of virtual machines
dynamically, submit the packed application together
with input and configuration files, execute it as a cloud
service using MPI and, once the results are ready,
download them to the local machine and stop the vir-
tual machines. The results presented here are a follow-up to our recent studies (Miroslaw et al., 2015).
Our paper focuses on the following aspects. Sec-
tion 3 describes the main components of the proposed
framework as well as the platform architecture. Sec-
tion 4 demonstrates the utility of the tool on two scien-
tific applications, namely PETSc and HPCG. In addition, we present a scalability study of ANSYS CFX,
a commercial fluid dynamics code for realistic, indus-
trial simulations. The paper ends with conclusions
and future plans.
2 BACKGROUND
This section examines the typical impediments in de-
ployment of HPC applications. It also briefly exam-
ines related technologies and introduces the platforms
and technologies used in performance studies in the
cloud and in the on-premises cluster.
Because cloud providers are commercial entities competing in the market, it is very difficult to create a single orchestrator that supports all major cloud platforms. The cloud infrastructure changes very quickly; new APIs, services and tools are released frequently, and supporting them in one consistent piece of software is very difficult. For example,
existing libraries such as Apache jclouds or libcloud
do not support HPC functionality based on Microsoft
technologies. This is the reason why we decided to
target the Microsoft Azure platform with our cloud
orchestrator.
Cloud orchestrators, such as the one presented in
this paper, manage the interactions and interconnec-
tions among cloud-based and on-premises units as
well as expose various automated processes and as-
sociated resources such as storage and network. They
have become a desirable alternative to standard deployment procedures because they require less expertise and reduce deployment time. Their ability to perform vertical and horizontal scaling is also fundamental to the adoption of such frameworks in HPC scenarios.
HPC Pack is a cloud orchestrator developed by Microsoft for monitoring and running jobs both on-premises and in the cloud. It exposes functionality that is typical of cluster-management software, such as deployment of clusters with different configurations, a scheduler that maximizes the utilization of the cluster based on priorities, policies and usage patterns, and a batch system for the submission of multiple jobs with different amounts of resources.
The framework also allows for deployment of hy-
brid clouds, for example deploying a head node on-
premises and controlling Azure PaaS nodes or de-
ploying the head node as an IaaS VM and controlling PaaS VMs from the local machine. Although it is the recommended framework for controlling clusters in the Azure cloud, we will show, by comparing its deployment time with that of SimplyHPC, that HPC Pack is less suitable for dynamically created HPC clusters.
2.1 Related Work
Although cloud orchestrators play an important role, there are still only a few easy-to-use, out-of-the-box platforms that simplify the deployment, monitoring and accessing of scientific applications in the cloud. The situation is best for Amazon EC2, where frameworks that simplify the orchestration of specific scientific applications exist. Pham et al. presented Roboconf, a distributed-application orchestrator for multi-cloud platforms, together with a domain-specific language to describe applications and their execution environment (Pham et al., 2015). Wong and
Goscinski propose a unified framework composed of
a series of scripts for running gene sequencing anal-
ysis jobs through a web portal (Wong and Goscin-
ski, 2013) on Amazon EC2 (IaaS) infrastructure. For
this purpose they designed an HPC service software library for accessing high-level HPC resources from the IaaS cloud and demonstrate their framework by running mpiBLAST. Ben Belgacem and Chopard measured multiphase flow on a hybrid platform combining Amazon EC2 and a private cluster (Ben Belgacem and Chopard, 2015). Marozzo et al. (Marozzo et al., 2013) present a web-based framework for run-
ning pre-defined workflows composed of web ser-
vices in Microsoft Azure. The authors demonstrate
the framework in data mining applications by deploy-
ing an execution workflow running a single classifica-
tion algorithm and measure strong scalability in clas-
sifying different data sets that were previously parti-
tioned. Last but not least, it is important to mention
the recently released Azure Batch service aimed at
running cloud-scale jobs. However, a significant amount of programming skill is needed to deploy an application with it.
2.2 Microsoft Azure
Microsoft Azure is a cloud computing platform that
provides, among many other services, virtual ma-
chines (VMs) in both IaaS and PaaS models. PaaS
machines, also called Worker roles, are mainly tar-
geted at developers seeking to execute custom soft-
ware in a high-availability or distributed fashion. In
this model, software running on worker roles can be
automatically scaled and copied to many nodes.
A set of worker roles constitutes a Cloud Ser-
vice on Microsoft Azure. Instead of configuring all
VMs manually, the user uploads a packaged applica-
tion together with configuration files for a cloud ser-
vice. This "deployment package" is then installed on
fresh VMs automatically and in parallel. This pro-
cedure usually takes a few minutes and is attractive
in terms of usage costs, as these non-persistent VMs
can be created when needed and deleted immediately
after usage. Regular VMs (IaaS) are bare machines
deployed with a specific OS; they cannot be scaled automatically within the Azure Cloud itself, but can be scaled and managed through external frameworks such as Ansible, Chef or Puppet.
In our framework we decided to take advantage
of the cloud service (PaaS) to simplify the orchestra-
tion procedure and use Blob and Table storage enti-
ties for data management. Because the storage can be
locally or geo-redundant, high availability and persistence of the simulation results can easily be guaranteed.
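
To make the data path concrete, the following sketch shows how job input and results can be moved through Blob storage with the classic Azure PowerShell storage cmdlets of that era; the account name, key variable, container and blob names are placeholders, and SimplyHPC itself performs comparable operations internally through its .NET API rather than through these cmdlets.

```powershell
# Minimal sketch of the Blob-storage data path (placeholder names throughout).
$ctx = New-AzureStorageContext -StorageAccountName "simplyhpcstore" `
                               -StorageAccountKey $env:AZURE_STORAGE_KEY
New-AzureStorageContainer -Name "jobdata" -Context $ctx

# Upload the packed input files for a job ...
Set-AzureStorageBlobContent -File ".\case.zip" -Container "jobdata" `
                            -Blob "run01/case.zip" -Context $ctx

# ... and later download the results written by the compute nodes.
Get-AzureStorageBlobContent -Blob "run01/results.zip" -Container "jobdata" `
                            -Destination ".\results.zip" -Context $ctx
```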
2.3 Numerical Software
Sparse linear algebra inspired the use cases presented
in this study. Large sparse matrices are generated
during the discretization of partial differential equa-
tions and often appear in scientific or engineering ap-
plications. We have selected two scientific software
tools, namely PETSc and the High Performance Conjugate Gradients (HPCG) benchmark, to demonstrate the utility of our framework and to measure performance on both Microsoft Azure and the local cluster.
In addition to the numerical tools, we have se-
lected ANSYS CFX, a high-performance fluid dynamics package that is well recognized in the CFD community and has been successfully applied to different
fluid flow problems.
3 ARCHITECTURE
In this section, the three main architectural aspects
of SimplyHPC are presented. Firstly, we show the
system components visible to a single node such
as cluster settings, communication setup and Azure-
specific elements. Secondly, we discuss the struc-
ture of the SimplyHPC framework at the cluster level
and explain how the software automates the setup
of nodes and clusters using the given system architec-
ture. Thirdly, we briefly present the software archi-
tecture.
3.1 Node Architecture
As implied in Sec. 2, there are different types of ma-
chines available on Microsoft Azure. Here, Worker
Roles are used, as they have various advantages over
traditional virtual machines, especially in terms of
manageability and scalability. Worker roles are set
up using a Deployment Package. This package con-
tains a full description of the virtual machine, includ-
ing machine-specific configuration such as local stor-
age requirements, network settings, security settings
as well as software specific settings. New Worker
roles are then automatically set-up according to this
deployment package. This is done by the server
management software, subsequently called the Azure
Cloud Controller, which exposes these APIs and is pro-
vided by Microsoft as the main interface to the Azure
Cloud.
In Fig. 1, the schematic architecture of a single
node is shown (one worker role corresponds to one node of
a cloud cluster).
Figure 1: Schematic view of a single compute node on Azure (components: Azure framework, SimplyHPC role DLL, MPIWrapper, SMPD/MS-MPI as the MPI context, local and Azure storage; connectivity via Infiniband/Ethernet and the Internet/Azure network).
It is important to note that this node architecture
applies to all nodes in a cloud cluster, and is automat-
ically configured using the aforementioned deploy-
ment package.
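
For comparison, the manual route that this automation replaces looks roughly like the sketch below, using the Azure SDK's cspack tool and the classic service-management cmdlets; the service, role and file names are placeholders, and the exact options depend on the SDK version in use.

```powershell
# Manual equivalent of what SimplyHPC automates (placeholder names, classic ASM tooling).
# 1. Package the worker-role binaries and service definition into a deployment package.
cspack ServiceDefinition.csdef "/role:SimplyHPCRole;SimplyHPCRole\bin\Release" /out:cluster.cspkg

# 2. Create a cloud service and push the package plus its configuration to fresh instances.
New-AzureService -ServiceName "hpc-cluster01" -Location "West Europe"
New-AzureDeployment -ServiceName "hpc-cluster01" -Package ".\cluster.cspkg" `
                    -Configuration ".\ServiceConfiguration.cscfg" -Slot Production
```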
3.2 Cluster Architecture
Fig. 2 visualizes all relevant components of a de-
ployed HPC cluster that has been set up using the
SimplyHPC framework. A cloud service (dotted rect-
angle in figure) represents a bundle of multiple nodes
using the same deployment template. Using the logi-
cal entity of a cloud service, nodes can be easily con-
figured in terms of their number, size and the virtual
network they belong to. When using A8/A9 nodes
for scientific computing, the inter-node network (de-
picted as the upper arrow in Fig. 2) is realized with In-
finiband. In this mode all inter-node storage and MPI
requests are transparently redirected to this very low
latency, high-bandwidth interconnect. Such a configuration is crucial for the performance of most scientific software that takes advantage of concurrency.
The availability of fast inter-node connections
strongly influences the choice of storage for different
tasks.
Figure 2: Schematic view of a SimplyHPC cluster on Azure (instances 0 to n, each with local storage and several processes, grouped into a cloud service; connected via Infiniband/Ethernet and the Internet/Azure network to Azure Table storage for the job configuration, Azure Blob storage for job data and results, and an Azure virtual drive for the software).
3.3 Software Architecture
SimplyHPC has been written in .NET and uses Azure
mechanisms that automate the process of setting up
the cluster described before. There are multiple lev-
els at which the SimplyHPC interacts with Microsoft
Azure. It is linked to the Microsoft Azure SDK and
uses parts of it to automatize the compilation of the
deployment packages. On a run-time level, it inter-
acts directly with the management interface of Mi-
crosoft Azure to create cloud services, upload deploy-
ment packages, monitor machine status, etc.
We expose two different front ends to the user,
namely PowerShell commandlets and a high-level
API with a facade that hides all the complexity of or-
chestrating the cloud. PowerShell is an object-oriented command-line environment used to script and automate common tasks.
PowerShell commandlets are building blocks that
allow administrators to prepare custom scripts. The SimplyHPC interface provides a number of useful commandlets for deploying a scientific application in the cloud, executing it, downloading the results back to the user and destroying the cluster. Users can also adapt the code to their needs, since the software is freely available on GitHub (see the last section for more details).
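
As an illustration of the intended usage, a complete job could then be scripted roughly as in the sketch below. The commandlet names and parameters are illustrative placeholders rather than the exact names exported by SimplyHPC; the actual commandlets are documented in the GitHub repository.

```powershell
# Hypothetical end-to-end workflow; commandlet names are placeholders, not the
# exact SimplyHPC exports (see the GitHub repository for the real interface).

# 1. Deploy a cloud service with eight A8 worker roles acting as compute nodes.
New-SimplyHpcCluster -ServiceName "cfd-run01" -NodeCount 8 -NodeSize "A8"

# 2. Submit the packed solver together with its input and configuration files.
Submit-SimplyHpcJob -ServiceName "cfd-run01" -Package ".\solver.zip" -Input ".\case.def"

# 3. Wait for the MPI job to finish and fetch the results from Blob storage.
Wait-SimplyHpcJob -ServiceName "cfd-run01"
Get-SimplyHpcResult -ServiceName "cfd-run01" -Destination ".\results"

# 4. Tear the cluster down so that no further costs are incurred.
Remove-SimplyHpcCluster -ServiceName "cfd-run01"
```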
4 RESULTS
In the following subsections we compare the set-up time of a cluster created with SimplyHPC to that of HPC Pack, since the latter is the standard tool for cluster deployment that is currently supported and recommended by Microsoft. We also measure the performance of cloud clusters on Microsoft Azure and of a regular on-premises cluster using test cases relevant for CFD applications. Although these measurements are not directly comparable, due to the different hardware configurations of the two systems, they still provide insight into the performance of numerical codes in the cloud. These examples also indicate that running different kinds of simulations with SimplyHPC is feasible.
All tests were performed either on Microsoft
Azure (cloud cluster) or on the on-premises cluster.
The on-premises cluster is a privately owned and maintained system at the Microsoft Innovation Center at HSR Hochschule Rapperswil (HSR), Switzerland. It is composed of 32 nodes, each with 48 GB RAM and two six-core Intel Xeon E5645 CPUs; the nodes are connected with a 40 Gbit/s Infiniband network. On each node, Windows Server 2012 R2 with HPC Pack 2012 R2 is installed. On Microsoft Azure, two VM types were used: A8 nodes with an 8-core Intel Xeon E5-2670 and 56 GB RAM, and A4 nodes with an 8-core CPU, 14 GB RAM and a GigE interconnect. A high-bandwidth Infiniband interconnect between the nodes is available only for A8 and A9 nodes. For the performance studies of PETSc and HPCG we used A4 and A8 nodes, while for the ANSYS performance tests we used A8 nodes only.
For more detailed information about Microsoft Azure
node types see the official specification.
4.1 Set-up Time
The deployment time of a cluster in the cloud is cru-
cial when using dynamically allocated systems and
running computationally intensive tasks composed of
a single job or a small series of jobs. In such scenar-
ios the cluster is created to perform the computational
tasks and is destroyed afterwards. We do not consider
the set-up time on on-premises systems and focus on
the cloud-based scenario only.
We compared the time needed for setting up a
cloud cluster consisting of A8 nodes with SimplyHPC
and Microsoft HPC Pack. This time should be of the
same order regardless of the node size, because the
deployment processes are similar. HPC Pack provides
precise statistics on the deployment procedure that in-
cludes also creating a storage account, a virtual net-
work, a cloud service, a domain controller as well as
deployment and configuration time for the head node
and compute nodes. This is also the reason why the
deployment procedure takes more than an hour. Since
in SimplyHPC most of these processes are omitted,
as expected the time needed to deploy the head node
and compute nodes was much shorter, e.g. we mea-
sured the time needed to deploy a new cloud service
with a single job. Since HPC Pack neither provides
automatic job submission nor it retrieves the result
we have not taken these processes into account in our
measurements.
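
For reproducibility, the SimplyHPC side of such a measurement can be taken by simply wrapping the cluster-creation step in PowerShell's Measure-Command; the commandlet name below is again an illustrative placeholder, as above.

```powershell
# Time the creation of a 16-node A8 cloud service (placeholder commandlet name).
$elapsed = Measure-Command {
    New-SimplyHpcCluster -ServiceName "bench16" -NodeCount 16 -NodeSize "A8"
}
Write-Host ("Set-up time: {0:hh\:mm\:ss}" -f $elapsed)
```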
Figure 3: Deployment time of a cluster composed of different numbers of A8 nodes (8, 16 and 32) with the SimplyHPC framework (left) and Microsoft HPC Pack (right). Time is provided in hh:mm:ss format; the phases shown are preparation, deployment, cloud-service preparation, domain-controller creation and node creation.
4.2 Sparse Matrix Solver Performance
The objective of the second experiment was to com-
pare the performance of a cloud cluster on Microsoft
Azure with a traditional on-premises cluster using
common distributed parallel sparse-matrix solvers: the HPCG (High Performance Conjugate Gradients) benchmark as a standardized performance test, and the PETSc solver applied to two different real-world problems. On Microsoft Azure, the HPCG re-
sults scale nearly linearly with the number of cores,
with a maximum speed-up of approx. 36 with 64
cores (compared to 1 core) for A8 nodes. With A4
nodes, the scaling is not as good, probably also due to the lack of the low-latency Infiniband intercon-
nect. The dynamic cloud-based cluster easily outper-
formed the on-premises cluster (see Fig. 4).
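
For reference, a single-node HPCG run of the kind reported here can be reproduced along the following lines with MS-MPI; the problem dimensions, run time and binary name are assumptions, and multi-node runs additionally require the MPI hosts to be specified.

```powershell
# Sketch of a single-node HPCG run under MS-MPI (sizes and binary name assumed).
# hpcg.dat: two comment lines, then the local grid dimensions nx ny nz, then the
# benchmark run time in seconds.
@"
HPCG benchmark input file
Local problem dimensions and run time
104 104 104
60
"@ | Set-Content hpcg.dat

# One MPI rank per core of an A8 node.
mpiexec -n 8 .\xhpcg.exe
```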
PETSc was run with two different matrices: ruep
and nord. The ruep matrix is a 1162315 × 1162315
sparse real symmetric matrix with a total of 34.28 × 10^6 non-zero values. The second matrix, nord, was
generated from a thermal transient heat conduction
simulation on a 3-dimensional Cartesian mesh of
moderate size. It is a 398718 × 398718 sparse real
symmetric matrix with a total of 1.1 × 10^6 non-zero values.
In Fig. 4 the measurements for the PETSc trials
are shown. From the diagram it is clear that the per-
formance on Azure nodes is close to ideal, while on the on-premises cluster it is much worse, mostly because the memory bandwidth is insufficient for this
core configuration and code. Because of the huge dif-
ference in the performance between A8 and A4 nodes,
only cloud clusters with A8 nodes were taken into ac-
count for subsequent tests with ANSYS CFX on Mi-
crosoft Azure.
Figure 4: Performance in GFLOPs of the HPCG benchmark (top) and of the PETSc solver on the 'ruep' and 'nord' matrices (middle and bottom) as a function of the total number of cores (1-64), measured on Azure A8 and A4 nodes and on the on-premises cluster, with ideal scaling on A8 nodes shown for reference.
4.3 ANSYS CFX Performance Study
We used ANSYS CFX as an example of a widely used
tool for large-scale physical simulations. We ran a set of independent simulations of heat transfer in a heat-exchange system (subsequently called the pipe case).
In all cases we modeled steady-state flows with a residual target of 1e-06 and a maximum of 200 iterations.
4.3.1 Strong and Weak Scaling of the Heat
Exchange Simulation
For the strong scaling tests, the pipe01–pipe08 prob-
lems were solved on different cluster configurations.
The size of each successive pipe case was doubled, ranging from ca. 200 thousand elements to ca. 16 million
elements. Note that the largest problem, pipe08, pro-
duced a memory overflow in a one-core configuration, and therefore the pipe08 strong scaling efficiency
could not be computed. For the other cases, the strong
scaling efficiency has been calculated according to
formulas proposed by (Kaminsky, 2015). For the sec-
ond largest problem, pipe07, the sequential run-time
has been extrapolated, as 180 of 200 iterations could
be calculated before the memory overflow and the calculation time per iteration increased by an approximately constant fraction.
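
For reference, we assume the usual definitions here: with T(1) denoting the sequential run-time and T(K) the run-time on K cores, the speed-up is

    S(K) = T(1) / T(K)

and the strong scaling efficiency is

    E_strong(K) = S(K) / K = T(1) / (K · T(K)),

so that an efficiency of 1 corresponds to ideal strong scaling.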
We observed good scaling for ANSYS CFX (see Fig. 5). The run-times are comparable, although the strong scaling efficiency on the cloud cluster is much better; this might be due to memory bandwidth limitations on the on-premises cluster (scaling from 1 to 6 cores greatly decreased efficiency). In the case of large cloud clusters (e.g. 128 cores on 32 nodes), much larger problems are needed to fully utilize their capabilities. As can be seen from the figure, the strong scaling efficiency for small problems decays rapidly. Even for larger problems the efficiency is lower with 256 cores, partly due to communication overhead.
A test with 64 nodes, with a total of 512 cores, resulted in an MPI broadcast timeout, which might be due to the
inefficient use of MPI collective operations or subop-
timal broadcast algorithms (also reflected in the mea-
sured communication overhead). A hybrid approach
(multi-threading together with MPI) and/or MPI tun-
ing could circumvent such problems.
In order to measure weak scaling and weak scaling efficiency, the problem size per processor core is kept constant while the number of processor cores is increased. The weak scaling efficiency is calculated
according to formulas described in (Kaminsky, 2015).
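
Under this convention, with T(1) the run-time of the base problem on a single core and T(K) the run-time on K cores for the K-fold larger problem, the weak scaling efficiency assumed here is

    E_weak(K) = T(1) / T(K),

which equals 1 for ideal weak scaling.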
In Fig. 6 the weak scaling measured on cloud clus-
ters and the on-premises cluster is visualized. For
ideal weak scaling a horizontal curve is observed, as
the increase of problem data is proportional to the in-
crease of cores. Here it is also evident that for large
problems both clusters scale well.
Figure 5: Run-time (calculation time in seconds) and strong scaling efficiency of ANSYS CFX for the pipe01-pipe08 cases as a function of the total number of cores, on Azure A8 nodes (top) and on the on-premises cluster (bottom).
Figure 6: Run-time (calculation time in seconds) and weak scaling efficiency of ANSYS CFX as a function of the total number of cores, in the cloud on A8 nodes (top) and on the on-premises cluster (bottom); the curves are parameterized by the number of elements per processor core.
5 CONCLUSION
We presented the SimplyHPC framework, a novel distributed framework for Microsoft Azure. The framework is composed of a light-weight API and a set of commandlets built on top of it. These modules automate the complex deployment procedure that is necessary to run third-party applications from the local machine. A specially designed mechanism keeps track of running jobs in the cluster, and a dedicated service on the head node regularly checks the job status and uploads the results to blob storage as soon as a job has finished successfully. Since table and blob storage are distributed and mirrored, high availability and persistence of the job data are guaranteed. In contrast to Microsoft's HPC Pack, all unnecessary system components and services have been omitted; as a result, the time needed to deploy a cluster has been reduced from more than an hour to a few minutes. We have demonstrated that the framework supports the deployment of both commercial and open-source applications. Although we focused on fluid flow and numerical simulations, the software is well suited to any research domain that uses software with a command-line interface and MPI-based parallelization. We demonstrated that SimplyHPC can also be used in scenarios where both input and output data are massive. We also showed that, for extreme scaling, fine-tuning of the hardware configuration and MPI settings may still be necessary.
The paper shows that scientists and engineers can benefit from the cloud if their calculations are too time-consuming for a local machine or a private cluster, and that they can take advantage of competitive prices and the most recent hardware. Today, it is possible to deploy a 64-core cluster for less than $30, and this price is likely to be even lower by the time the reader reads this paper. Further, our scalability tests provide insights into the expected performance on Microsoft Azure with up to 256 cores. The SimplyHPC software is freely available on GitHub at https://github.com/vbaros/SimplyHPC.
ACKNOWLEDGEMENTS
The framework has been developed in Microsoft In-
novation Center Rapperswil at HSR Hochschule Rap-
perswil, Switzerland. The scalability tests have been performed on Microsoft Azure as part of Microsoft Research Grant No. Azdem187T64934Y. We would also like to express our gratitude to Adrian Rohner, Roman Fuchs, Rita Rueppel and Vladimir Baros from
HSR Hochschule Rapperswil for support and fruitful
discussions.
REFERENCES
Ben Belgacem, M. and Chopard, B. (2015). A hybrid
HPC/cloud distributed infrastructure: Coupling EC2
cloud resources with HPC clusters to run large tightly
coupled multiscale applications. Future Generation
Computer Systems, 42:11–21.
Che, Y., Xu, C., Fang, J., Wang, Y., and Wang, Z. (2015).
Realistic performance characterization of CFD applications on Intel Many Integrated Core architecture. The
Computer Journal.
Gentzsch, W. (2012). How cost efficient is HPC in the cloud?
Technical report, Ubercloud.
Kaminsky, A. (2015). Solving the World’s Toughest Compu-
tational Problems with Parallel Computing. Creative
Commons Attribution.
Marozzo, F., Talia, D., and Trunfio, P. (2013). A cloud framework for big data analytics workflows on Azure.
Advances in Parallel Computing. IOS Press.
Miroslaw, L., Baros, V., Pantic, M., and Nordborg, H.
(2015). Unified cloud orchestration framework for
elastic high performance computing on Microsoft Azure. In Summary of Proceedings, NAFEMS World
Congress, page 216. NAFEMS Ltd.
Pham, L. M., Tchana, A., Donsez, D., de Palma, N., Zur-
czak, V., and Gibello, P.-Y. (2015). Roboconf: A
hybrid cloud orchestrator to deploy complex applica-
tions. In Cloud Computing (CLOUD), 2015 IEEE 8th
International Conference on, pages 365–372.
Popinet, S. (2003). Gerris: a tree-based adaptive solver
for the incompressible Euler equations in complex
geometries. Journal of Computational Physics,
190(2):572–600.
Tomczak, T., Zadarnowska, K., Koza, Z., Matyka, M., and
Miroslaw, L. (2013). Acceleration of iterative Navier-Stokes solvers on graphics processing units. Inter-
national Journal of Computational Fluid Dynamics,
27(4-5):201–209.
Wong, A. K. and Goscinski, A. M. (2013). A unified frame-
work for the deployment, exposure and access of HPC
applications as services in clouds. Future Generation
Computer Systems, 29(6):1333–1344.