A Containerized Tool to Deploy Scientific Applications over SoC-based
Systems: The Case of Meteorological Forecasting with WRF
Luiz Angelo Steffenel¹, Andrea Schwertner Charão² and Bruno da Silva Alves²
¹CReSTIC Laboratory, Université de Reims Champagne-Ardenne, Reims, France
²LSC Laboratory, Universidade Federal de Santa Maria, Santa Maria, Brazil
Keywords: Application Containers, High Performance Computing, Systems-on-a-Chip.
Abstract: Container-based virtualization represents a flexible and scalable solution for HPC environments, allowing simple and efficient management of scientific applications. Recently, Systems-on-a-Chip (SoC) have emerged as an alternative to traditional HPC clusters, offering good computing power at low cost. In this paper, we present how we developed a container-based solution for SoC clusters, and study the performance of WRF (Weather Research and Forecasting) in such environments. The results demonstrate that, although the peak performance of SoC clusters is still limited, these environments are more than suitable for scientific applications with relaxed QoS constraints.
1 INTRODUCTION
High Performance Computing (HPC) is a generic term for applications that are computationally intensive or data intensive in nature (Somasundaram and Govindarajan, 2014). While most HPC platforms rely on dedicated and expensive infrastructures such as clusters and computational grids, other technologies like cloud computing and systems-on-a-chip (SoC) are becoming interesting alternatives for HPC.
Indeed, cloud computing has brought non-negligible flexibility and scalability to most users (Marathe et al., 2014), along with lower maintenance costs. One drawback, however, is that the spread of cloud computing has forced a paradigm shift: applications are no longer executed directly on bare metal but instead must run on top of a virtualization layer. While the performance overhead of virtualization is being rapidly reduced, it is still perceptible and may compromise some applications (Younge et al., 2011). Another inconvenience of cloud computing is that not all applications are suited to remote execution. Latency-sensitive applications or applications executed in remote locations with limited Internet access may be penalized by a remote execution, as well as applications relying on sensitive data that cannot be transmitted to a third-party facility.
Systems-on-a-chip (SoC), on the other hand, represent a break from the traditional HPC infrastructure, as SoCs encapsulate CPU, GPU, RAM and other components on the same chip (Wolf et al., 2008). Most of the time, the SoC technology is used as a way to reduce the cost of single-board computers such as the Raspberry Pi, ODroid or Banana Pi. These systems are currently used for a large range of applications, from Computer Science teaching (Ali et al., 2013) to the Internet of Things (Molano et al., 2015). Being mostly based on ARM processors, SoCs also benefit from the improvements to this family of processors. Indeed, while the choice of ARM processors was initially driven by energy and cost requirements, this family of processors now presents several improvements that allow the construction of computing infrastructures with good computing power at a cost far below that of traditional HPC platforms (Weloli et al., 2017; Cox et al., 2014; Montella et al., 2014).
SoCs also have an active role in Fog and Edge computing (Steffenel and Kirsch-Pinheiro, 2015), bringing computation closer to the user and therefore offering proximity services that would otherwise be entirely deployed on a distant infrastructure. Furthermore, a SoC cluster can substitute for a traditional HPC cluster in some situations, as SoCs are relatively inexpensive and have low maintenance and environmental requirements (cooling, etc.). Of course, this is only valid as long as the SoC infrastructure provides sufficient Quality of Service (QoS) to the final users.
In this context, the association of SoC and virtualization also represents an interesting solution for deploying scientific applications for educational purposes (Alvarez et al., 2018).
Indeed, if virtualization (and especially container-based virtualization) helps to simplify the administrative tasks related to the installation and maintenance of scientific applications, it also enables rich experimental learning for students, who can test different software and perform hands-on exercises without having to struggle with compilers, operating systems, and DevOps tasks. Furthermore, by focusing on SoCs, we try to minimize the material requirements to execute an application, enabling the seamless deployment of applications on personal computers, classrooms, dedicated infrastructures or even the cloud.
This work is structured as follows: Section 2 reviews some elements of virtualization, while Section 3 presents how HPC applications can be challenging in a Docker environment, especially when they are based on the Message Passing Interface (MPI) standard. Section 4 introduces the WRF forecasting model and the adaptations we made to develop WRF Docker images for SoCs. Section 5 presents the benchmarking results we obtained, and finally Section 6 presents the conclusions drawn from this study and our plans for future work.
2 BACKGROUND
OS-level virtualization is increasingly popular. This virtualization approach relies on OS facilities that partition the physical machine resources, creating multiple isolated user-space instances (containers) on top of a single host kernel. Another advantage of such a container-based virtualization approach is that there is no execution overhead, as OS-level virtualization does not need a hypervisor (Felter et al., 2014).
One of the most popular container solutions is Docker (https://www.docker.com/). Docker allows the creation of personalized images that can be used as a base for the deployment of many concurrent containers. While the initial releases of Docker made use of LXC as the execution driver, it eventually implemented its own execution driver (Morabito et al., 2015). Docker also provides a registry-based service named Docker Hub (https://hub.docker.com/) that allows users to share their images, simplifying the distribution of virtual images. In addition, Docker provides a basic orchestration service called Docker Swarm that enables the deployment of a cluster of Docker nodes. While Docker Swarm is not as rich as other orchestrators like Kubernetes (Burns et al., 2016), it is simple to use, and Swarm services can be easily adapted to operate under Kubernetes.
While the HPC community was traditionally reluctant to adopt virtualization because of the performance penalties it could incur, the dissemination of container virtualization is changing this view. More and more HPC centers favor the use of containers to simplify resource management and to guarantee compatibility and reproducibility for the users' applications (Ruiz et al., 2015). For example, the NVidia DGX servers (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/dgx-1/dgx-1-rhel-centos-datasheet-update-r2.pdf), dedicated to Deep Learning and Artificial Intelligence applications, use Docker containers to deploy the users' applications.
Although Docker was initially developed for x86 platforms, its adaptation to other processor architectures like ARM started around 2014, with an initial port made by Hypriot (https://blog.hypriot.com/) for Raspberry Pi machines. More recently, Docker started to officially support ARM, and several base images on Docker Hub are now published with both x86 and ARM versions.
3 HPC ON DOCKER: LEVERAGING MPI
Large-scale HPC applications mostly rely on MPI for data exchange and task coordination across a cluster, grid or cloud. In spite of recent advances in its specification, the deployment of an MPI application can be quite rigid, as it requires a well-known execution environment. Indeed, the starting point for an MPI cluster is the definition of a list of participating nodes (often known as the hostfile), which requires prior knowledge of the computing environment.
Deploying an MPI cluster over containers is a challenging task, as the overlay network on Docker is designed to perform load balancing, not to address specific nodes as MPI requires. Most works that propose Docker images for MPI fail to develop a self-contained solution, requiring manual or external manipulation of MPI elements to deploy applications. For instance, (d. Bayser and Cerqueira, 2017) automates the deployment of an MPI application over a Docker Swarm cluster but requires the user to provide the list of available nodes (and cores) as a command-line parameter.
The external management of containers is frequently cited as a possible solution for accommodating MPI over containers. (Yong et al., 2018) briefly describes two architectural arrangements that could be used with Docker, both relying on external automation with scripts and SSH connections to the containers. This is indeed the case of (Higgins et al., 2015), where the container orchestration is replaced by a combination of a resource manager (PBS) and a set of scripts that deploy individual container images and then link them together. A similar approach is used by (Azab, 2017), who uses Slurm as the orchestrator.
Another example of “external management” is Singularity (Kurtzer et al., 2017), a container manager specifically designed for the HPC community, although compatible with Docker images. Singularity developed a specific solution for MPI deployments, where an external tool deploys and sets up the MPI hostfile, and copies the required application and data to the containers. In (Chung et al., 2016), MPI is not even part of the container but is mounted from the host OS, making the solution totally dependent on the execution platform.
Finally, (Nguyen and Bein, 2017) propose a generic service for the deployment of MPI applications on Docker, on a single machine or on a cluster with Docker Swarm. Based on the Alpine Linux distribution, this platform automates most of the deployment of the Docker Swarm service, and the list of working nodes for the hostfile is obtained by monitoring active connections (using netstat). The choice of netstat proved to be too unstable, and we were unable to make it work properly on a SoC.
As the existing solutions either require too much manual intervention or are not reliable enough, we decided to develop our own solution to deploy MPI on a Docker Swarm cluster made of SoCs. Therefore, using the work from (Nguyen and Bein, 2017) as a starting point, we automated the deployment of MPI as follows.
3.1 Hostfile
As explained before, most works proposing MPI over
Docker delegate the task of defining the hostfile to the
users. The only exception is the work from (Nguyen
and Bein, 2017), who present an automated process
that unfortunately does not work reliably enough.
The main reason for this difficulty is that Docker presents two quite dissimilar execution modes: in the “individual” mode, a container instance is launched as a standalone application, requiring no additional interconnections to other instances (although this is possible). In the “service” mode, different instances are bound together by a routing mesh and a naming service whose main purpose is to load balance messages to a given service and to easily redirect messages in case of failures. As illustrated by Figure 1, different instances can be addressed by the same name (my-web, in the example), simplifying the development of applications that no longer need to keep track of the servers’ IP addresses.
Figure 1: Docker routing mesh.
Unfortunately, the MPI hostfile requires the list of servers. In both cases, there is no simple way to address a list of nodes as in a regular cluster, where machines are often named according to a defined pattern (e.g. node-X). Instead, we need a discovery service to find out which IP addresses correspond to the instances in our network. Contrary to (Nguyen and Bein, 2017), however, we decided to exploit Docker's own naming service by making “low level” DNS calls using dig. By making specific queries for the name of a service, we obtain the list of the corresponding IP addresses of its instances. As the hostfile also indicates the number of processes (or slots) a node can run simultaneously, we call nproc on each machine to obtain the number of available processing cores. This simple “hack” is presented below, where we obtain the list of all worker nodes (i.e., instances of the “workers” service on Swarm).
# Query Docker's internal DNS for the IP addresses of all "workers" tasks
iplists=`dig +short tasks.workers A`
for i in $iplists; do
    # Ask each instance how many cores it offers
    np=`ssh $i "nproc --all"`
    echo "$i:$np" >> hostfile
done
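For illustration, the resulting hostfile follows the usual host:slots notation; with three quad-core worker instances on the overlay network it would contain entries such as the following (the IP addresses are hypothetical):

10.0.1.5:4
10.0.1.6:4
10.0.1.7:4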
3.2 Roles and External Access
In addition to the list of nodes, MPI strongly relies on the nodes' rank. For instance, the most important node in an MPI execution is the one tagged with rank 0, which usually launches the MPI job and gathers the results at the end. As this “master” node has more responsibilities than a regular “worker” node, it is important to allow users to access this node using
SSH, for example. Indeed, several applications require access to a frontend node where the user can execute preprocessing steps, set up the application parameters or simply verify that the code runs before deploying it over the cluster. Therefore, we looked for a way to launch the master together with the workers. As we need to publish the master service's port directly from the Swarm node, this node cannot simply use the ingress routing network, but needs to be executed under the special global deployment mode.
Additional attributes ensure that the master is easily located (on the manager node of the Swarm cluster), simplifying access (using SSH) and also guaranteeing that at least this node correctly mounts all external volumes required by the application. Listing 1 presents the main elements of the docker-compose.yaml file used to define and deploy the Swarm service for our application.
Listing 1: Excerpt of the Swarm Service definition.
version: "3.3"
services:
  master:
    image: XXXXX
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
    ports:
      - published: 2022
        target: 22
        mode: host
    volumes:
      - "./WPS_GEOG:/WPS_GEOG"
      - "./wrfinput:/wrfinput"
      - "./wrfoutput:/wrfoutput"
    networks:
      - wrfnet
  workers:
    image: XXXXX
    deploy:
      replicas: 2
      placement:
        preferences:
          - spread: node.labels.datacenter
    volumes:
      - "./WPS_GEOG:/WPS_GEOG"
      - "./wrfinput:/wrfinput"
      - "./wrfoutput:/wrfoutput"
    networks:
      - wrfnet
networks:
  wrfnet:
    driver: overlay
    attachable: true
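Assuming the file above is saved as docker-compose.yaml on the Swarm manager, the whole service can then be deployed with a single command (the stack name wrf is arbitrary):

docker stack deploy -c docker-compose.yaml wrf
docker service ls   # check that the master and workers services are running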
4 THE WRF MODEL
In order to experiment with our virtualized cluster platform, we adapted the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008), a well-known numerical weather prediction model. WRF has over 1.5 million lines of C and Fortran code, as well as many dependencies on external software packages for input/output (I/O), parallel communications, and data compression, which are not trivial to satisfy. Hence, compilation and execution can be challenging for beginners or for users who do not have administration rights on their computing infrastructures.
Running the model can also be difficult for new users. WRF is composed of several steps to generate computational grids, import initialization data, produce initial and boundary conditions, and run the model (Hacker et al., 2017).
The typical workflow to execute the WRF model (Figure 2) is made of 5 phases, as indicated below; a shell sketch of these steps follows Figure 2. These steps do not include the additional access to external data sources, nor the analysis/visualization of the results.
1. Geogrid - creates terrestrial data from static geographic data
2. Ungrib - unpacks GRIB meteorological data obtained from an external source and packs it into an intermediate file format
3. Metgrid - interpolates the meteorological data horizontally onto the model domain
4. Real - vertically interpolates the data onto the model coordinates, creates boundary and initial condition files, and performs consistency checks
5. WRF - generates the model forecast
Figure 2: WRF workflow.
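As an illustration, the five phases are typically invoked inside the container along the lines sketched below; the /WPS and /WRF/run directories and the GRIB file pattern are assumptions that depend on how the image lays out the standard WPS/WRF distribution:

# 1. Geogrid: build the static terrestrial fields for the domain
cd /WPS && ./geogrid.exe
# 2. Ungrib: link the external GRIB files and unpack them
./link_grib.csh /wrfinput/gfs.*
./ungrib.exe
# 3. Metgrid: horizontally interpolate the meteorological data onto the domain
./metgrid.exe
# 4. Real: build initial/boundary conditions and run the consistency checks
cd /WRF/run && ./real.exe
# 5. WRF: run the forecast itself (in cluster mode, mpirun also receives the hostfile built in Section 3.1)
mpirun -np 4 ./wrf.exe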
The first three steps are part of the WRF Preprocessing System (WPS), which is configured and compiled separately from the WRF model. The WPS configuration allows two execution modes: serial or dmpar (distributed memory parallelism through MPI). In the case of the WRF model configuration, four modes are proposed: serial, smpar (shared memory parallelism), dmpar (distributed memory parallelism) and sm+dmpar. The smpar option depends
on OpenMP, while dmpar relies on MPI. The last option (sm+dmpar) combines OpenMP and MPI, but several works point out that pure dmpar usually outperforms the mixed option (Council, 2010; Langkamp and Böhner, 2011).
Software containers, which are becoming an essential part of modern software development and deployment, offer a way to mitigate or eliminate many of the problems cited above, and simplify the deployment of computing infrastructures for both education and research. Containers allow the packaging of a working (and validated) WRF instance, ready to be used, sparing the user from having to install and set up all dependencies.
4.1 WRF Containers for ARM
Although a non-official container for WRF on x86 platforms (https://github.com/NCAR/container-wrf), developed by NCAR researchers, already exists, this image is not adapted for cluster deployment and has not evolved since its launch, despite the intentions of the authors (Hacker et al., 2017).
When we started developing a version compatible with SoCs based on the ARM platform, we had to address a few issues related to the availability of some libraries and to the compilation options. Indeed, the original container image from NCAR is based on CentOS, which does not yet support ARM processors. This forced us to move to Ubuntu as our new base image. Not only does Ubuntu support ARM, but most libraries required by WRF are available as packages, simplifying the installation (this also motivated us to avoid Alpine, a popular base image for containers).
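To give an idea of the resulting image, the sketch below shows a minimal Dockerfile along the lines of our approach; the package list is illustrative (the actual image installs additional dependencies and runs the full WPS/WRF configure/compile cycle):

# Illustrative sketch only, not the complete build recipe
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y \
    build-essential gfortran csh m4 \
    libnetcdf-dev libnetcdff-dev libhdf5-dev \
    mpich libmpich-dev \
    openssh-server dnsutils \
 && rm -rf /var/lib/apt/lists/*
# The WRF/WPS sources are then downloaded, configured for the
# ARM/gcc target (see below) and compiled in this directory
WORKDIR /WRF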
The other issue is related to the pre-configuration of the parameters for compiling WRF. While WRF supports several compilers (gcc, Intel, Portland, etc.) and architectures, ARM processors are not listed among the supported ones. Fortunately, a few researchers have faced the same problem before (http://supersmith.com/site/ARM.html) and we were able to apply their instructions. While the adaptation requires editing the configuration files in order to find a match for the ARM platform, the configuration differences between ARM and x86 are minimal, and most of the process is simple and straightforward.
In addition, we modified the way input data is accessed, moving from a fixed Docker volume to a mounted file system. We believe that this gives more flexibility to develop workflows that execute the application regularly, for example on a daily forecast schedule. This also helps to address a storage problem that affects many SoC boards. Indeed, the first step of the WRF workflow (Geogrid) depends on a large geographical database (WPS_GEOG). Without careful pruning, the full database reaches 60GB when uncompressed, which is too voluminous for most SoC boards. By allowing the use of external volumes, we let users attach external storage drives to their nodes. As demonstrated later in Section 5.2, only a single node requires this database, so we can minimize the costs and the management complexity of the SoC cluster.
Finally, we also updated the WRF version to 3.9.1.1, as the version present in the NCAR image dates back to 2015. As WRF 4.0 has recently been launched, we are planning to develop new images for this version.
As a result, WRF containers for both ARM and x86 architectures are now available on Docker Hub (https://hub.docker.com/r/lsteffenel/wrf-container-armv7l/), and the scripts and Dockerfiles are available on GitHub (https://github.com/lsteffenel/wrf-container-armv7l-RaspberryPi).
5 PERFORMANCE BENCHMARKING
In order to assess the interest of using ARM-based SoCs for meteorological simulations with WRF, we conducted a series of benchmarks to evaluate the performance of the application. The next sections describe the experiments and the platforms we compared.
5.1 Definitions
For the benchmarks we used a dataset for a 12-hour forecast on October 18, 2016, covering Uruguay and the south of Brazil. Although small, this dataset is often used as a training example for meteorology students at Universidade Federal de Santa Maria, who can modify the parameters and compare the results to the ground-truth observations. The entire dataset is accessible in our GitHub repository.
In the benchmarks we compared different SoC models and x86 computers. The SoC boards include a Raspberry Pi 2 model B (Broadcom BCM2835 processor, ARM Cortex-A7, 4 cores, 900MHz, 1GB RAM) and two Raspberry Pi 3 (Broadcom BCM2837 processor, ARM Cortex-A53, 4 cores, 1.2GHz, 1GB RAM). The x86 computers
were represented by a server with an Intel Xeon E5-2620v2 processor (2.10 GHz, 12 cores, 48GB RAM). We also experimented with other SoC boards such as the NanoPi NEO (Allwinner H3, ARM Cortex-A7, 4 cores, 1.2GHz, 512MB RAM), an NTC C.H.I.P. (Allwinner R8 processor, ARM Cortex-A8, 1 core, 1GHz, 512MB RAM) and a Banana Pi (Allwinner A83T processor, ARM Cortex-A7, 8 cores, 1.8 GHz, 1GB RAM), but their poor performance or incompatibilities with Docker forced us to exclude these platforms from the subsequent tests.
All measures presented in this section correspond to the average of at least 5 runs. For the Docker Swarm clusters, we interconnected the devices via a 1 Gbps switch over RJ45, to avoid unreliable results due to wireless connections.
Furthermore, as the WRF workflow is composed of 5 steps, we measured the execution time of each step individually, in order to assess the best deployment strategy. Therefore, the next sections present the separate analysis of the preprocessing steps (WPS+real) and the forecast step (WRF).
5.2 WPS and Real
As explained in Section 4.1, the size of the geographical database used by the Geogrid step of WPS often poses a problem for typical SoC internal storage. Indeed, in our experiments we had to attach an external USB storage device to a Raspberry Pi node to accommodate the WPS_GEOG database.
Because WPS can be compiled with the dmpar option, we first tried to identify whether the use of MPI would benefit each of the WPS steps (as well as the real step). To do so, we measured the execution time of each step while varying the number of computing cores (using the mpirun -np option).
[Figure 3: bar chart "WRF preparation steps in a Raspberry Pi 3", execution time (s) versus number of cores (1 to 4), for the Geogrid, Ungrib, Metgrid and Real steps.]
Figure 3: Performance of WPS steps when varying the number of cores.
The results of this benchmark, illustrated in Figure 3 and detailed in Table 1, indicate that only the Geogrid step effectively benefits from a multi-core execution. In the case of Ungrib, the parallel execution even penalizes the algorithm. The Metgrid step shows a small performance gain when parallelized, but the execution time stabilizes at 2 or more cores, and the Real step shows no evidence of improvement. Additional benchmarks on the network performance, such as those conducted by (Yong et al., 2018), may also help in tuning the different steps.

Table 1: Relative performance of WPS steps on a single machine (in seconds).
Cores      1        2        3        4
Geogrid    173.81   119.59   111.56   88.54
Ungrib     188.78   196.15   212.97   241.57
Metgrid    151.42   120.47   123.56   119.26
Real       16.437   16.54    16.59    16.69
Even if Geogrid presents some performance improvement when run in parallel, the acceleration is sub-optimal (we need 4x the cores to roughly halve the execution time). Combined with the storage limitations cited before and its relatively small impact on the overall execution time (compared with the forecast step, see Section 5.3), this leads us to advise against running Geogrid cluster-wide. Instead, we suggest assigning a single node (the master) to preprocess the data for the forecast model.
From these results, we suggest organizing the deployment of the preprocessing steps as follows (a sketch of the corresponding invocations follows this list):
Geogrid - parallel execution with mpirun, preferentially only on the machine hosting the WPS_GEOG database (the master node);
Ungrib - serial execution on a single core;
Metgrid - serial execution or, at most, parallel execution with mpirun on a single machine;
Real - serial execution on a single core.
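As an illustration, and assuming the standard WPS/WRF executable names and a quad-core master node, these recommendations translate into invocations along the following lines:

mpirun -np 4 ./geogrid.exe   # parallel, only on the master hosting WPS_GEOG
./ungrib.exe                 # serial, single core
./metgrid.exe                # serial (or at most mpirun on a single machine)
./real.exe                   # serial, single core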
5.3 WRF
Contrary to the preprocessing steps, which ultimately represent only a small computing load, the WRF forecast step is the main workload of the workflow. This is even more important in “production” environments, where more than a simple 12-hour forecast needs to be computed.
Indeed, the forecasting step of WRF can benefit from multicore and cluster scenarios. Figure 4 indicates the average execution time when executing the WRF step on a single Raspberry Pi 3 (1 to 4 cores), on a cluster with two Raspberry Pi 3 (8 cores in total) and on a Swarm cluster with two Raspberry Pi 3 and one Raspberry Pi 2 (12 cores in total).
Table 2: WRF relative performance on a single machine (in
seconds).
Cores R Pi 2 R Pi 3 Xeon
1 6268.96 5647.47 539.05
2 3280.34 2473.89 314.69
3 2468.89 1801.18 264.53
4 2075.88 1602.68 173.55
Table 3: Performance on a Raspberry Pi Swarm cluster (in seconds).
Machines          Cores   Time (s)
1 x Pi 3          4       1602.68
2 x Pi 3          8       1322.42
2 x Pi 3 + Pi 2   12      1306.10
While multicore execution brings an important performance gain, the Swarm cluster executions show more mixed results. As WRF is regularly executed on production clusters in dmpar mode (MPI) and the Docker overlay network imposes little overhead, we suspect that the reduced performance gain is related to the network performance of the Raspberry Pis. Indeed, as observed by (Beserra et al., 2017), access to the communication bus is a recurrent problem on SoCs, and the Raspberry Pi suffers from a “low” speed network interface (10/100 Mbps only).
[Figure 4: bar chart "WRF Multicore and Swarm Mode Performances", execution time (s) versus number of cores (1 to 12), for 1, 2 and 3 nodes.]
Figure 4: Performance of WRF in multicore and swarm cluster mode.
Tables 2 and 3 detail these results, and also present a performance comparison with a Xeon processor. While the x86 processor is faster, the execution time on the Raspberry Pis is still acceptable, enough to deliver forecasts on a daily or even hourly basis, or for education and training. If we consider the material and environmental cost of the SoC solution, it is indeed an interesting alternative for scientific applications like WRF.
6 CONCLUSIONS
This work focuses on the deployment of containerized scientific applications over a cluster of SoC-based systems. Most Systems-on-a-Chip (SoCs) are based on the ARM architecture, a flexible and well-known family of processors that has now started to enter the HPC (High Performance Computing) domain. Container-based virtualization, on the other hand, enables the packaging of complex applications and their seamless deployment. Together, SoCs and containers represent a promising alternative for the development of computing infrastructures, associating the low cost and minimal maintenance of SoCs with the flexibility of containers.
Nonetheless, most traditional scientific applications rely on MPI for scalability, and popular container managers like Docker do not offer proper support for MPI. We therefore first propose a service specification to deploy a Docker Swarm cluster that is ready for MPI applications. We then study how to adapt the WRF meteorological forecast model to run on ARM-based SoCs. Benchmarks on different SoC platforms are used to evaluate the performance and the interest of using containers over SoC clusters. These results indicate that, although popular SoCs such as the Raspberry Pi cannot compete in performance with x86 processors, they are still able to deliver results within an acceptable delay.
Future improvements to this work include the development of a generic platform capable of accommodating other MPI applications, as well as support for recent versions of WRF and its integration into more elaborate frameworks.
ACKNOWLEDGEMENTS
This research has been partially supported by the French-Brazilian CAPES-COFECUB MESO project (http://meso.univ-reims.fr) and the GREEN-CLOUD project (http://www.inf.ufrgs.br/greencloud/) (#16/2551-0000488-9), from FAPERGS and CNPq Brazil, program PRONEX 12/2014.
REFERENCES
Ali, M., Vlaskamp, J. H. A., Eddin, N. N., Falconer, B., and Oram, C. (2013). Technical development and socioeconomic implications of the Raspberry Pi as a learning tool in developing countries. In Computer Science and Electronic Engineering Conf. (CEEC), pages 103–108. IEEE.
Alvarez, L., Ayguade, E., and Mantovani, F. (2018).
Teaching HPC systems and parallel programming
with small-scale clusters. In 2018 IEEE/ACM Work-
shop on Education for High-Performance Computing
(EduHPC), pages 1–10.
Azab, A. (2017). Enabling Docker containers for high-
performance and many-task computing. In 2017
IEEE International Conference on Cloud Engineering
(IC2E), pages 279–285.
Beserra, D., Pinheiro, M. K., Souveyet, C., Steffenel, L. A.,
and Moreno, E. D. (2017). Performance evaluation of
os-level virtualization solutions for HPC purposes on
SoC-based systems. In 2017 IEEE 31st International
Conference on Advanced Information Networking and
Applications (AINA), pages 363–370.
Burns, B., Grant, B., Oppenheimer, D., Brewer, E., and
Wilkes, J. (2016). Borg, omega, and kubernetes. Com-
mun. ACM, 59(5):50–57.
Chung, M. T., Quang-Hung, N., Nguyen, M., and Thoai, N.
(2016). Using docker in high performance computing
applications. In 2016 IEEE Sixth International Con-
ference on Communications and Electronics (ICCE),
pages 52–57.
Council, H. A. (2010). Weather research and fore-
casting (WRF): Performance benchmark and
profiling, best practices of the HPC advisory
council. Technical report, HPC Advisory Coun-
cil, http://www.hpcadvisorycouncil.com/pdf/WRF_Analysis_and_Profiling_Intel.pdf.
Cox, S. J., Cox, J. T., Boardman, R. P., Johnston, S. J., Scott,
M., and O’brien, N. S. (2014). Iridis-pi: a low-cost,
compact demonstration cluster. Cluster Computing,
17(2):349–358.
d. Bayser, M. and Cerqueira, R. (2017). Integrating mpi
with docker for hpc. In 2017 IEEE International Con-
ference on Cloud Engineering (IC2E), pages 259–265.
Felter, W., Ferreira, A., Rajamony, R., and Rubio, J. (2014).
An updated performance comparison of virtual ma-
chines and linux containers. IBM technical report
RC25482 (AUS1407-001), Computer Science.
Hacker, J. P., Exby, J., Gill, D., Jimenez, I., Maltzahn, C.,
See, T., Mullendore, G., and Fossell, K. (2017). A
containerized mesoscale model and analysis toolkit to
accelerate classroom learning, collaborative research,
and uncertainty quantification. Bulletin of the Ameri-
can Meteorological Society, 98(6):1129–1138.
Higgins, J., Holmes, V., and Venters, C. (2015). Orches-
trating docker containers in the HPC environment. In
Kunkel, J. M. and Ludwig, T., editors, High Perfor-
mance Computing, pages 506–513, Cham. Springer
International Publishing.
Kurtzer, G. M., Sochat, V., and Bauer, M. W. (2017). Singu-
larity: Scientific containers for mobility of compute.
PLOS ONE, 12(5):1–20.
Langkamp, T. and Böhner, J. (2011). Influence of the compiler on multi-CPU performance of WRFv3. Geoscientific Model Development, 4(3):611–623.
Marathe, A., Harris, R., Lowenthal, D., de Supinski, B. R.,
Rountree, B., and Schulz, M. (2014). Exploiting re-
dundancy for cost-effective, time-constrained execu-
tion of HPC applications on Amazon EC2. In 23rd Int.
Symposium on High-Performance Parallel and Dis-
tributed Computing, pages 279–290. ACM.
Molano, J. I. R., Betancourt, D., and Gómez, G. (2015). Internet of things: A prototype architecture using a Raspberry Pi. In Knowledge Management in Organizations, pages 618–631. Springer.
Montella, R., Giunta, G., and Laccetti, G. (2014). Virtu-
alizing high-end GPGPUs on ARM clusters for the
next generation of high performance cloud comput-
ing. Cluster computing, 17(1):139–152.
Morabito, R., Kjallman, J., and Komu, M. (2015). Hyper-
visors vs. lightweight virtualization: a performance
comparison. In Cloud Engineering (IC2E), IEEE Int.
Conf. on, pages 386–393. IEEE.
Nguyen, N. and Bein, D. (2017). Distributed MPI cluster
with Docker Swarm mode. In 2017 IEEE 7th Annual
Computing and Communication Workshop and Con-
ference (CCWC), pages 1–7.
Ruiz, C., Jeanvoine, E., and Nussbaum, L. (2015). Performance evaluation of containers for HPC. In Hunold, S., Costan, A., Giménez, D., Iosup, A., Ricci, L., Gómez Requena, M. E., Scarano, V., Varbanescu, A. L., Scott, S. L., Lankes, S., Weidendorfer, J., and Alexander, M., editors, Euro-Par 2015: Parallel Processing Workshops, pages 813–824, Cham. Springer International Publishing.
Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O.,
Barker, D. M., Duda, M. G., Huang, X.-Y., Wang,
W., and Powers, J. G. (2008). A description of the
advanced research WRF version 3, NCAR techni-
cal note. National Center for Atmospheric Research,
Boulder, Colorado, USA.
Somasundaram, T. S. and Govindarajan, K. (2014).
CLOUDRB: A framework for scheduling and manag-
ing high-performance computing (HPC) applications
in science cloud. Future Generation Computer Sys-
tems, 34:47–65.
Steffenel, L. and Kirsch-Pinheiro, M. (2015). When the
cloud goes pervasive: approaches for IoT PaaS on a
mobiquitous world. In EAI International Conference
on Cloud, Networking for IoT systems (CN4IoT 2015),
Rome, Italy.
Weloli, J. W., Bilavarn, S., Vries, M. D., Derradji, S., and
Belleudy, C. (2017). Efficiency modeling and explo-
ration of 64-bit ARM compute nodes for exascale. Mi-
croprocessors and Microsystems, 53:68 – 80.
Wolf, W., Jerraya, A. A., and Martin, G. (2008).
Multiprocessor system-on-chip (MPSoC) technology.
Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, 27(10):1701–1713.
Yong, C., Lee, G.-W., and Huh, E.-N. (2018). Proposal of
container-based hpc structures and performance anal-
ysis. 14.
Younge, A. J., Henschel, R., Brown, J. T., von Laszewski,
G., Qiu, J., and Fox, G. C. (2011). Analysis of virtu-
alization technologies for high performance comput-
ing environments. In IEEE 4th International Confer-
ence on Cloud Computing, CLOUD ’11, pages 9–16,
Washington, DC, USA. IEEE Computer Society.