is overloaded. More technically and specifically, we
aim to offload the computational workload on OCTOPUS
to Microsoft Azure through the use of cloud
bursting technology. This research also investigates
the feasibility of the OCTOPUS-Azure cloud bursting
environment, as well as the technical issues that must
be tackled before IaaS clouds can be used for the
future supercomputing environment at the CMC.
The structure of this paper is as follows. In Section 2,
we briefly describe the structure of OCTOPUS
and detail its current problems. We then summarize
the technical requirements for the envisioned cloud
bursting functionality to be deployed on OCTOPUS.
Section 3 presents the cloud bursting functionality
we have developed in this research. Section 4
describes the experience and practice gained through
this research, as well as the evaluation results for the
cloud bursting functionality. Section 5 describes
related work, and Section 6 concludes this paper.
2 TECHNICAL REQUIREMENTS
2.1 OCTOPUS Overview
Table 1: OCTOPUS System Configuration.

Compute nodes:
  - 236 general-purpose CPU nodes (471.24 TFlops): CPU: 2 x Intel Xeon Gold 6126 (Skylake/2.6 GHz/12C); Memory: 192 GB
  - 37 GPU nodes (858.28 TFlops): CPU: 2 x Intel Xeon Gold 6126 (Skylake/2.6 GHz/12C); GPU: 4 x NVIDIA Tesla P100 (SXM2); Memory: 192 GB
  - 44 Xeon Phi nodes (117.14 TFlops): CPU: Intel Xeon Phi 7210 (Knights Landing/1.3 GHz/64C); Memory: 192 GB
  - 2 large-scale shared-memory nodes (16.38 TFlops): CPU: 8 x Intel Xeon Platinum 8153 (Skylake/2.0 GHz/16C); Memory: 6 TB
Storage: DDN EXAScaler (Lustre, 3.1 PB)
Interconnect: InfiniBand EDR (100 Gbps)
OCTOPUS is a 1.46 PFlops hybrid cluster system
composed of general-purpose CPU (Intel Skylake)
nodes, GPU (NVIDIA Tesla P100) nodes, many-core
(Intel Knights Landing) nodes, large-scale shared-memory
(6 TB memory) nodes, frontend servers, and
3 PB of storage (Table 1). These nodes and the storage
are connected over an InfiniBand EDR network, so
each node can take advantage of 100 Gbps of full
bi-directional bandwidth. In addition, a Mellanox CS7500
switch, a 648-port EDR 100 Gbps InfiniBand
switch, houses all of the compute nodes and storage
servers. This connection configuration allows any two
nodes to communicate with 1-hop latency.
All compute nodes except the large-scale
shared-memory nodes use direct liquid cooling (DLC)
to cool their processors and accelerators so that they can
stably provide high performance.
For the operation of OCTOPUS, we charge users
for the use of compute nodes on a per-node-hour basis.
In other words, the CMC has introduced a service
charge rule for OCTOPUS based on the cost of the
power consumed by using one compute node for one
hour. For this reason, and from the perspective of
fairness under this charge rule, the CMC controls and
operates OCTOPUS so that a compute node is not used
by multiple job requests simultaneously.
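As a concrete illustration of the per-node-hour charge rule described above, the following minimal Python sketch computes the charge for a single job. The rate constant and the function name are hypothetical placeholders introduced only for illustration; they are not the CMC's actual pricing.

    # Minimal sketch of a per-node-hour charging rule.
    # The rate below is a hypothetical placeholder, not the CMC's actual price.
    RATE_PER_NODE_HOUR = 10.0  # assumed charge units per node-hour

    def job_charge(num_nodes: int, elapsed_hours: float) -> float:
        """Charge = nodes occupied x wall-clock hours x per-node-hour rate."""
        return num_nodes * elapsed_hours * RATE_PER_NODE_HOUR

    # Example: a job that exclusively occupies 4 nodes for 2.5 hours.
    print(job_charge(4, 2.5))  # -> 100.0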
To allow users to use the computational resources of
OCTOPUS efficiently and effectively, and to realize
the above-mentioned operational policy, NEC
NQSII and JobManipulator (NQSII/JM), which are
proprietary job management products from NEC,
have been adopted as the job management system on
OCTOPUS. NQSII/JM receives job requests from users
and assigns them to an appropriate set of compute nodes.
As described above, since OCTOPUS is composed
of heterogeneous architectures, NQSII/JM sends each job
request to the appropriate queue for one of the job
classes defined for each type of compute node, each of
which prescribes the allowed degree of parallelism.
This mechanism is responsible for maintaining high
job throughput as well as reducing user waiting time.
In the actual daily operation of OCTOPUS, we
dynamically change the queue configuration of
NQSII/JM to reduce user waiting time, based on
monitoring of the current job submission status and
user waiting times.
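To make the queue-per-node-type mechanism concrete, the following minimal Python sketch models how a job request could be routed to a queue according to its target node type and requested degree of parallelism. The job-class names, limits, and selection logic are illustrative assumptions only; they do not reflect NQSII/JM's actual interface or OCTOPUS's actual queue configuration.

    from dataclasses import dataclass

    # Hypothetical job classes keyed by node type; the node counts mirror
    # Table 1, but these are not OCTOPUS's actual queue definitions.
    JOB_CLASSES = {
        "cpu": 236,        # general-purpose CPU nodes
        "gpu": 37,         # GPU nodes
        "many-core": 44,   # Xeon Phi nodes
        "large-mem": 2,    # large-scale shared-memory nodes
    }

    @dataclass
    class JobRequest:
        node_type: str   # which type of compute node the job targets
        num_nodes: int   # requested degree of parallelism (number of nodes)

    def select_queue(job: JobRequest) -> str:
        """Route a job request to the queue for the job class that matches
        its node type and admits its degree of parallelism (simplified)."""
        if job.node_type not in JOB_CLASSES:
            raise ValueError(f"unknown node type: {job.node_type}")
        if job.num_nodes > JOB_CLASSES[job.node_type]:
            raise ValueError("requested parallelism exceeds the job class limit")
        return f"queue-{job.node_type}"

    print(select_queue(JobRequest("gpu", 4)))  # -> "queue-gpu"

In practice, the scheduler also accounts for current load and waiting times when dispatching jobs; the sketch above only captures the static mapping from node type and parallelism to a queue.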
2.2 Problem in OCTOPUS
Despite the administrators' efforts to avoid high
utilization and long user waiting times, the general-purpose
CPU nodes and GPU nodes, in particular,
have faced extremely high utilization since the
beginning of their operation. In 2019, the second
operational academic year, the general-purpose CPU
and GPU nodes were operated at between 80 and
90 percent utilization on average. The many-core nodes
have also been operated at high utilization, because
users tend to use them as alternatives to the
general-purpose CPU nodes to offload their workloads.
In short, some users prefer to have their jobs completed
quickly on the many-core nodes, without waiting long,
even if those jobs cannot obtain better performance
than on the general-purpose CPU nodes. The administrators
at the CMC welcomed this high utilization of OCTOPUS
at first, because we recognized that the procurement
of OCTOPUS had been successful. Currently, how-