
set of threats for each HPC asset. Accordingly, our
technique relies on a Systematic Literature Review
(SLR) aimed at collecting all the threats in a struc-
tured way, together with an overview of the threat
modelling techniques used to collect them. The data
extracted from the SLR are then analysed to derive
threats, formulated in a structured form. A threat, in
our context, is defined as a triple (threat agent, com-
promised asset, malicious behaviour). In essence,
it represents the proactive actions undertaken by a
threat agent with the intention of compromising an
asset. It is worth noting that in this work we did
not take threat agents into account, since our aim
is to collect threats and build a structured threat cat-
alogue. For further details, a technique for selecting
threat agents in an automated way is presented in
(Granata and Rak, 2021). The Data Analysis phase
describes the way threats have been selected from the
papers, as well as the data model used to describe a
threat.
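The threat triple described above can be sketched as a minimal data model. The following Python fragment is purely illustrative (the class and field names are our own, not part of the technique); as noted, the catalogue built in this work records the asset and behaviour but omits the threat agent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threat:
    """A threat as a (threat agent, compromised asset, malicious behaviour) triple.

    Field names are illustrative; the catalogue in this work omits the
    threat agent and records only the asset type and the behaviour.
    """
    agent: str       # e.g., "external attacker" (not used in this catalogue)
    asset: str       # the compromised asset, e.g., "login node"
    behaviour: str   # the malicious behaviour, e.g., "credential theft"

# Example entry of a structured threat catalogue:
t = Threat(agent="external attacker", asset="login node",
           behaviour="brute-force authentication")
print(t.asset)  # → login node
```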
As a result, (i) an extension of our modelling tech-
nique for the considered domain is formulated; and
(ii) the threat catalogue is produced: a structured rep-
resentation of all security-related information about
the system, highlighting the threats to which each as-
set type is exposed. The following sections describe
in detail each phase of the technique as applied to the
HPC context.
4 HPC DOMAIN ANALYSIS
This section presents a detailed analysis of the High-
Performance Computing domain, taking into account
the reference architecture proposed by NIST (Guo
et al., 2023). Subsequently, starting from the refer-
ence architecture, the identified assets are described,
highlighting the reasons why they need to be ade-
quately protected. Lastly, our extension of the mod-
elling technique (Granata et al., 2022) is presented,
focusing on the new asset types.
4.1 HPC Reference Architecture
According to NIST (Guo et al., 2023), as shown in
figure 1, an HPC system consists of four distinct
function zones: (i) access zone; (ii) computing zone;
(iii) data storage zone; and (iv) management zone.
The access zone consists of one or more nodes,
connected to external networks, that provide services
for authenticating and authorizing the access of users
and administrators and, possibly, data transfer ser-
vices and web portals that provide a range of web-
based interfaces to access HPC system services. At least
one node provides shells that can be used to launch
interactive or batch jobs.
The computing zone involves a set of compute
nodes connected by one or more high-speed networks
through which it is possible to run parallel jobs at
scale. Some nodes can be equipped with hardware ac-
celerators (e.g., GPU) to speed up applications. High-
performance communication networks (e.g., Infini-
Band, Omni-Path) are characterized by high band-
width and ultra-low latency; and they serve the pur-
pose of connecting compute nodes with data storage
zones. Instead, non-high-performance communica-
tion networks (e.g., Ethernet) are used as cluster inter-
nal networks to connect the high-performance com-
puting zone with the management zone and access
zone.
The data storage zone includes one or more high-
speed parallel file systems that provide data storage
services for user data. They are designed to handle
vast amounts of data, offering efficient storage capa-
bilities and rapid data access for both reading and
writing. Typical classes of storage systems include
parallel file systems (PFS), node-local storage for
low-latency workloads, and archival file systems that
protect against data loss and support campaign stor-
age.
The management zone encompasses a pool of
nodes for HPC system operation and management.
It provides the protocols and services required by the
hosts within the other zones, such as the Domain
Name Service (DNS), the Network Time Protocol
(NTP), configuration definitions, and authentication
and authorization services through an LDAP server.
These services can run on dedicated hardware or vir-
tual machines. Additionally, the management zone
includes storage systems for configuration data and
node images, as well as logging and analysis servers
that alert administrators of events. Because of the
distributed nature of HPC systems, resource requests
for specific workloads are coordinated by schedulers
such as SLURM and the Portable Batch System (PBS).
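The four function zones and the node types they host (the same node types enumerated as assets in Section 4.2) can be summarized in a simple mapping. The sketch below is an illustrative encoding of the architecture description, not part of the NIST specification:

```python
# Illustrative sketch: NIST HPC function zones mapped to the node
# types they host, following the reference architecture description.
HPC_ZONES = {
    "access": ["login node", "data transfer node", "web portal node"],
    "computing": ["compute node"],
    "data storage": ["storage node", "storage array", "storage disk"],
    "management": ["scheduler node", "cluster services node",
                   "provisioning node"],
}

# Each node type is treated as an asset in the threat model:
assets = [node for nodes in HPC_ZONES.values() for node in nodes]
print(len(assets))  # → 10
```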
4.2 Asset Identification
As previously explained, assets denote what must be
safeguarded. In this section, we identify 23 assets
from the analysis of the HPC reference architecture
proposed by NIST. Each node in the described zones
is treated as an asset, resulting in 10 initial asset types:
login node, data transfer node, web portal node, com-
pute node, storage node, storage array, storage disk,
scheduler node, cluster services node, and provision-
ing node. Certain nodes, like login, data transfer, and
web portal nodes, serve as access points and are con-
Systematic Threat Modelling of High-Performance Computing Systems: The V:HPCCRI Case Study
329