A 10-Layer Model for Service Availability Risk Management

Jan Marius Evang (1,2)

(1) Oslo Metropolitan University, Oslo, Norway

(2) Simula Metropolitan Center for Digital Engineering, Oslo, Norway

Keywords:

Risk Assessment, Availability Management.

Abstract:

Effective management of service availability risk is a critical aspect of Network Operations Centers (NOCs)

as network uptime is a key performance indicator. However, commonly used risk classiﬁcation systems such

as ISO27001:2013, NIST CSF, and NIST 800-53 often do not prioritize network availability, resulting in

the potential oversight of certain risks and ambiguous classiﬁcations. This paper presents a comprehensive

examination of network availability risk and proposes a 10-layer model that aligns closely with the operational

framework of NOCs. The 10-layer model encompasses hardware risk, risks across various network layers, as

well as external risks such as cloud, human errors, and political governance. By adopting this model, critical

risks are less likely to be overlooked, and the NOC’s risk management process is streamlined. The paper

outlines each layer of the model, provides illustrative examples of related risks and outages, and presents the

successful evaluation of the model on two real-life networks, where all risks were identiﬁed and appropriately

classiﬁed.

1 INTRODUCTION

From the advent of computer networks, disruptions to the network service have been a persistent challenge. The most important goal of a Network Operations Centre (NOC) has been to make these disruptions invisible to the end users, since they can lead to lost productivity and revenue and can erode customer trust. Businesses have always performed some form of risk management, whether formally or informally, and countless books have been written on the subject, to the point where an official standard was created with the first edition of ISO31000 (ISO, 2018) in 2009.

A “top down” approach to risk identification is to conduct interviews with key stakeholders, based on the classification system of one of the common security standard frameworks. This approach may be confusing and suboptimal for a NOC team. Sometimes these categories are very generic: for instance, the ISO27001:2013 (ISO, 2022a) standard has chapters like “Cryptography” and “Communications Security”, and NIST CSF (Barrett, 2018) has “Protective Technology”, while the updated ISO27001:2022 has only four themes: “People”, “Organizations”, “Technology” and “Physical” [1]. Furthermore, one network availability risk often spans multiple categories; for instance, NIST800-53’s (NIST, 2022) controls [2] of “Audit”, “Security Assessment”, “Contingency Planning”, “Incident Response”, “Media Protection”, “Planning”, “Performance Measurement”, “System and Communication Protection”, “System Integrity” and “Supplier Risk” have significant overlaps. We experimentally verify these in Section 3.

[1] ISO27001:2022 may be an improvement over ISO27001:2013, but detailed risk discovery data based on this model was not available to us at the time of writing.

[2] Actions, devices, procedures, techniques, or other measures that reduce the vulnerability of an information system.

To address these challenges, this paper proposes a novel framework for the discovery and classification of availability risk in network services. Our model is based on the ISO/OSI 7-layer reference model (OSI model) (Zimmermann, 1980), which has proved to be a very suitable tool for dividing network functions into manageable compartments (see Figure 1). The OSI model is not perfect, but it is used in some form in network courses, research and standardisation processes. The layers of the OSI model are well defined, common network protocols map reasonably well to the layers, and the model is universally recognized in the networking business.

Figure 1: The ISO/OSI model, as used in a typical network service.

Evang, J.
A 10-Layer Model for Service Availability Risk Management.
DOI: 10.5220/0012092600003555
In Proceedings of the 20th International Conference on Security and Cryptography (SECRYPT 2023), pages 716-723
ISBN: 978-989-758-666-8; ISSN: 2184-7711
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

However, when it comes to network availability risk management, a different separation of layers is suggested in this paper. Some risks lie outside the

OSI model layers, and we slightly modify the layer

division to better match the risks that a NOC needs

to manage. Although the idea of additional layers be-

yond the 7-layer OSI model is not new, as seen in pre-

vious works like (Taylor and Wexler, 2003; Kachold,

2009), a comprehensive description of all the layers

has not been published until now. In this paper, we

use named layers to describe the new proposed lay-

ers, while numbered layers refer to the layers of the

OSI model, to avoid confusion.

Information security is often classified into three main objectives: confidentiality, integrity, and availability (Anderson, 1972). While confidentiality and integrity are typically addressed together, availability is often handled separately by a Network Operations Centre (NOC). This paper focuses on the topic of

availability and its importance for all types of NOCs,

whether in-house or outsourced. In today’s intercon-

nected world, organizations heavily rely on informa-

tion availability across various layers, encompassing

customer interactions and service delivery. However,

due to the multitude of risks involved, identifying and

managing these risks can be challenging. To facilitate

the risk identiﬁcation process, common approaches

involve grouping risks into manageable areas and an-

alyzing them individually to gain a comprehensive

overview. This paper aims to categorize and discuss

risk topics associated with operating a network ser-

vice, highlighting examples of availability breaches at

each layer. Mitigation strategies within the same layer

or across different layers are also presented. Please

note that the references cited mostly refer to media

coverage of outages, as detailed research on such in-

cidents is seldom available, and the provided content

may include speculations.

Risk is deﬁned as the impact of uncertainty on ob-

jectives, and it is typically expressed in terms of the

likelihood of an event occurring and its consequences

or impact, which can be qualitative or quantitative.

Numerically, we deﬁne the risk level as the product of

likelihood and impact. The impact can be measured

in various ways, such as packet loss, total downtime,

or ﬁnancial loss.
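
As a minimal sketch of the numerical definition above, the following Python snippet computes the risk level as likelihood times impact and ranks a small register. The 1 to 5 ordinal scales, the `Risk` class and the example entries are illustrative assumptions and are not taken from the networks studied in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One entry in a hypothetical risk register."""
    name: str
    layer: str
    likelihood: int  # assumed ordinal scale: 1 (rare) .. 5 (frequent)
    impact: int      # assumed ordinal scale: 1 (minor) .. 5 (severe)

    @property
    def level(self) -> int:
        # Risk level is the product of likelihood and impact.
        return self.likelihood * self.impact


def rank(risks):
    """Return the risks sorted from highest to lowest risk level."""
    return sorted(risks, key=lambda r: r.level, reverse=True)


# Illustrative entries only (not real data).
register = [
    Risk("Single power feed in server room", "Physical", 2, 5),
    Risk("Full disk on monitoring server", "Local Network", 4, 2),
    Risk("Route leak by upstream provider", "Internet", 3, 4),
]

for r in rank(register):
    print(f"{r.level:>2}  {r.layer:<14} {r.name}")
```

An ordinal product like this is only useful for ranking; if impact is measured in downtime or financial loss, the same structure can carry quantitative values instead.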

Every layer within the model poses its own set

of risks, necessitating a holistic approach where the

NOC considers all layers, quantiﬁes associated risks,

and determines appropriate mitigation actions. This

comprehensive perspective is crucial for effective risk

management and ensuring the availability of network

services.

2 SECURITY LAYERS

The security topics in our proposed 10-layer model

are deﬁned with the service layer in the middle, where

the total availability (uptime) is measured. Below

the service layer, we have layers whose risks are

predictable and directly affect service delivery, and

where industry standards have emerged to handle

these risks. Above the service layer, we ﬁnd topics

that indirectly and less predictably contribute to the

availability risk of the NOC, like risks associated with

human errors, company culture and legal responsibil-

ities.

Figure 2: The proposed 10-layer model for network service

risk assessment.

2.1 Physical Layer

This category encompasses risks associated with

physical hardware, including cables, networking

equipment, server equipment, workstations, phones,

and IoT devices. Outages at the physical layer can be

caused by equipment defects, broken cables, planned

maintenance activities, power failures, and physical

security breaches. Controls for managing these out-

ages can be found in ISO27002 Clause 11 (Physical &

Environmental Security) (ISO, 2022b) and NIST800-

53’s PE controls.

Physical layer outages often have longer durations

and may require on-site technician visits, resulting

in extended Time To Recover (TTR). Therefore, it is

crucial to mitigate these risks proactively. Duplication

and clustering of networking and server hardware,

along with redundant components such as power sup-

plies and hard disks with automatic failover, can be

implemented. Critical network links may require du-

plicate network cables and the use of network pro-

tocols to maintain service availability during Physi-

cal Layer failures. One particularly severe physical

layer failure is a ﬁre in a server room triggering a

ﬁre suppression system, potentially causing perma-

nent equipment failure. Mitigating such an outage

involves distributing the service across multiple ge-

ographic locations to ensure service continuity.

Selecting high-quality hardware and having hard-

ware service agreements can enhance the likelihood

of maintaining reliable physical layer operations. For

low TTR requirements, keeping spare parts in-house

can be considered, based on a Return on Investment

assessment.

Risk discovery at the Physical Layer is relatively

straightforward, as every physical asset can fail and

should be included in the risk registry. Evaluating the

likelihood of failures and implementing measures to

reduce their impact are essential.

Examples of outages include the Jan 2020 earth-

quake in Puerto Rico (Santiago et al., 2020), which

caused prolonged power outages and network faults,

leading to signiﬁcant internet disruptions. However,

communications were still upheld through the re-

silient cellular network during these events (NET-

BLOCKS, 2020). As another example, multiple sub-

sea cables following the same paths in the Suez Canal

have posed increased risks of shared-fate problems,

resulting in several outages (Burgess, 2022).

2.2 Local Network Layer

The local network refers to the network infrastructure

within a building or campus, where the NOC owns

and manages the hardware and cabling. This layer in-

cludes networks such as server-room networks, build-

ing cabling, ofﬁce-space networks, as well as wireless

networks like WiFi, cellular, and IoT.

Risks at the Local Network Layer primarily stem

from ﬁrmware or conﬁguration errors in network

equipment, along with capacity issues like full disks,

out-of-memory situations, and network capacity lim-

itations. Monitoring and proactive planning are key

measures to mitigate these risks. Additionally, this

layer plays a crucial role in mitigating most of the

risks originating from the Physical Layer by imple-

menting local (network) protocols like RAID (Pat-

terson et al., 1988), LACP (C/LM - LAN/MAN

Standards Committee, 2000), VRRP (Hinden, 2004),

High Availability protocols, and Interior Gateway

Protocols (IGP) such as IS-IS (ISO, 2002) and OSPF

(Moy, 1998).
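
The capacity monitoring mentioned above can be sketched as a simple threshold check. This is an illustration only: the 80% warning threshold and the function names are assumptions, and a real NOC would typically rely on its existing monitoring system rather than a standalone script.

```python
import shutil


def usage_fraction(used: int, total: int) -> float:
    """Fraction of capacity in use, between 0.0 and 1.0."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used / total


def capacity_alert(used: int, total: int, threshold: float = 0.8) -> bool:
    """True when usage has reached the (assumed) warning threshold."""
    return usage_fraction(used, total) >= threshold


def check_disk(path: str = "/", threshold: float = 0.8) -> bool:
    """Apply the threshold check to a mounted filesystem."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return capacity_alert(usage.used, usage.total, threshold)


if __name__ == "__main__":
    print("disk alert:", check_disk("/"))
```

The same `capacity_alert` check applies to memory or link utilisation, as long as a used and a total value are available.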

Examples of outages include one of GitHub’s ma-

jor outages in December 2012, which occurred due to

the failure of multi-chassis link aggregation protocols

at the local network layer when a switch experienced

partial malfunctioning (Imbriaco, 2012). Another sig-

niﬁcant outage took place in February 2020, where

the RIPE RPKI repository experienced a three-day

outage caused by a full disk quota, leading to the in-

validation of all RIPE RPKI routes (Trenaman, 2020).

2.3 Wide Area Network Layer

The wide area network (WAN) encompasses net-

works that are logically part of the NOC’s oper-

ations but physically leased from network service

providers. These networks can include optical ﬁbers,

Layer 1 wavelengths, Layer 2/2.5 MPLS-like services

(Viswanathan et al., 2001), or overlay networks like

SD-WAN over a Layer 3 service. WANs typically

span metropolitan, national, or international areas,

and may also include in-building or space-based net-

works. Additionally, Layer 1/2 interconnections with

remote customers and suppliers of network services

are considered within this layer.

Wide area networks often experience full or par-

tial outages, as documented in (Evang et al., 2022).

These outages can have various root causes, including

physical layer or local network layer events, conges-

tion, or issues from other layers. However, the com-

mon symptoms are outages or packet loss. Mitiga-

tion strategies for network outages in WANs often in-

volve duplicate links, redundancy protocols, MPLS,

VXLAN (Mahalingam et al., 2014), BFD (Katz and

Ward, 2010), and IGP protocols such as IS-IS and

OSPF. However, the time taken for failover (TTR) is

usually longer due to the distances involved, which

may cause delays in protocol updates. Capacity risks

are more signiﬁcant in wide area networks since ser-

vices are typically purchased based on capacity, and

service providers may drop packets if the agreed traf-

ﬁc rate is exceeded. Mitigating this risk requires care-

ful consideration, including over-purchasing of ca-

pacity, planning for backup links, assessing shared-

fate risks of links, and potentially engaging multiple

providers to safeguard against total provider failure.
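
Assessing shared-fate risk amounts to checking whether supposedly redundant links have physical segments in common. The sketch below assumes a hypothetical inventory in which each link lists the ducts or cable systems it traverses; the link names, providers and segment names are invented for illustration.

```python
from itertools import combinations

# Hypothetical link inventory: each link lists the physical segments
# (ducts, cable systems, sites) it is known to traverse.
links = {
    "wan-a": {"provider": "P1", "segments": {"duct-north", "cable-x"}},
    "wan-b": {"provider": "P2", "segments": {"duct-south", "cable-x"}},
    "wan-c": {"provider": "P2", "segments": {"duct-east", "cable-y"}},
}


def shared_fate(links):
    """Return {(link1, link2): shared segments} for overlapping pairs."""
    result = {}
    for a, b in combinations(sorted(links), 2):
        common = links[a]["segments"] & links[b]["segments"]
        if common:
            result[(a, b)] = common
    return result


def same_provider(links):
    """Return link pairs that depend on the same provider."""
    return [
        (a, b)
        for a, b in combinations(sorted(links), 2)
        if links[a]["provider"] == links[b]["provider"]
    ]


print(shared_fate(links))
print(same_provider(links))
```

In practice the difficult part is obtaining accurate path information from providers; the check itself is a set intersection.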

Example of outage: In June 2022, simultaneous

outages occurred in two major subsea cable systems,

leading to congestion and packet loss for numerous

wide area networks traversing the Suez Canal (Bel-

son, 2022).

2.4 Internet Layer

The internet layer focuses on the risks associated with connectivity to external networks that are beyond the direct control of the NOC. In the best case the NOC has a contractual agreement with these networks, and in the worst case it has no control whatsoever.

The predominant protocol at this layer is BGP

(Rekhter et al., 2006), which encompasses IP transit,

Internet Exchanges, private peering, and BGP cus-

tomers. While BGP effectively navigates the intri-

cate Internet landscape, it suffers from security lim-

itations (Freedman et al., 2019). The protocol relies

on trust and does not verify the validity of exchanged

data, leading to signiﬁcant conﬁdentiality and avail-

ability risks as highlighted in the OECD Routing Se-

curity paper of 2022 (OECD, 2022). Efforts are un-

derway to address these systemic ﬂaws, with promis-

ing technologies like RPKI (Bush and Austein, 2013)

employing cryptographic signatures to mitigate ori-

gin hijacking risks. Other initiatives such as BGPsec

(Lepinski and Sriram, 2017) and SCION (Rustignoli

and de Kater, 2022) tackle BGP path hijacking risks

but encounter their own challenges (Durand, 2020).
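
To illustrate how RPKI data can mitigate origin hijacks, the following sketch classifies a BGP announcement against a list of Route Origin Authorizations (ROAs), in the spirit of the prefix origin validation procedure of RFC 6811. It is a simplified illustration: the `Roa` type and function name are assumptions, the prefixes and AS numbers come from documentation ranges, and details such as AS 0 handling are omitted.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """A validated ROA payload: prefix, maximum length, origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(prefix: str, origin_asn: int, roas) -> str:
    """Return 'valid', 'invalid' or 'not-found' for an announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA covers the route if the route lies within the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 24, 64500)]

print(validate_origin("203.0.113.0/24", 64500, roas))  # matching origin
print(validate_origin("203.0.113.0/24", 64501, roas))  # wrong origin AS
print(validate_origin("203.0.113.0/25", 64500, roas))  # too specific
print(validate_origin("192.0.2.0/24", 64500, roas))    # no covering ROA
```

A router that drops "invalid" announcements would reject both the wrong-origin and the too-specific route in this example, which is the mechanism that limits origin hijacks of the kind described below.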

Examples of outages: In February 2008, a

Pakistani network operator mistakenly announced

YouTube’s IP addresses via BGP, resulting in a two-

hour global service blackhole (Hunter, 2008). These

announcements, intended for internal use only, were

leaked to their upstream provider and subsequently

propagated throughout the entire internet.

In June 2015, Telecom Malaysia leaked 179,000

preﬁxes to Level3, causing a signiﬁcant volume of

trafﬁc to traverse Telecom Malaysia’s backbone, lead-

ing to network overload, severe packet loss, and inter-

net slowdown worldwide (Toonk, 2015).

The deficiencies in BGP have also been exploited maliciously. In August 2022, AS209243 announced the IP addresses of a critical smart contract user interface for the Celer Bridge cryptocurrency exchange. The attacker obtained authorized HTTPS certificates and reportedly stole a total of USD 234,866.65 worth of various cryptocurrencies (The SlowMist Security Team, 2022; Kacherginsky, 2022).

2.5 Cloud Layer

Today, numerous services are delivered through var-

ious cloud providers, ranging from on-premises so-

lutions to Infrastructure as a Service (IaaS), Plat-

form as a Service (PaaS), and Software as a Service

(SaaS). The level of risk varies depending on the ex-

tent of responsibility transferred from the NOC to

the dedicated teams of the service providers. How-

ever, it’s important to assess the Return on Security

Investments (RoSI) considering the costs involved.

ISO27017 (ISO, 2015) provides a speciﬁc code of

practice for securing Cloud Services. Cloud-related

risks also extend to supporting services such as email

systems, documentation systems, and customer man-

agement systems.

During an outage at a major cloud provider, the

impact can be severe, leaving the NOC with little to

do but wait. To mitigate cloud risks, systems can

be distributed across multiple cloud providers and

failover protocols can be implemented.

Examples of outages: In December 2021, Ama-

zon Web Services (AWS) experienced a signiﬁcant

outage in their IaaS service, causing disruptions to nu-

merous dependent services (Goovaerts, 2021; AWS,

2021).

In October 2022, the Cloudﬂare Content Delivery

Network (CDN) cloud service suffered an outage due

to a software bug, resulting in a failure rate of around

5% for over six hours (Graham-Cumming, 2022).

2.6 Applications Layer

Application risks arise from both internally devel-

oped applications and those developed by third par-

ties. To mitigate risks associated with third-party ap-

plications, thorough sandbox testing and duplication

strategies are employed for critical services.

Ensuring well-written applications with minimal

software errors and effective error handling is crucial

for reducing availability risks. While conﬁdentiality

risks are beyond the scope of this document, it’s worth

noting that breaches in conﬁdentiality can also impact

availability. ISO27002’s Clause 14 provides recom-

mended controls for secure application development

and service protection.

Application-based redundancy can be imple-

mented to safeguard the service from signiﬁcant out-

ages at lower layers. In such cases, if the primary

backend service fails, the application can utilize a sec-

ondary backend service.
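
The application-based redundancy described above can be sketched as a call that tries each backend in order. The backend functions here are hypothetical stand-ins; a real application would also need timeouts, health checks and care with non-idempotent requests.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when no backend could serve the request."""


def call_with_failover(backends: Iterable[Callable[[], T]]) -> T:
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend()
        except Exception as exc:  # deliberately broad for this sketch
            errors.append(exc)
    raise AllBackendsFailed(errors)


def primary():
    # Hypothetical primary backend that is currently down.
    raise ConnectionError("primary backend unreachable")


def secondary():
    # Hypothetical secondary backend that answers normally.
    return "response from secondary"


print(call_with_failover([primary, secondary]))
```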

An example of an application causing availability

issues is the Facebook outage in 2021, which resulted

from a software bug and potentially led to signiﬁcant

revenue losses in the tens of millions (Integrated Hu-

man Factors, 2022).

2.7 Services Layer

The services provided by the organization are what

customers ultimately experience. These services de-

pend on all underlying layers and may also depend on

purchased services. Mitigation measures are imple-

mented at lower layers to minimize service outages.

Customer contracts often include Service Level

Agreements (SLAs) that deﬁne expected availability.

If the sold service has a better SLA than the purchased

service, risk mitigation is necessary. SLA levels can

vary widely, ranging from 99% to 99.999% uptime

per year. SLAs are addressed in ISO27002’s Clause

18.
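
To make the SLA range concrete, the sketch below converts an uptime percentage into the downtime it permits per year. It assumes a 365-day year and counts all downtime equally; actual contracts often exclude planned maintenance and may measure per month.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours(sla_percent: float) -> float:
    """Downtime per year permitted by an uptime SLA, in hours."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be a percentage between 0 and 100")
    return (1 - sla_percent / 100) * HOURS_PER_YEAR


for sla in (99.0, 99.9, 99.99, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla:>7}% uptime -> {hours:8.3f} h/year ({hours * 60:8.1f} min)")
```

Under these assumptions, 99% uptime allows about 87.6 hours of downtime per year, while 99.999% allows a little over five minutes.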

While planned maintenance is typically exempt

from SLA contracts, it still impacts availability and

requires mitigation. Risks may also arise from fail-

ures of subcontracted supporting services, such as

payment services. Using redundant services can re-

duce risk but increases costs.
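
The effect of dependencies and redundancy on availability can be estimated with two standard formulas: a service that needs all of its dependencies has the product of their availabilities, while a service that needs only one of several redundant instances fails only if all of them fail. The sketch below assumes that failures are independent, which shared-fate risks can invalidate; the figures are illustrative.

```python
from math import prod


def serial(availabilities):
    """Availability when every component is required."""
    return prod(availabilities)


def parallel(availabilities):
    """Availability when any one redundant component is sufficient."""
    return 1 - prod(1 - a for a in availabilities)


purchased = 0.99   # illustrative SLA of a purchased service
sold = 0.999       # illustrative SLA promised to customers

single = serial([purchased])
redundant = parallel([purchased, purchased])

print(f"single supplier:  {single:.4f}  meets sold SLA: {single >= sold}")
print(f"two suppliers:    {redundant:.4f}  meets sold SLA: {redundant >= sold}")
```

With these numbers a single 99% supplier cannot support a 99.9% commitment, whereas two independent suppliers could, at roughly double the cost.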

The root DNS service exempliﬁes a highly critical

service with a resilient design. It is distributed across

independent servers, avoiding dependency on any sin-

gle entity. Even during heavy DDoS attacks (ICANN,

2007), the DNS service remained robust and did not

signiﬁcantly disrupt internet trafﬁc.

An undisclosed root cause led to the September

2022 Zoom outage, causing the Video Conferencing

service to be unavailable and resulting in numerous

failed video meetings (Goyal, 2022; Silberling, 2022;

Zoom, 2022).

2.8 Organizations Layer

The quality of service delivery relies heavily on

the organization itself. A positive company culture,

strong policies, and employees who adhere to those

policies can signiﬁcantly reduce human errors.

Implementing a robust Information Security Man-

agement System (ISMS) with comprehensive risk

policies and effective mitigation measures is essential.

Considering the culture, policies, and certiﬁcations of

providers and peers is also important, as customers

may require adherence to standards like ISO27001 or

NIST800-53.

Furthermore, organizations may have dependencies on overarching entities such as trade unions, employer organizations, industry associations, Regional Internet Registries, and network operators’ groups.

Examples of outages include a 10-day IT outage

in July 2022 at the UK’s largest hospital, attributed to

a lack of attention to IT security in the company cul-

ture (Thimbleby, 2022). Another instance was nation-

wide internet shutdowns in Lebanon in 2022 due to a

strike by employees of the state-owned telco, Ogero

(Barton, 2022).

2.9 People Layer

Human errors are inevitable, and a NOC must take

measures to protect the service against common mis-

takes. Implementing effective procedures and reduc-

ing stress can help mitigate this risk. It is also impor-

tant to address the risk of disloyal employees through

compartmentalization, need-based access rights, and

a strong Human Resources team.

Other people-related risks include the impact of

sick leave and employee departures, which can lead to

knowledge loss and potential exposure to competitors

or attackers. Documentation plays a crucial role in

mitigating these risks, ensuring that no individual pos-

sesses irreplaceable knowledge within the company.

Numerous signiﬁcant outages in the internet world

have been caused by human errors that went unde-

tected by control systems. Examples include the June

2022 Cloudﬂare outage (Belson, 2022), the October

2021 outage affecting Facebook, WhatsApp, and In-

stagram (Integrated Human Factors, 2022), and the

February 2017 AWS outage (AWS, 2017).

2.10 Governance Layer

The risk of breaking local regulations or national laws is most often associated with confidentiality and integrity, but the penalties may be severe and may even cause availability outages, for instance if a court orders the temporary or permanent shutdown of a service. The financial impact of a breach of contract, a breach of regulations, or even a customer boycott must also be considered, as this may lead to cost cuts, including cuts to security measures.

Example of outage: When the Russian army entered Ukraine, Western countries imposed sanctions on Russian entities. On 3 March 2022, Cogent terminated services to Russian organisations with 24 hours’ notice and stated that it would turn off all co-located equipment and prepare it to be picked up. At the same time, Lumen disconnected all its hardware in Russia (Madory, 2022).

2.11 Governance Layer

Governance risks are often underestimated in risk

evaluations. These risks can arise from national

governments, central internet governance bodies like

ICANN and RIR, and centralized services such as

IRR, RPKI, and Root DNS. Critical services must be

prepared to withstand potential outages of these gov-

ernance services.

Static risks in the Governance Layer exist during

implementation, while dynamic risks involve changes

in laws and regulations. Other risks include IPv4 ad-

dress exhaustion, legal actions such as “cease and de-

sist” letters, and being blocked by governmental ﬁl-

ters or embargoes.

Failure to comply with local regulations or national laws on confidentiality and integrity may lead to severe penalties, which in turn might impact availability. Breaches can lead to legal orders for temporary or permanent service shutdowns, financial penalties, and customer boycotts, potentially necessitating cost cuts and reduced security measures.

Example of outage: In March 2022, following

the Russian army’s entry into Ukraine, Western coun-

tries imposed sanctions on Russian entities. Cogent

terminated services to Russian organizations with 24

hours’ notice, while Lumen disconnected their hardware in Russia, causing service disruptions (Madory, 2022).

3 MODEL VERIFICATION

The efﬁciency of the 10-layer model was veriﬁed for

two different networks.

3.1 Risk Registry Analysis of Existing Network

To test the new 10-layer model, we were allowed

access to the risk registry from a global network

provider, and mapped all the risks that were identi-

ﬁed during their ISO27001:2013 risk discovery pro-

cess into the proposed model as well as into the

ISO27001:2013 and NIST800-53 models for compar-

ison. The risks are anonymized, but the statistics may

be published.

We see that for ISO27001, each risk maps to an average of 8.9 controls (median 8), and for NIST800-53, each risk maps to an average of 4.8 controls (median 5). In the 10-layer model, however, only three risks map to two layers, while all other risks map to a single layer. For ISO27001 and the 10-layer model, all risks were covered, but for NIST800-53, eight risks were not discovered by any of the sections. The types of missed risks were Governance risks and risks to non-production equipment like lab equipment and equipment during transport.
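
Statistics of this kind can be derived from a mapping of each risk to the categories it falls under. The sketch below uses a small, entirely hypothetical mapping (the risk identifiers and category names are invented) and is not the anonymized registry analysed here; whether uncovered risks should count towards the mean is a choice this sketch makes for simplicity.

```python
from statistics import mean, median

# Hypothetical mapping: risk id -> set of categories it maps to.
mapping = {
    "R1": {"cat-a", "cat-b", "cat-c"},
    "R2": {"cat-d"},
    "R3": set(),                # not covered by any category
    "R4": {"cat-a", "cat-e"},
}


def summarize(mapping):
    """Summarise how unambiguously a framework classifies the risks."""
    counts = [len(cats) for cats in mapping.values()]
    return {
        "mean": mean(counts),
        "median": median(counts),
        "uncovered": sorted(r for r, c in mapping.items() if not c),
        "multi": sorted(r for r, c in mapping.items() if len(c) > 1),
    }


print(summarize(mapping))
```

A framework that classifies risks unambiguously would show a mean close to one, few entries under "multi" and none under "uncovered".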

3.2 Risk Discovery Process for a New

Network Service Provider

Our second veriﬁcation project uses the new 10-layer

model to discover risks associated with the implemen-

tation of a new small research network for a local re-

search organization. The network spans a metropoli-

tan area, with two sites and two separate IP transit

sessions.

The risks for this network were discovered by interviewing the NOC for the new research network, using the 10-layer model as a basis. After this risk discovery process, the ISO27001 and NIST800-53 frameworks were briefly consulted to discover any risks that had gone unnoticed by the 10-layer procedure.

The result of the risk discovery was 55 risk points

across all 10 layers, out of which 48 were assigned a

mitigation plan.

The second risk discovery process, using the ISO27001 and NIST800-53 frameworks, did not reveal any new risk points, and the interviewees (subjectively) found this process more confusing and less straightforward than the process based on the 10-layer model. When asked to elaborate, the subjects stated that the risk areas were not well defined when applied to network availability and that the 10-layer model was easier to follow.

4 DISCUSSION

The certiﬁcation market has grown into a multi-

billion dollar industry, with standards like ISO27001,

NIST800-53, and SOC2 gaining signiﬁcant momen-

tum. However, we believe that the inherent classiﬁ-

cation in these standards may not be well-suited for

effectively managing network and service availability

risks. Relying solely on these standards for risk dis-

covery can lead to confusion, oversights, and unnec-

essary work, resulting in incomplete risk management

and employee frustration.

While none of these standards provide a manda-

tory risk discovery interview template, we propose

our 10-layer model as a suitable foundation for con-

ducting such interviews in alignment with any secu-

rity standard. This model is familiar to the Network

Operations Center (NOC) and encompasses all rele-

vant risks, making it easy to understand and facilitat-

ing classiﬁcation. By using this model, the NOC can

gain conﬁdence in their ability to handle all risks ef-

fectively.

It’s important to note that mitigating every single

risk may not be necessary, but being aware of all risks

and making informed management decisions about

whether to accept or mitigate them is crucial. By con-

ﬁdently producing a comprehensive risk management

report using this model, a NOC manager can instill

trust in top management, reassuring them that the net-

work and/or service is in capable hands.

In conclusion, while existing certiﬁcation stan-

dards have their merits, our proposed 10-layer model

offers a practical and comprehensive approach to risk

discovery and management. It empowers the NOC

with a familiar framework, facilitates risk classiﬁca-

tion, and ultimately contributes to a more conﬁdent

and capable handling of network and service risks.

REFERENCES

Anderson, J. P. (1972). Computer security technol-

ogy planning study. Technical report, ANDERSON

(JAMES P) AND CO FORT WASHINGTON PA

FORT WASHINGTON.

AWS (2017). Summary of the Amazon S3 service disruption in the Northern Virginia (US-EAST-1) region. aws.amazon.com, https://aws.amazon.com/message/41926/.

AWS (2021). Summary of the AWS service event

in the northern virginia (US-EAST-1) region.

aws.amazon.com, https://aws.amazon.com/message/

12721/.

Barrett, M. (2018). Framework for improving critical in-

frastructure cybersecurity version 1.1.

Barton, J. (2022). Networks down in lebanon

as ogero workers strike. developingtele-

coms.com, https://developingtelecoms.com/telecom-

business/operator-news/13926-networks-down-in-

lebanon-as-ogero-workers-strike.html.

Belson, D. (2022). AAE-1 & SMW5 cable cuts impact millions of users across multiple countries. blog.cloudflare.com, https://blog.cloudflare.com/aae-1-smw5-cable-cuts/.

Burgess, M. (2022). The Most Vulnerable Place on the

Internet. www.wired.com, https://www.wired.com/

story/submarine-internet-cables-egypt/.

Bush, R. and Austein, R. (2013). The Resource Public Key

Infrastructure (RPKI) to Router Protocol. RFC 6810.

C/LM - LAN/MAN Standards Committee (2000). IEEE

standard for information technology - local and

metropolitan area networks - part 3: Carrier sense

multiple access with collision detection (CSMA/CD)

access method and physical layer speciﬁcations-

aggregation of multiple link segments. IEEE Std

802.3ad-2000, pages 1–184.

Durand, A. (2020). Resource public key infrastructure

(RPKI) technical analysis.

Evang, J. M., Ahmed, A. H., Elmokashﬁ, A., and Bryhni,

H. (2022). Crosslayer network outage classiﬁcation

using machine learning. In Proceedings of the Work-

shop on Applied Networking Research, ANRW ’22,

New York, NY, USA. Association for Computing Ma-

chinery.

Freedman, D., Foust, B., Greene, B., Maddison, B.,

Robachevsky, A., Snijders, J., and Steffann, S. (2019).

Mutually agreed norms for routing security (MANRS)

implementation guide.

Goovaerts, D. (2021). Extended AWS outage disrupts services across the globe. www.fiercetelecom.com, https://www.fiercetelecom.com/cloud/extended-aws-outage-disrupts-services-across-globe.

Goyal, R. (2022). Zscaler digital experience

detects outage. www.zscaler.com, https:

//www.zscaler.com/blogs/product-insights/zoom-

outage-detected-zscaler-digital-experience-zdx.

Graham-Cumming, J. (2022). Partial Cloudflare outage on October 25, 2022. blog.cloudflare.com, https://blog.cloudflare.com/partial-cloudflare-outage-on-october-25-2022/.

Hinden, B. (2004). Virtual Router Redundancy Protocol

(VRRP). RFC 3768.

Hunter, P. (2008). Pakistan youtube block exposes fun-

damental internet security weakness: Concern that

pakistani action affected youtube access elsewhere in

world. Computer Fraud & Security, 2008(4):10–11.

ICANN (2007). Factsheet root server attack on 6 February 2007. www.icann.org, https://www.icann.org/en/system/files/files/factsheet-dns-attack-08mar07-en.pdf.

Imbriaco, M. (2012). Downtime last Saturday. github.blog,

https://github.blog/2012-12-26-downtime-last-

saturday/.

Integrated Human Factors (2022). Facebook & instagram

outage likely caused by human error. www.ihf.co.uk,

https://www.ihf.co.uk/facebook-instagram-outage-

by-human-error/.

ISO (2002). ISO/IEC 10589:2002 Information technol-

ogy — Telecommunications and information exchange

between systems — Intermediate System to Interme-

diate System intra-domain routeing information ex-

change protocol for use in conjunction with the pro-

tocol for providing the connectionless-mode network

service. International Organization for Standardiza-

tion, Geneva, Switzerland.

ISO (2015). ISO/IEC 27017:2015 Information technology

— Security techniques — Code of practice for infor-

mation security controls based on ISO/IEC 27002 for

cloud services. International Organization for Standardization,

Geneva, Switzerland.

ISO (2018). ISO 31000:2018(en) Risk management —

Guidelines. International Organization for Standard-

ization, Geneva, Switzerland.

ISO (2022a). ISO/IEC 27001:2022(en) Information se-

curity, cybersecurity and privacy protection — In-

formation security management systems — Requirements.

International Organization for Standardization,

Geneva, Switzerland.

SECRYPT 2023 - 20th International Conference on Security and Cryptography

ISO (2022b). ISO/IEC 27002:2022, Information security,

cybersecurity and privacy protection — Information

security controls. International Organization for Stan-

dardization, Vernier, Geneva, Switzerland, ISO/IEC

27002:2022 edition.

Kacherginsky, P. (2022). Celer bridge incident analysis.

www.coinbase.com, https://www.coinbase.com/blog/celer-bridge-incident-analysis.

Kachold, L. (2009). Layer 8 Linux security: OPSEC for

Linux common users, developers and systems administrators.

linuxgazette.net, https://linuxgazette.net/164/kachold.html.

Katz, D. and Ward, D. (2010). Bidirectional Forwarding

Detection (BFD). RFC 5880.

Lepinski, M. and Sriram, K. (2017). BGPsec Protocol Specification.

RFC 8205.

Madory, D. (2022). Cogent and Lumen curtail operations

in Russia. www.kentik.com,

https://www.kentik.com/blog/cogent-disconnects-from-russia/.

Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,

L., Sridhar, T., Bursell, M., and Wright, C. (2014).

Virtual eXtensible Local Area Network (VXLAN): A

Framework for Overlaying Virtualized Layer 2 Net-

works over Layer 3 Networks. RFC 7348.

Moy, J. (1998). OSPF Version 2. RFC 2328.

NETBLOCKS (2020). Mobile internet provides lifeline

after earthquake knocks out Puerto Rico infrastructure.

netblocks.org,

https://netblocks.org/reports/puerto-rico-earthquake-internet-outage-dAmqEDA9.

NIST (2022). Security and privacy controls for federal in-

formation systems and organizations. Technical Re-

port NIST Special Publication 800-53, National Insti-

tute of Standards and Technology, U.S. Department of

Commerce, Washington, D.C.

OECD (2022). Routing security. Number 330. The Orga-

nization for Economic Cooperation and Development,

OECD Digital Economy Papers.

Patterson, D. A., Gibson, G., and Katz, R. H. (1988). A case

for redundant arrays of inexpensive disks (RAID).

SIGMOD Rec., 17(3):109–116.

Rekhter, Y., Hares, S., and Li, T. (2006). A Border Gateway

Protocol 4 (BGP-4). RFC 4271.

Rustignoli, N. and de Kater, C. (2022). SCION Components

Analysis. Internet-Draft draft-rustignoli-panrg-scion-

components-01, Internet Engineering Task Force.

Work in Progress.

Santiago, R., de Onís, C. M., and Lloréns, H. (2020). Powering

life in Puerto Rico. NACLA Report on the Americas,

52(2):178–185.

Silberling, A. (2022). Zoom is down in a major outage.

www.techcrunch.com,

https://techcrunch.com/2022/09/15/zoom-is-experiencing-a-major-outage/.

Taylor, S. and Wexler, J. (2003). Mailbag: OSI layer

8 - money and politics. www.networkworld.com,

https://www.networkworld.com/article/2339786/mailbag--osi-layer-8---money-and-politics.html.

The SlowMist Security Team (2022). Truth be-

hind the Celer Network cBridge cross-chain

bridge incident: BGP hijacking. medium.com,

https://medium.com/coinmonks/truth-behind-the-celer-network-cbridge-cross-chain-bridge-incident-bgp-hijacking-52556227e940.

Thimbleby, H. (2022). Failing IT infrastructure is under-

mining safe healthcare in the NHS. www.bmj.com,

https://www.bmj.com/content/379/bmj-2022-073166/rr.

Toonk, A. (2015). Massive route leak causes internet slowdown.

www.bgpmon.net,

https://www.bgpmon.net/massive-route-leak-cause-internet-slowdown/.

Trenaman, N. (2020). Downtime last Saturday.

www.ripe.net,

https://www.ripe.net/ripe/mail/archives/routing-wg/2020-February/004015.html.

Viswanathan, A., Rosen, E. C., and Callon, R. (2001). Mul-

tiprotocol Label Switching Architecture. RFC 3031.

Zimmermann, H. (1980). OSI reference model-the ISO

model of architecture for open systems interconnec-

tion. IEEE Trans. Communication (USA), COM-

28(4):425–432. IRIA/Lab., Rocquencourt, France.

Zoom (2022). Issues starting and joining meetings incident

report for Zoom. status.zoom.us,

https://status.zoom.us/incidents/k7fm2j5q8lx1.
