Squeezing the Lemon: Using Accident Analysis for Recommendations to Improve the Resilience of Telecommunications Organizations

Hans C. A. Wienen, Faiza A. Bukhsh, Eelco Vriezekolk, Luís Ferreira Pires and Roel J. Wieringa

Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, The Netherlands

Dutch Authority for Digital Infrastructure, The Netherlands

Keywords: Accident Analysis, Telecommunications, Resilience.

Abstract: Telecommunications networks form critical infrastructure, since accidents in these networks can severely impact the functioning of society. Structured accident analysis methods can help draw lessons from accidents, giving valuable information to improve the resilience of telecommunications networks. In this paper, we introduce a method (TRAM) for accident analysis in the Telecommunication domain by improving AcciMap, which is a popular method for analyzing accidents. The improvements have made AcciMap more efficient and instructive by explicitly identifying ICT aspects of the accidents, extending the method to support the evaluation of crisis organizations and introducing additional notation for feedback loops. This resulted in TRAM, a method with a 25% improved efficiency over AcciMap, while also addressing ICT aspects, leading to concrete actionable results that can help telecommunication organizations grow more resilient.

1 INTRODUCTION

Telecommunications networks are a vital part of our society and form a critical infrastructure. Any accidents that impact the availability of this infrastructure can have crippling effects on society, from

malfunctioning trafﬁc control systems to unreachable

emergency services. The consequences of these acci-

dents may be broad, ranging from ﬁnancial losses to

physical damage and damage to health. If emergency

services cannot be reached in time or cannot reach

their destination in time due to gridlock caused by

malfunctioning trafﬁc control systems, this may even

result in casualties. Having resilient telecommunica-

tions infrastructure is therefore a concern of govern-

ments and they have established enforcement agen-

cies called regulators to ensure this resilience. In the

European Union, public operators of telecommunica-

tions networks and/or services are required by law to

report any large accident that results in service un-

availability to their regulator.

In Telecommunications, accidents are often called ‘in-

cidents’, even when they are damaging. As the common un-

derstanding in accident analysis is that anything that causes

harm (e.g. to service, hardware or people) is called an acci-

dent (Leveson, 2011; Harms-Ringdahl, 2013; U.S. Depart-

ment of Energy, 2000) and (Doytchev and Szwillus, 2009),

we use the term telecommunication accident in this paper.

Just reporting accidents does not prevent new ac-

cidents. To make sure an accident is well-understood

and appropriate measures are taken to prevent the ac-

cident from recurring, a thorough analysis must be

conducted. The results of this analysis need to be

taken into account when deﬁning improvement ac-

tions for the appropriate organizations or organiza-

tional units concerned.

Accident analysis methods have been researched

and applied since at least 1941 (Heinrich, 1941). In

1997, Rasmussen introduced AcciMap (Rasmussen,

1997), which is an accident causation model and

analysis method that not only considers the techni-

cal aspects of an accident, but also the social con-

text. By also considering this aspect, the method ex-

plicitly recognizes that activities like training, human

resource management and organizational culture also

play a role in the development of an accident. The

combination of the technical and the social context is

called the socio-technical context. In 2009, Branford

introduced the Generic AcciMap Method (Branford

et al., 2009), which included a workﬂow for drawing

up AcciMap diagrams with the key players of the or-

ganization.

AcciMap has been applied in many different sec-

tors (Wienen et al., 2018), but to our knowledge not

in telecommunications. More generally, we found

few literature sources that even describe analyses of

Wienen, H., Bukhsh, F., Vriezekolk, E., Pires, L. and Wieringa, R.

Squeezing the Lemon: Using Accident Analysis for Recommendations to Improve the Resilience of Telecommunications Organizations.

DOI: 10.5220/0012562900003690

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 26th International Conference on Enterprise Information Systems (ICEIS 2024) - Volume 2, pages 149-158

ISBN: 978-989-758-692-7; ISSN: 2184-4992


telecommunications incidents (Bukhsh et al., 2020).

We observed in a previous case study (Wienen et al.,

2019) that considering the socio-technical context of

telecommunication accidents is a useful addition to

the analysis of these accidents: by considering this

context, a company can identify latent factors that

can exacerbate accidents and malfunctioning coun-

termeasures that fail to inhibit accidents. We also

demonstrated that AcciMap, and more speciﬁcally the

Generic AcciMap Method, can be effectively applied

to telecommunications accidents. In (Wienen et al.,

2019), we reported on a case study of a DDoS attack
on a telecommunications operator and we showed

that applying AcciMap to it yielded positive results.

That case study gave rise to improvements to the

method, resulting in an updated method that we called

TRAM (Telecommunications Related AcciMap). In

this paper we validate these improvements to the

method by applying TRAM to a second case study on

a telecommunications accident.

This paper is structured as follows: Section 2 dis-

cusses the background of this research, Section 3 in-

troduces the changes to AcciMap, resulting in TRAM,

Section 4 describes the steps in the method illustrated

by the application of the method in our case study,

Section 5 presents the results of the case study which

we discuss in Section 6, and Section 7 contains our

conclusions and suggestions for future research.

2 BACKGROUND

2.1 Telecommunications

The telecommunications domain is highly technical, under heavy competition and driven by technological progress as much as by market forces (Meena and Geng, 2022). Many telecommunications networks (most notably those of incumbent PTTs and cable operators) have been around for a long time and have gone through many mergers, creating a large base of legacy equipment and technology, parts of which are sometimes not even compatible with each other. This leads to specific problems that can render the network more fragile than desirable. Telecommunications networks are strongly connected and issues propagate very fast through them (Pitsillides and Sekercioglu, 2000). The domain is furthermore characterized by strong competitiveness. Market pressure leads to lower prices (Fernández and Usero, 2009) and thus lower margins, and investment decisions are driven by market share. This puts stress on the distribution of scarce resources: whether to invest in maintenance or in new services, or whether to cut costs in order to offer cheaper services to the customer. Decisions made in these circumstances may have negative consequences for the robustness of the services. These consequences become visible during large accidents, so that such accidents may shed light on decisions made months or even years earlier.

2.2 Accident Analysis

Companies can have several reasons to analyze acci-

dents. A typical use case is improvement of stabil-

ity. This use case is concerned with preventing the

next accident, or at least preventing the previous ac-

cident from happening again (Underwood and Water-

son, 2013; Stringfellow, 2011).

There are three families of accident analysis models (Hollnagel, 2002; Hollnagel and Goteman, 2004): Sequential, Epidemiological and Systemic. Sequential models describe accidents as a series of events. Epidemiological models add latent factors, which are factors that failed to prevent the accident or that aggravated the consequences of other factors. Systemic models, finally, add tight coupling, in which parts of the system are so tightly linked that errors or mistakes can propagate swiftly enough for a runaway process to ensue or a positive feedback loop to occur. Systemic models start by modeling the context in which the accident occurred and then describe the accident in terms of organizational functions (Hollnagel et al., 2014) or a systems theoretic model (Leveson, 2011).

We have selected an epidemiological model

(AcciMap) as a basis for the development of an ac-

cident analysis method that takes the speciﬁc needs

of the telecommunications domain into account.

3 TRAM

This section presents TRAM (the Telecom Related

AcciMap Method), which is the method for model-

ing and analyzing accidents in the telecom domain

that we developed by extending the Generic AcciMap

Method (Branford et al., 2009). This extension was a

result of our previous research (Wienen et al., 2019).

In this section, we also introduce the case study that

we performed to validate the method.

3.1 The Method

TRAM prescribes the steps deﬁned in the Generic

AcciMap Method in (Branford et al., 2009), which are

shown in Figure 1, extending them with one step that

is indicated in green. We added more features to the

ICEIS 2024 - 26th International Conference on Enterprise Information Systems


Figure 1: TRAM steps (our addition to the Generic AcciMap

Method is in green).

method, which we describe in this section. In these

steps a diagram represents causal factors that con-

tributed to the accident, linked by arrows that denote a

causal connection, i.e. if A caused B, then an arrow is

drawn from A to B. The diagram is layered, denoting

different areas of inﬂuence for the company. The bot-

tom part of the diagram contains the outcomes, which

are the consequences of the accident. Figure 3 and

Figure 4 show an example.

Applying TRAM starts by studying the available

information and ﬁnding the relevant staff for the anal-

ysis workshops. Relevant staff have domain knowl-

edge of the different aspects of the accident, taking

the different functions in an organization into account.

Once these experts have been identiﬁed in the or-

ganization, they are invited to the accident analysis

workshop. In this workshop, the physical path (i.e.,

the technical failures that caused the accident), the

outcomes and the appropriate layer for analysis are

identiﬁed, after which the workshop participants iden-

tify causes and link them to effects, which are either

causes for further effects or outcomes. They ﬁll the

gaps in and check the logic of the model that repre-

sents the accident. The ultimate goal of the analysis

is for the ﬁnal model to generate enough insight so

that recommendations can be formulated.

Avoiding the Blame Game. In a previous case

study (Wienen et al., 2018), we spent a lot of time re-

assuring participants in the workshops that we would

not be assigning blame to them due to perceived or

actual errors they made. To prevent this from hap-

pening, we took time to address this issue at the be-

ginning of the workshop. Moreover, we also started

out by describing the physical events that led to the

accident. In this way, we started out with a discus-

sion about technical facts, which were perceived to

be neutral. This positively inﬂuenced the mood in the

session, after which discussing actions from partici-

pants became less of a threat.

Splitting up the Analysis. We observed that there

may be more than one phase in the development of

an accident. An accident can cause another accident

or a crisis that needs to be resolved. These differ-

ent phases can sometimes be described independent

of each other. If this becomes apparent during a work-

shop, it provides an opportunity to work in parallel:

divide the group into two subgroups that each treat

one of the phases. The results are then discussed

in the combined group, amended where necessary

and finalized. This parallelization makes the process

more efﬁcient.

Describing Crisis Management. In our previous

research we also observed that a large part of the acci-

dent’s resolution time was spent managing the crisis

that ensued after the accident itself. This led us to

apply AcciMap to crisis management, trying to an-

swer the question ‘why did it take so long to resolve

the accident’. AcciMap turned out to help there as

well. After our previous work, we added a new Cate-

gory of Cause to the Organization layer, namely Cri-

sis Management, which covers the following aspects:

(i) crisis management organization absent; (ii) unclear

mandate during crisis; (iii) unclear roles and chain of

command; and (iv) inadequate facilities.

Add the ICT Layer. Telecommunications networks strongly depend on ICT. Software, hardware, configuration and data are all relevant and separate aspects of an ICT system, yet they do not have a dedicated place in the layers of the Generic AcciMap. Given this strong dependence, they deserve explicit focus. Branford’s categories of cause only feature

hardware, which may be the most robust part of an

ICT network: software, conﬁgurations and data are

much more volatile than hardware. Hardware is only

mentioned as an aspect of Equipment and Design,

while the other aspects of that category (poor qual-

ity, defective, aging, untidy, missing or poorly main-

tained equipment or tools) strongly suggest that this

is aimed at machines and tools, not computer hard-

ware. So, these are good arguments for extending

Branford’s Categories of Cause. However, as errors

in the ICT layer lead to problems in the operational

(physical/actor) level and also inﬂuence decisions at

the organizational level, we have not added a new cat-

egory of cause to either layer, but rather introduced a

new layer with these categories of cause.

Improve Efﬁciency and Lead Time. Telecommu-

nications operators experience ﬁerce competition, so

that efﬁciency of processes is a main business driver.

Therefore, in TRAM we aimed to make the accident


analysis as efficient as possible. Most effort goes into the workshops, because of the number of people attending them. Minimizing the number of workshops therefore has a large impact on the total number of person-hours invested in the investigation. Splitting up the analysis and working in

parallel as mentioned before decreases the duration of

the workshop. Furthermore, by creating and review-

ing the diagrams ofﬂine (and between the sessions),

we were also able to improve the review process: the

analysts who run the analysis create the graphical rep-

resentation of the accident (the diagram) on their own

and send it out for review prior to the next workshop,

asking for review remarks before reconvening. The

researchers then process these remarks and have a re-

viewed version of the diagram available at the start of

the next workshop.

3.2 The Accident

In the accident we analyzed to validate our method,

during routine maintenance on a power switch in a

telecom company in 2018, a system failure caused a

short interruption in the power supply. Power was re-

stored within a minute, but the interruption had a sig-

niﬁcant follow-on effect.

During the accident, the power outage caused

some parts of the network to go down. Under nor-

mal circumstances, the network can deal with such

outages since it is designed with redundant parts to

compensate for them. However, due to planned main-

tenance elsewhere in the network, the fallback links

were not active. This caused a drop in capacity and

congestion in other parts of the network.

Many companies rely strongly on machine-to-

machine communication, i.e., communication be-

tween devices that runs over the GSM network. Ex-

amples are sensors for measuring water levels in

canals and rivers that send measurements from time

to time to a central control system. When a device

cannot send information, e.g., due to a network error,

it keeps retransmitting until it ﬁnally receives a conﬁr-

mation that the information has been received by the

control system. These types of devices constitute part

of the Internet of Things (IoT) network that partially

runs over the GSM network.

The congestion caused information from the IoT

devices to be dropped, after which the devices tried

sending the information again, causing a positive

feedback loop that added even more trafﬁc to the al-

ready congested network.
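The retransmission behaviour described above can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not a model of the operator's network: time is divided into ticks, devices offer a fixed amount of new traffic per tick, and everything above the available capacity is dropped and retried in the next tick. The function name and the numbers are hypothetical.

```python
def offered_load(new_per_tick, capacity, ticks):
    """Return the traffic offered to the network in each tick.

    Toy model: whatever exceeds `capacity` in a tick is dropped and
    retransmitted in the next tick, on top of the new traffic.
    """
    history = []
    retries = 0
    for _ in range(ticks):
        offered = new_per_tick + retries
        history.append(offered)
        # Dropped messages come back as retransmissions in the next tick.
        retries = max(0, offered - capacity)
    return history


# With enough capacity the load stays flat; once capacity drops below the
# new traffic, retransmissions pile up and the offered load keeps growing.
healthy = offered_load(new_per_tick=10, capacity=12, ticks=5)
congested = offered_load(new_per_tick=10, capacity=8, ticks=5)
```

In this sketch the load in the congested case grows every tick until the retransmitting devices are blocked, which mirrors why the cascade could only be stopped by disabling devices.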

The only way to stop this cascade was to block all devices and to release them one by one once the congestion was resolved. Unfortunately, the company did not keep clear records, so it could not easily find out which SIM cards belonged to which type of device. Therefore, it had to disable all devices, including regular mobile phones. This in turn caused

the roaming service to break down, so that customers

of the mobile network were not able to use the net-

work of other providers, resulting in complaints from

customers and roaming partners.

It took the company sixteen hours to completely

restore the service, with substantial ﬁnancial and rep-

utation losses.

4 STEPS AND APPLICATION

This section describes the individual steps in TRAM

alongside their application to the accident.

The method uses a predeﬁned structure for or-

dering causal factors that contributed to the accident.

This structure is drawn on a screen, whiteboard or

brown paper wall and serves as a foundation on which

the next steps are executed. The structure deﬁnes ﬁve

layers, which describe distinct types of causal factors:

1. The External Layer covers all factors external

to the organization that the organization cannot

change itself. It contains the following blocks:

(a) Government Block, covering factors like (inad-

equate) legislation or budgeting issues.

(b) Regulator Block, covering factors like (inade-

quate) enforcement of regulations.

and market pressure.

2. The Organizational Layer covers factors like in-

adequate risk management or training. These fac-

tors are under control of the organization.

3. The ICT Layer provides better visibility for the

critical role ICT plays in telecommunication, both

as an enabler and as a product or service. The

layer is one of our additions to the Generic Acci-

Map Method.

4. The Physical / Actor Layer describes immediate

actions and events leading to the outcomes, in-

cluding hardware malfunctioning and human er-

ror.

5. The Outcome Layer covers the (usually adverse)

outcomes of the accident, such as outage and ﬁ-

nancial loss through missed revenue.
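The layered structure can also be captured as a small data structure, which may help when diagrams are redrawn offline between workshops. The following is a minimal sketch, not part of TRAM itself: the class and method names are hypothetical, and the factor labels are illustrative placeholders loosely based on Section 3.2 rather than the confidential causal factors of the case study.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    """The five TRAM layers, ordered as listed in the text."""
    EXTERNAL = 1
    ORGANIZATIONAL = 2
    ICT = 3
    PHYSICAL_ACTOR = 4
    OUTCOME = 5


@dataclass
class TramDiagram:
    layer_of: dict = field(default_factory=dict)  # factor label -> Layer
    links: set = field(default_factory=set)       # (cause, effect) arrows

    def add_factor(self, label, layer):
        self.layer_of[label] = layer

    def add_link(self, cause, effect):
        # An arrow from A to B means that A caused or contributed to B.
        for label in (cause, effect):
            if label not in self.layer_of:
                raise KeyError(f"unknown factor: {label}")
        self.links.add((cause, effect))

    def factors_in(self, layer):
        return sorted(f for f, l in self.layer_of.items() if l == layer)


# Illustrative placement only; the real diagram uses anonymized numbers.
diagram = TramDiagram()
diagram.add_factor("power interruption", Layer.PHYSICAL_ACTOR)
diagram.add_factor("fallback links inactive", Layer.ICT)
diagram.add_factor("network congestion", Layer.PHYSICAL_ACTOR)
diagram.add_factor("customer complaints", Layer.OUTCOME)
diagram.add_link("power interruption", "network congestion")
diagram.add_link("fallback links inactive", "network congestion")
diagram.add_link("network congestion", "customer complaints")
```

Keeping factors and links separate makes it straightforward to list the contents of one layer or to count links per factor later in the analysis.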

4.1 Workshop Preparation

To prepare for the workshop, we studied the acci-

dent reports created by the company so that we could

understand which departments were affected by or


played a role in the accident. We invited 8 partici-

pants to the workshop, who were chosen to make sure

all relevant departments were represented in the work-

shop, as prescribed by the Generic AcciMap Method.

4.2 Workshop 1

At the beginning of the workshop, we introduced TRAM to the participants.

Identify the Physical Path. As a first step, the physical path of the accident was identified. The group used notation taken from the Fault Tree Analysis method (FTA) (Lee et al., 1985) for this step (see Figure 2). The approach in this step is to consider only the causal factors that are directly linked to the accident and that are physical events, which means that no actions performed by actors should be considered. The purpose of this exercise is to establish a common picture of the accident without evoking discussions about blame. Everybody agreed on the physical events and this resulted in the diagram depicted in Figure 2. This diagram showed that the accident could actually be considered a combination of two accidents: a power failure and a roaming service outage. This prompted us to divide the participants into two groups, which is another improvement we made to the Generic AcciMap Method based on our experience from our earlier research. After each subgroup had completed its diagram on flip chart sheets, the diagrams were combined and the result was discussed with the entire group.

Identify Outcomes. In this step, the outcomes of

the accident are identiﬁed. Outcomes are the con-

sequences of the accident and they are mostly detri-

mental. The outcomes of this accident were customer

complaints and partner complaints, leading to extra

calls to the call center and reputation damage.

Identify Causal Factors. In this step, the Causal

Factors are identiﬁed. These are factors that directly

or indirectly contributed to the accident. The method

for identifying causal factors is to ask the partici-

pants to write down these factors on sticky notes. In

our workshop, we ﬁrst asked the participants to write

down factors without any guidance. After around 10

minutes, we gave them a list of categories they could

use to jog their memory. TRAM uses the categories of

cause from the Generic AcciMap Method. We added

Crisis Management as an extra category since crisis management played a crucial role in the validation of our previous research. As it turned out, crisis management played no role in this particular accident.

Table 1: Categories of cause for identification of causal factors.

Government
Financial Issues
Communication & Information
Risk Management
Training
Actor Activities and Conditions
Human Resources
Physical events, Processes and Conditions
Regulatory Bodies
Equipment & Design
Auditing & Rules Enforcement
Manuals and Procedures
Crisis Management
Society
Defences
Organizational Culture

Identify Appropriate Layer. In this step, the ap-

propriate layer for the causal factor is identiﬁed by

sticking the notes to the appropriate places on the

brown paper wall. This is a physical activity that invites the participants to start discussing the meaning of each factor, which improves the quality of the factor's description.

Insert Causes and Links. In this step, a ﬁrst draft

version of the AcciMap diagram is built by model-

ing the causal links between the different factors, e.g.,

factor A caused factor B, or factor C was one of the

contributors to factor D. After this step, the workshop

was ﬁnalized. The researchers then created the ﬁnal

diagrams (Figures 3 and 4) based on the results of the

workshop.

Positive Feedback Loop. During the meeting, we

identiﬁed a positive feedback loop, as mentioned in

Section 3.2. We introduced new notation to represent this loop and the action that breaks it, which we discuss in Section 5.1.

4.3 Workshops 2 and 3

Fill Gaps and Check Logic. In this step, the re-

searchers presented the diagram they produced ofﬂine

based on the draft version created in the ﬁrst work-

shop and discussed it with the participants. The aim

of this discussion was to verify the relations between


Figure 2: The physical path of the accident.

the causal factors and the completeness of the dia-

gram. As a result, we changed some terminology and

rewrote part of the diagram to more adequately reﬂect

what happened. At the end of this step, we had iden-

tiﬁed 56 causal factors, resulting in the diagrams in

Figures 3 and 4.

Formulate Recommendations. After completing

the diagram, recommendations for each causal fac-

tor are formulated. This is done by considering how

to (i) prevent the causal factor from happening, com-

pletely preventing the consequence, and/or (ii) control

the causal factor during its development, diminishing

the consequence, and/or (iii) compensate for the con-

sequence of the causal factor.

We posed these questions for each causal factor,

resulting in a total of 60 recommendations.

5 RESULTS AND FINDINGS

As a result of the workshops, we drew one diagram.

Due to the size of the diagram, we split it up into

two parts, one (Figure 3) describing the power outage

and another (Figure 4) describing the roaming failure.

Due to company conﬁdentiality, we are not allowed

to share the exact causal factors and use numbers in-

stead. The triangles are connectors which show where

the two parts need to be connected to form the com-

plete diagram of the accident.

Figure 3 shows the TRAM diagram of the power

failure. The newly added ICT layer only contains 1

causal factor, hardly justifying a layer of its own. One

causal factor (#13) can be included in two Categories

of Cause. One causal factor (#16) has a modest number (5) of incoming links. This may suggest that it played a central role in the development of the accident. However, it is only one of four factors that link the organizational layer to the physical layer, which implies that there is no clear central point of failure.
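Counting incoming links is easy to automate once the causal links are recorded as pairs. The snippet below is a minimal sketch with a made-up set of links: the factor numbers and arrows are hypothetical and do not reproduce the confidential diagram, they only show how a factor with five incoming links would stand out.

```python
from collections import Counter


def incoming_link_counts(links):
    """Count, per causal factor, how many arrows point at it."""
    return Counter(effect for _cause, effect in links)


# Hypothetical (cause, effect) pairs: five factors feed into factor 16.
example_links = [
    (3, 16), (5, 16), (8, 16), (11, 16), (12, 16),
    (16, 20), (14, 20), (20, 21),
]
counts = incoming_link_counts(example_links)
most_linked, n_incoming = counts.most_common(1)[0]
```

A high count alone does not make a factor a central point of failure; as argued above, it also matters how many other factors connect the same layers.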

Figure 4 shows the TRAM diagram of the roaming

failure. One part of the diagram stood out: we found

a positive feedback loop, which we discuss in Section 5.1. The roaming failure itself (#23) is the result of

a relatively simple cascade combined with the feed-

back mechanism. However, many factors contributed

to the long lead time of the accident (#25), which in-

dicates that solving this consequence may take a lot

of effort in different parts of the organization. The

ICT layer is much more populated than in the power

outage, namely 17 causal factors versus 1. This indi-

cates that the ICT environment played a large role in

the development (and the prolonged duration) of the

incident and the underlying data conﬁrms this.

5.1 Feedback Loops

As part of the analysis of the roaming failure, we

discovered a positive feedback loop that exacerbated

the accident in the company’s network. To indicate this loop, we introduced additional notation: the loop itself can be represented by joining the constituent

causal factors with arrows. However, in order to re-

solve the accident, an operator had to break the loop.

The link that was broken to stop the loop is indicated

with a valve symbol (▷◁). The action breaking that

link is indicated with a causal factor box (β) with a

pointer to the valve. This reparative action had its

own consequences, leading to the outcomes under γ.

Since breaking the loop was part of the resolution of

the accident (and not of the accident itself), we indi-

cate these factors with Greek letters.
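In graph terms, a feedback loop is a cycle in the causal links, and the valve marks the link that is removed to break it. The sketch below is one way to check this on a recorded diagram, assuming links are stored as (cause, effect) pairs. The function name and the factor labels are hypothetical, and the depth-first search is a standard technique rather than something prescribed by TRAM.

```python
def find_cycle(links):
    """Return one cycle as a list of factors, or None if there is none."""
    graph = {}
    for cause, effect in links:
        graph.setdefault(cause, []).append(effect)
        graph.setdefault(effect, [])

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:
                # Reached a factor already on the current path: a loop.
                return path[path.index(nxt):]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


# Illustrative labels for the retransmission loop of Section 3.2.
links = {
    ("power interruption", "congestion"),
    ("congestion", "dropped messages"),
    ("dropped messages", "retransmissions"),
    ("retransmissions", "congestion"),
}
loop = find_cycle(links)

# The "valve": removing one link of the loop leaves the graph acyclic.
valve = ("retransmissions", "congestion")
after_valve = find_cycle(links - {valve})
```

Which link gets the valve is a modeling decision made in the workshop; the check only confirms that the chosen link actually opens the loop.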

5.2 ICT Layer

For the Power Outage, the addition of the ICT Layer

did not provide extra insight. For the Roaming Fail-

ure, however, the addition was very instructive, since

a large part of the diagram is related to that layer. This

also gave us the opportunity to formulate new cate-

gories of cause for this layer.


Figure 3: TRAM diagram of the power outage. Note that causal factor 13 has two colors: it ﬁts in both categories. The

triangles labeled A, B and C are the connections to the TRAM diagram in Figure 4. The factors in both diagrams have been

numbered independent of one another. Number 13 in this diagram has no relation with number 13 in Figure 4.

Table 2: Categories of Cause for the ICT Layer.

Software             Network
Hardware             Configuration Management
System Management

5.3 Recommendations

For the power outage, we identiﬁed 38 recommenda-

tions. For the roaming service outage, we identiﬁed

22 recommendations. Tables 3, 4 and 5 provide more

detail about the causal factors and recommendations

we identiﬁed in our analysis. Branford’s Categories

of Cause (Branford et al., 2009) proved to be appro-

priate for our causal factors. Our proposed category

Crisis Management is absent, as this accident did not

call for an analysis of the crisis organization. The res-

olution was smooth, albeit time consuming, and did

not call for crisis management.

Table 3: Number of identified causal factors and recommendations per stage and AcciMap level (Ext: External; Org: Organizational; P/A: Physical/Actor).

                      Ext   Org   ICT   P/A
Causal Factors
  Power Outage          2    14     1     7
  Roaming Failure       0     6    17     1
Recommendations
  Power Outage          3    23     3     9
  Roaming Failure       0     5    19     1

5.4 Effort and Lead Time

The total effort invested in the analysis was 114 per-

son hours (see Table 7), which is 53% of the effort

used for our previous research. This is at least in part

due to the efﬁciency gains we have introduced. This

brought the number of workshops and meetings down

from 5 to 3. If we measure the complexity of the acci-

dent analysis by the ratio of causal factors and causal

links between the two accidents, the improvement still


Figure 4: TRAM diagram of the roaming failure.

Table 4: Recommendations per category of cause for the power outage (Org: Organizational; P/A: Physical/Actor).

Category of Cause               Org   ICT   P/A
Financial                         4     0     0
Equipment & Design                3     0     3
Defences                          0     0     1
Communication & Information       0     0     1
Auditing & Rules Enforcement      0     0     0
Risk Management                   9     0     0
Manuals & Procedures              6     0     4
Training                          1     0     0
Software                          0     0     0
Hardware                          0     0     0
Configuration & Management        0     3     0
Systems Management                0     0     0
Network                           0     0     0
Data                              0     0     0

holds. If $n_{cf,x}$ denotes the number of causal factors in case study $x$ and $n_{cl,x}$ the number of causal links, and with $Q_y$ the ratio between the numbers of causal factors or causal links ($y$), we have:

$$Q_{cf} = \frac{n_{cf,2}}{n_{cf,1}} = 0.69; \quad Q_{cl} = \frac{n_{cl,2}}{n_{cl,1}} = 0.72$$

This would imply that case study 2 was about 30%

Table 5: Recommendations per category of cause for the roaming failure (Org: Organizational; P/A: Physical/Actor).

Category of Cause               Org   ICT   P/A
Financial                         0     0     0
Equipment & Design                1     0     1
Defences                          0     0     0
Communication & Information       0     0     0
Auditing & Rules Enforcement      0     0     0
Risk Management                   1     0     0
Manuals & Procedures              3     0     0
Training                          0     0     0
Software                          0     0     0
Hardware                          0     3     0
Configuration & Management        0     4     0
Systems Management                0     9     0
Network                           0     3     0
Data                              0     0     0

less complex than case study 1, but took 47% less time. Correcting for the lower complexity by multiplying the time spent on case study 1 ($t_1$) by 0.7 (the factor by which accident 2 is less complex), this still yields an improvement of 25%:

$$t_{1,corr} = 0.7 \cdot t_1 = 151 \text{ hrs}; \quad \frac{t_2}{t_{1,corr}} = \frac{114}{151} = 0.75$$
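The arithmetic behind these figures can be reproduced from the measurements in Table 6. The snippet below is a small check, not part of the method. The function and variable names are hypothetical, and the factor 0.7 is taken from the text as the rounded complexity ratio.

```python
def efficiency_check(t1, t2, n_cf1, n_cf2, n_cl1, n_cl2, complexity=0.7):
    """Recompute the complexity ratios and the corrected effort ratio."""
    q_cf = n_cf2 / n_cf1          # ratio of causal factors
    q_cl = n_cl2 / n_cl1          # ratio of causal links
    t1_corr = complexity * t1     # effort of case study 1, scaled down
    return q_cf, q_cl, t1_corr, t2 / t1_corr


# Values from Table 6: person-hours, causal factors and causal links.
q_cf, q_cl, t1_corr, ratio = efficiency_check(
    t1=216, t2=114, n_cf1=81, n_cf2=56, n_cl1=95, n_cl2=68
)
summary = (round(q_cf, 2), round(q_cl, 2), round(t1_corr), round(ratio, 2))
```

The uncorrected ratio, 114 / 216, is about 0.53, which matches the 53% mentioned at the start of this section.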


Table 6: Relevant measurements from the first case study as reported in (Wienen et al., 2019) and this case study; $t$ denotes the time invested in the case study; $n_{cf}$ is the number of causal factors and $n_{cl}$ the number of causal links. For the time invested in case study 2, see also Table 7.

case study     t     n_cf   n_cl
1            216      81     95
2            114      56     68

6 DISCUSSION

We introduced three new elements when designing

TRAM: the ICT layer (with its Categories of Cause),

the Physical Path and the Crisis Management Cat-

egory of Cause. We also introduced efﬁciency im-

provements. During the execution of the current case

study, we observed that the addition of the ICT layer

provided more insight into the accident. The high

number of causal factors shows that ICT aspects play

a very important role in the roaming failure.

The addition of the ICT layer gave us the opportunity to formulate new Categories of Cause for this layer. These categories are currently derived from the one accident we analyzed. They were sufficient and not too detailed for this accident. Based on our findings, we anticipate they may prove equally useful for other accidents. Whether this list is adequate or needs to be adapted is a subject for future research.

Starting with the physical path helped in two ways: it showed that the accident was actually a combination of two accidents, enabling us to split up the group, and it set the scene for a constructive discussion, since the group agreed on the physical steps of the accident and thus had an early success. The discussions during the rest of the workshops were focused on substance, and we only had to direct the discussion away from blame a small number of times.

This accident did not feature the Crisis Manage-

ment Category, so we were not able to assess its use-

fulness.

During the analysis, we identiﬁed a behaviour pat-

tern that TRAM had not yet taken into account: the

positive feedback loop. A result of the current case

study is additional notation to represent this pattern.
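To illustrate what a feedback loop means in a causal diagram, the sketch below treats the diagram as a directed graph and reports its cycles. This is only an illustration under stated assumptions: the function `find_feedback_loops` and the factor names are hypothetical and are not taken from the case study or from the TRAM notation.

```python
def find_feedback_loops(links):
    """Return the cycles in a directed causal graph as lists of factor names.

    `links` maps each causal factor to the factors it contributes to.
    Cycles are de-duplicated by their set of factors, which is adequate
    for a sketch on small diagrams.
    """
    loops = []
    seen = set()

    def visit(node, path):
        if node in path:
            # We returned to a factor already on the current path: a loop.
            cycle = path[path.index(node):]
            key = frozenset(cycle)
            if key not in seen:
                seen.add(key)
                loops.append(cycle)
            return
        for successor in links.get(node, []):
            visit(successor, path + [node])

    for start in links:
        visit(start, [])
    return loops


# Hypothetical causal links, invented for illustration only.
example_links = {
    "failed requests": ["client retries"],
    "client retries": ["signalling load"],
    "signalling load": ["failed requests", "alarm raised"],
    "alarm raised": [],
}

loops = find_feedback_loops(example_links)
print(loops)  # [['failed requests', 'client retries', 'signalling load']]
```

Each reported loop marks a set of causal links where a control could be introduced to interrupt the runaway process.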

The efficiency measures (splitting the groups, doing offline work) had a positive effect on the effort put into the analysis: it took 25% less effort than our previous case study after correcting for complexity. However, this conclusion is based only on the difference between these two case studies. More analyses are needed to obtain statistically significant results.

Experts’ Opinion

The company concluded that “This method is a good

method for accident analysis in the Telecommunica-

tions domain and that it contributes to ﬁnding im-

provements. The case study has led to a number of

changes in the company, under which the planning of

an annual Business Continuity Management (BCM)

test. The discussions in making the diagram itself

were essential in formulating the right improvements.

The explicit principle of no blame has led to the com-

plete set of insights and improvements.”

7 CONCLUSION

We performed this case study to validate the changes we made to AcciMap when developing TRAM. Based on this case study, we conclude that our improvements to AcciMap are useful. By adopting the principle of no blame and by starting out from a purely physical perspective, the participants were able to steer away from discussions about blame, leading to a comprehensive set of insights and improvements. The addition of the ICT layer led to more specific recommendations in the ICT domain, which is pertinent to telecommunications.

Splitting the group helped conduct the workshops more efficiently, as did generating and reviewing the diagram offline. This reduced the effort by 25% after correcting for complexity.

The recommendations from the method were ac-

tionable: several have already been implemented and

the company believes they have already led to re-

silience improvements.

We were able to improve TRAM itself as well: the added notation for feedback loops gives more insight into runaway processes and into ways of breaking the loops. Introducing corresponding controls into the operation of an organization will enable that organization to break such loops in a more controlled way, leading to a more robust and resilient operation.

Future research will determine whether and to

what extent we can enhance the method’s efﬁciency

even further. Gathering more data to ﬁll out the cat-

egories of cause for the ICT layer is another area to

investigate.

Additionally, a systematic way to prioritize recommendations would help companies plan improvement programs more effectively; developing such a prioritization is another open research direction.

The results of this research are limited by the validation being based on one case study, which makes it harder to generalize the conclusions. Another limitation is that it is hard to compare accidents of different complexities. We have chosen to measure the complexity by comparing the numbers of causal factors and causal links. Future research may show whether this is an appropriate way of quantifying the complexity of accidents.

Table 7: Time invested in this case study was only 53% of the time invested in our previous case study: 114 person hours versus 216 (Wienen et al., 2019).

| activity | duration (hrs) | participants: company | participants: researchers | person hours |
|---|---|---|---|---|
| workshop I | 4 | 8 | 2 | 40 |
| workshop II | 4 | 7 | 2 | 36 |
| workshop III | 3 | 5 | 1 | 18 |
| preparations | 4 | - | 2 | 8 |
| reporting | 12 | - | 1 | 12 |
| total | | | | 114 |
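The person hours in Table 7 follow from multiplying the duration of each activity by the number of people involved. The sketch below retraces that bookkeeping; the names `ACTIVITIES` and `person_hours` are illustrative, and reading a "-" in the table as zero company participants is our assumption.

```python
# Rows of Table 7: (activity, duration in hours, company participants, researchers).
# Assumption: a "-" in the company column is read as 0 participants.
ACTIVITIES = [
    ("workshop I", 4, 8, 2),
    ("workshop II", 4, 7, 2),
    ("workshop III", 3, 5, 1),
    ("preparations", 4, 0, 2),
    ("reporting", 12, 0, 1),
]

PREVIOUS_CASE_STUDY_HOURS = 216  # person hours reported in (Wienen et al., 2019)


def person_hours(duration, company, researchers):
    """Person hours for one activity: duration times everyone involved."""
    return duration * (company + researchers)


per_activity = {name: person_hours(d, c, r) for name, d, c, r in ACTIVITIES}
total = sum(per_activity.values())
share_of_previous = total / PREVIOUS_CASE_STUDY_HOURS

print(per_activity)
print(f"total = {total} person hours")                     # total = 114 person hours
print(f"share of previous study = {share_of_previous:.0%}")  # share of previous study = 53%
```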

REFERENCES

Branford, K., Naikar, N., and Hopkins, A. (2009). Guidelines for AcciMap analysis. In Hopkins, A., editor, Learning from High Reliability Organisations. CCH Australia Ltd.

Bukhsh, F. A., Vriezekolk, E., Wienen, H. C. A., and

Wieringa, R. J. (2020). Availability incidents in the

telecommunication domain: A literature review. Tech-

nical report, DSI.

Doytchev, D. E. and Szwillus, G. (2009). Combining task analysis and fault tree analysis for accident and incident analysis: A case study from Bulgaria. Accident Analysis & Prevention, 41(6):1172–1179.

Fernández, Z. and Usero, B. (2009). Competitive behavior in the European mobile telecommunications industry: Pioneers vs. followers. Telecommunications Policy, 33(7):339–347.

Harms-Ringdahl, L. (2013). Guide to safety analysis for ac-

cident prevention. IRS Riskhantering AB, Stockholm.

Heinrich, H. W. (1941). Industrial accident prevention: a scientific approach. New York & London: McGraw-Hill Book Company, Inc., first edition.

Hollnagel, E. (2002). Understanding accidents - from root causes to performance variability. In Proceedings of the IEEE 7th Human Factors Meeting.

Hollnagel, E. and Goteman, O. (2004). The functional resonance accident model. Proceedings of Cognitive System Engineering in Process Plant, 2004:155–161.

Hollnagel, E., Hounsgaard, J., and Colligan, L. (2014). FRAM - the Functional Resonance Analysis Method: a handbook for the practical use of the method. Centre for Quality, Region of Southern Denmark.

Lee, W. S., Grosh, D. L., Tillman, F. A., and Lie, C. H. (1985). Fault tree analysis, methods, and applications - a review. IEEE Transactions on Reliability, R-34(3):194–203.

Leveson, N. G. (2011). Engineering a Safer World: Systems

Thinking Applied to Safety. Engineering Systems.

Meena, M. E. and Geng, J. (2022). Dynamic competition in

telecommunications: A systematic literature review.

SAGE Open, 12(2):21582440221094609.

Pitsillides, A. and Sekercioglu, A. (2000). Congestion con-

trol. In Computational Intelligence in Telecommuni-

cations Networks, pages 109–158. CRC Press.

Rasmussen, J. (1997). Risk management in a dynamic so-

ciety: A modelling problem. Safety Science, 27(2-

3):183–213.

Stringfellow, M. V. (2011). Accident Analysis and Hazard

Analysis for Human and Organizational Factors. PhD

thesis, Massachusetts Institute of Technology.

Underwood, P. J. and Waterson, P. E. (2013). Accident anal-

ysis models and methods: guidance for safety profes-

sionals. Technical report, Loughborough University.

U.S. Department of Energy (2000). Conducting accident

investigations - revision 2.

Wienen, H. C. A., Bukhsh, F. A., Vriezekolk, E., and

Wieringa, R. J. (2018). Learning from accidents: A

systematic review of accident analysis methods and

models. International Journal of Information Systems

for Crisis Response and Management (IJISCRAM),

10(3):42–62.

Wienen, H. C. A., Bukhsh, F. A., Vriezekolk, E., and Wieringa, R. J. (2019). Applying generic AcciMap to a DDoS attack on a Western-European telecom operator. In Proceedings of the 16th ISCRAM Conference, pages 528–535.
