SICAEN: A NEW METHOD TO DETERMINE THE IMPACT OF SEVERE NETWORK FAULTS ON BASIC TELECOMMUNICATION SERVICES

Andrés Cancer, Cristina del Campo and Carlos Gascón

Telecom NGOSS, Indra, Madrid, Spain

Department of Estadística e Investigación Operativa II, Universidad Complutense de Madrid, Madrid, Spain

Keywords: Severe Network Faults, Customer Care, Service Management, Impact Evaluation.

Abstract: Despite the effort carried out over the last two decades, there is still a huge gap between the information that network management systems can provide to identify and solve network problems and the information they offer to determine the actual impact of these problems on basic telecommunication services. This paper presents a new method, called SICAEN, to identify and characterize service impact incidents using network resource unavailability information as input. Whereas most previous work tries to identify the root cause of a failure, SICAEN is concerned with the actual impact of the failure from a user (service) perspective. The method performs impact evaluation on a per-service basis and has been successfully applied in the real world in the context of Telefónica’s IMPACTA project, whose goal is to determine the impact of severe network faults on basic mobile services for the biggest Spanish mobile operator.

1 INTRODUCTION

Over the last two decades, a big effort has been made to define the structure of telecommunication network faults and their treatment by network fault management systems. During this period, telecommunication service providers, network equipment providers, software vendors and system integrators have tried to define and standardize both the structure and the management of network faults. Some of the most widely accepted proposals in this area are:

• ITU-T X.733 Alarm Reporting Function (ISO/IEC 10164-4).
• 3GPP Alarm Integration Reference Point (3GPP).
• OSS/J Quality of Service API (Åberg, 2002) and Fault Management API (Raymer and Flauw, 2007).
• Multi-Technology Operation System Interface (MTOSI) Release 1.0 (TMFORUM-MTOSI).

These proposals define sets of management solutions, but each of them has a different background, since they evolved from working groups related to specific network technologies. This is the case of 3GPP, focused on mobile networks, and MTOSI, focused on transport technologies, while others evolved from 1990s solutions for fixed networks, as is the case of ITU-T (formerly CCITT), or, like OSS/J, aim to define a J2EE standard for network management. All of these proposals are roughly equivalent, and some efforts have recently been made to integrate them (Raymer, 2004).

Nowadays, there is a wide range of commercial operation support systems (OSS) that implement, to a certain extent, some of the aforementioned proposals. This adoption has made system interoperability easier to achieve and, moreover, has helped to improve network-oriented management processes, whose goal is to identify and solve network issues and events.

Nevertheless, there is still a wide gap between the information operation support systems provide to identify and solve network problems and the information they can offer to determine the actual impact of these problems on basic telecommunication services. In fact, most service impact analysis is done manually, based on the knowledge of the network topology provided by maintenance experts. It should also be noted that, although service impact information is derived from network data, it has an independent life-cycle and scope, so raw network data must be processed to obtain it. This detailed knowledge of service impact is essential to prioritize repair actions on severe network faults, to make new network planning decisions and, more generally, to achieve higher levels of service and, by extension, of customer satisfaction. Moreover, the TeleManagement Forum (TM Forum) clearly states (see TMFORUM-eTOM and TMFORUM-TAM) that the billing/invoicing service must be fed with information about service impacts in order to trigger the appropriate customer care actions when needed (e.g. SLA violation treatment). This is also required by law in some EU countries, such as Spain (see Real Decreto 424/2005).

Cancer A., del Campo C. and Gascón C. SICAEN: A NEW METHOD TO DETERMINE THE IMPACT OF SEVERE NETWORK FAULTS ON BASIC TELECOMMUNICATION SERVICES.
DOI: 10.5220/0003726601540163
In Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART-2012), pages 154-163
ISBN: 978-989-8425-96-6
© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

In order to fill this gap, the first issue to be dealt with is the lack of a clear and widely accepted definition of service impact incidents. These incidents have a complete life-cycle with well-defined start and end points but, unlike standard network faults, their nature and scope can vary significantly over time, passing through different intermediate states. Therefore, a more complex definition and structure for service faults is needed to capture all the data required to track and record the service impact incident life-cycle. The TeleManagement Forum has recently launched a working group (TMFORUM-NGOSS) to define all these items.

This paper presents a new method, named SICAEN, that has been developed to identify and characterize service-impacting incidents using network resource unavailability information as input. The method performs impact evaluation on a per-service basis and has been successfully applied in the context of Telefónica’s IMPACTA project, whose main goal was to determine the impact of severe network faults on basic mobile services (GSM voice, GPRS data, UMTS voice, UMTS data, etc.).

The rest of the paper is organised as follows. Section 2 provides some basic definitions and architectural assumptions. Section 3 presents the different stages of the SICAEN method. Implementation results indicating the effectiveness of the approach are presented in Section 4, and, finally, conclusions are drawn and future work is outlined in Section 5.

2 PREVIOUS DEFINITIONS

Most of the concepts used in this paper are well known, and their definitions can be found in the literature (see, for example, Jakobson and Weissman, 1993). However, before SICAEN is presented in the following sections, certain definitions and architectural assumptions must be introduced in order to fully understand its scope and the mechanisms it employs.

The proposals mentioned above use a basic fault definition (see ISO/IEC 10164-4, 3GPP, Åberg 2002, Raymer and Flauw 2007 and TMFORUM-MTOSI) as the basis for their fault management processes and interfaces. As already stated, this definition is clearly insufficient to handle service impact incidents, so new concepts and structures must be defined to manage them. Throughout this paper, “Service Incident” stands for a telecommunication service unavailability affecting a geographical area and/or a defined group of customers. Service Incidents may evolve dynamically, featuring a complete life-cycle during which they may change their affected scope, disappear once the affected scope is empty, be aggregated with another Service Incident, be divided into two or more independent parts, etc.

Regarding the architecture, SICAEN does not define or implement a complete OSS; rather, it complements the OSS concept with service impact generation. Therefore, SICAEN assumes the existence of an OSS solution that provides network management facilities (i.e. fault and configuration management). This implies that SICAEN can be deployed either as an independent module of the OSS or as a completely independent system accessing the underlying OSSs. The latter configuration would be typical in a multivendor environment, since the SICAEN method needs access to the different network events provided by the underlying network managers.

SICAEN also follows the TM Forum recommendations on data stewardship, so it requires the existence of a corporate repository and unified resource identifiers. A corporate repository of the network resources is needed in order to complement the information provided by the underlying OSS. In addition, since the SICAEN method integrates different sources of events and makes use of the corporate network repository, a unified network resource identifier (UNRI) must be provided for every element so that all these sources can be correlated. The UNRI is also used to retrieve information from the corporate repository. Finally, as will be shown in the following sections, the SICAEN method requires rich, detailed knowledge of the role and characteristics of each element, so that it can integrate the events originated by each element into the appropriate service impact incident.
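To illustrate the role of the UNRI, the following minimal Python sketch assumes that each underlying OSS reports resources under its own native identifiers and that the corporate repository can be modelled as a dictionary. The class `UnriRegistry`, its methods and the sample identifiers are hypothetical; SICAEN does not prescribe this interface.

```python
from dataclasses import dataclass, field


@dataclass
class UnriRegistry:
    """Hypothetical resolver from native (per-OSS) identifiers to UNRIs."""

    # (source system, native identifier) -> UNRI
    aliases: dict = field(default_factory=dict)
    # UNRI -> attributes held in the corporate repository
    repository: dict = field(default_factory=dict)

    def register(self, unri, attributes, *aliases):
        """Store a resource and record every native alias under which it is reported."""
        self.repository[unri] = attributes
        for source, native_id in aliases:
            self.aliases[(source, native_id)] = unri

    def resolve(self, source, native_id):
        """Return the UNRI for a native identifier, or None if it is unknown."""
        return self.aliases.get((source, native_id))

    def lookup(self, source, native_id):
        """Resolve a native identifier and fetch its corporate repository record."""
        unri = self.resolve(source, native_id)
        return None if unri is None else self.repository.get(unri)


# Usage: the same GSM cell as reported by a vendor OMC and by the inventory.
registry = UnriRegistry()
registry.register("CELL-0001", {"type": "GSM cell", "bsc": "BSC-07"},
                  ("vendorA-omc", "LAC12/CI345"), ("inventory", "C0001"))
print(registry.resolve("vendorA-omc", "LAC12/CI345"))  # CELL-0001
print(registry.lookup("inventory", "C0001")["bsc"])    # BSC-07
```

Keeping aliases and repository records separate mirrors the text: events from any source are first normalised to a UNRI, and only the UNRI is used to query the repository.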

3 SICAEN METHOD

SICAEN is an acronym formed from the initial letters of the five Spanish-named stages that constitute the method: SI for “Selección de Indisponibilidades” (Unavailability Selection), C for “Consolidación” (Consolidation), A for “Agregación” (Aggregation), E for “Estabilización” (Stabilization) and N for “Notificación” (Notification).

In the following sections, the different stages of the method are presented along with the concepts that emerge from them.

3.1 First Stage: Unavailability Selection

The first stage (see Figure 1), named “Unavailability Selection (SI)”, collects unavailability data from network elements and network element managers and selects any alarm, state change event or operator action record representing either the total unavailability of a network resource or the end of that unavailability condition. As an example, in the context of the IMPACTA project, aimed at identifying the impact of severe network faults on basic mobile services, the method selects and catalogues network resource unavailability conditions and restorations. Although the list is not exhaustive, some of the most relevant conditions are:

• GSM cell service unavailable.
• GSM cell service available.
• GSM cell manually locked.
• GSM cell manually unlocked.
• BSC function unavailable.
• BSC function available.
• MSC function unavailable.
• MSC function available.
• UMTS cell service unavailable.
• UMTS cell service available.
• UMTS cell manually locked.
• UMTS cell manually unlocked.
• HLR function unavailable.
• HLR function available.
• SGSN function unavailable.
• SGSN function available.

Each of these un/availability conditions can be generated by a varying range of alarms, events or manual actions extracted from log records, and this range differs from one network technology to another. The first stage of the SICAEN method (SI) transforms (see Figure 1) alarms, state change events and records into catalogued un/availability conditions and forwards them to the next stage as a new kind of notification called Service Potential Incidents (SPI). These notifications can be firing notifications (SPI-F) or clearing notifications (SPI-C), and they convey information about the un/availability condition, such as its type, its starting and/or ending time and the identity of the affected network resource, expressed using the unified network resource identifier (UNRI) described in Section 2.

This transformation process may be a simple mapping of the received events, that is, an event acquisition from some kind of event source followed by a network resource identity resolution. This kind of schema can be implemented quite straightforwardly. However, an SPI can also be defined as a complex combination of simple events, in which case a more sophisticated approach may be needed to produce it, using techniques such as the correlation engines described in Forgy (1979, 1982).
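The simple mapping schema described above can be sketched in Python as a catalogue lookup followed by identity resolution. The raw event names in `CATALOGUE`, the event dictionary fields and the `resolve_unri` callable are hypothetical; a real selector would be technology-specific, and composite conditions would call for a correlation engine such as Rete instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class SPI:
    """Service Potential Incident produced by the Unavailability Selection stage."""
    firing: bool                      # True = SPI-F, False = SPI-C
    incident_type: str                # catalogued un/availability condition
    unri: str                         # affected network resource (unified id)
    start: Optional[datetime] = None  # set on firing notifications
    end: Optional[datetime] = None    # set on clearing notifications


# Hypothetical catalogue: raw event name -> (condition, is_firing).
# Assumption: a restoration is modelled as the clearing of the same condition.
CATALOGUE = {
    "CELL_OUT_OF_SERVICE": ("GSM cell service unavailable", True),
    "CELL_IN_SERVICE": ("GSM cell service unavailable", False),
    "OPERATOR_LOCK_CELL": ("GSM cell manually locked", True),
    "OPERATOR_UNLOCK_CELL": ("GSM cell manually locked", False),
}


def select_unavailability(event: dict,
                          resolve_unri: Callable[[str, str], Optional[str]]) -> Optional[SPI]:
    """Map one raw alarm, state change or log record to an SPI, or return None."""
    entry = CATALOGUE.get(event["name"])
    if entry is None:
        return None                   # not an un/availability condition
    unri = resolve_unri(event["source"], event["resource"])
    if unri is None:
        return None                   # resource identity could not be resolved
    incident_type, firing = entry
    when = event["time"]
    return SPI(firing, incident_type, unri,
               start=when if firing else None,
               end=None if firing else when)


t = datetime(2008, 4, 4, 17, 45, 3)
spi = select_unavailability(
    {"name": "CELL_OUT_OF_SERVICE", "source": "omc-1",
     "resource": "LAC12/CI345", "time": t},
    resolve_unri=lambda source, native_id: "CELL-0001")
print(spi.firing, spi.unri)  # True CELL-0001
```

Several such selectors, each reading a different source, could feed the next stage in parallel, which is exactly the redundancy the Consolidation stage has to absorb.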

Figure 1: SICAEN First Stage: Unavailability Selection. [Diagram: alarms, state change events and operator action records enter the First Stage, Unavailability Selection (SI), which emits Service Potential Incidents (SPI) {Firing (SPI-F)/Clearing (SPI-C), Incident Type, Starting Date-Time, Ending Date-Time, Affected Network Resource Id, …} to the Second Stage, Consolidation (C).]

It must be noted that multiple “Unavailability Selectors”, each of them using different sources, can be employed simultaneously to feed the next stage. This may lead to duplicated SPIs for a given network resource, but it may provide a simpler way to integrate different sources or even some resilience to failures in the fault reporting functions. This event redundancy must be dealt with by the second stage, “Consolidation”.

3.2 Second Stage: Consolidation

In the second stage (see Figure 2), named “Consolidation (C)”, the system checks the recorded state of the network resource referenced by the incoming SPI against the state reported by that SPI. If there is no change in the state of the network resource, the SPI may be discarded, since it will not affect the state of the resource. This discard process works as follows:

• If an SPI-C is received for a non-faulty network resource, the SPI-C is discarded and regarded as an internal error (specifically, there is no active SPI for that network resource).
• If an SPI-F is received and is not already attached to the network resource, it is attached and the “life-count” of the network resource is increased. Since the previous stage may include multiple “Unavailability Selectors”, the same unavailability may be detected through multiple paths, each producing its own SPI.

An SPI-C may also go untreated, even when the referenced network resource is in a faulty state, if the life-count of the network resource is greater than one. This means that the SPI-C is merely removing one detection path, while other SPIs are still active for the network resource.

This “life-count” mechanism allows multiple “Unavailability Selectors” to be coordinated in a very simple and straightforward way, although it should be noted that some kind of synchronisation protocol must be implemented between this second stage (Consolidation) and all the instances of the previous stage (Unavailability Selection).
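The discard rules and the life-count mechanism can be sketched as follows. This minimal Python illustration rests on assumptions the text does not spell out: each SPI carries a `path` label identifying its Unavailability Selector, an SPI-C removes its own path (decrementing the life-count), and an SPI-C for an unknown path on a faulty resource is simply ignored. `Consolidator` and its method names are hypothetical.

```python
class Consolidator:
    """Hypothetical Consolidation-stage bookkeeping based on per-resource life-counts."""

    def __init__(self):
        self.paths = {}    # UNRI -> set of attached detection paths (one per selector)
        self.errors = []   # SPI-Cs received for non-faulty resources (internal errors)

    def life_count(self, unri):
        """Number of SPI-Fs currently attached to the resource."""
        return len(self.paths.get(unri, ()))

    def consolidate(self, unri, path, firing):
        """Apply one SPI; return "faulty"/"restored" on a state change, None if discarded."""
        attached = self.paths.setdefault(unri, set())
        if firing:
            if path in attached:
                return None                # this SPI-F is already attached: discard
            attached.add(path)             # life-count increases
            return "faulty" if len(attached) == 1 else None
        if not attached:                   # SPI-C for a non-faulty resource
            self.errors.append((unri, path))
            return None
        if path not in attached:
            return None                    # assumption: unknown path is ignored
        attached.remove(path)              # assumption: life-count decreases
        return "restored" if not attached else None


c = Consolidator()
c.consolidate("CELL-0001", "selector-A", True)   # "faulty": state change
c.consolidate("CELL-0001", "selector-B", True)   # None: second path, life-count 2
c.consolidate("CELL-0001", "selector-A", False)  # None: only one path removed
c.consolidate("CELL-0001", "selector-B", False)  # "restored": last path cleared
```

Only the calls that return a state change would go on to the Network and Service Inventory search; every `None` corresponds to a discarded SPI.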

Once the received SPI has been deemed relevant and must be treated, the method updates the network resource state, recording the date and time of the state change. Then, it searches the Network Inventory and the Service Inventory to identify the basic services affected by the network resource referenced by the SPI. As soon as these services have been identified, this stage forwards them to the next stage (Aggregation), generating as many Service-Specific Potential Incidents (S-SPI) as basic services have been found. In addition to the information conveyed by the SPI, each S-SPI generated by the second stage holds the identity of the affected service.

Figure 2: SICAEN Second Stage: Consolidation. [Diagram: incoming Service Potential Incidents (SPI) pass through Network-Resource State Checking, SPI Discarding and Service Searching, supported by the Network Inventory and the Service Inventory; the stage emits Service-Specific Potential Incidents (S-SPI) {Firing (S-SPI-F)/Clearing (S-SPI-C), Incident Type, Affected Service Identity, Starting Date-Time, Ending Date-Time, Affected Network Resource Id, …} to the Third Stage, Aggregation (A).]

Therefore, this second stage performs a first unification of the network data, with a more complete approach than the pre-processing proposed by Li and Li (2011). It also provides more flexibility when the method is deployed, since several “Unavailability Selection (SI)” stages may be necessary to simplify their internal architecture or to provide redundancy.
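The service search and fan-out can be sketched in Python as follows, assuming the Network Inventory and Service Inventory can be reduced to a single mapping from UNRI to the basic services a resource supports. The `SSPI` fields follow the S-SPI contents listed for Figure 2, while the inventory contents and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SSPI:
    """Service-Specific Potential Incident: an SPI plus the affected service identity."""
    firing: bool
    incident_type: str
    service_id: str
    unri: str
    time: str


# Hypothetical joined view of the Network and Service Inventories:
# resource (UNRI) -> basic services it supports.
SERVICES_BY_RESOURCE: Dict[str, List[str]] = {
    "CELL-0001": ["GSM voice", "GPRS data"],
    "UCELL-0042": ["UMTS voice", "UMTS data"],
}


def fan_out(firing: bool, incident_type: str, unri: str, time: str,
            inventory: Dict[str, List[str]] = SERVICES_BY_RESOURCE) -> List[SSPI]:
    """Generate one S-SPI per basic service affected by the resource."""
    return [SSPI(firing, incident_type, service, unri, time)
            for service in inventory.get(unri, [])]


for s_spi in fan_out(True, "GSM cell service unavailable", "CELL-0001", "17:45:03"):
    print(s_spi.service_id)   # GSM voice, then GPRS data
```

Because every S-SPI carries its service identity, the next stage can route it to a separate per-service analysis without consulting the inventories again.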

3.3 Third Stage: Aggregation

The third stage, named “Aggregation (A)”, is the key stage of the SICAEN method. This stage determines whether each S-SPI generated by the second stage should indeed be treated as part of a Service Incident (SI). This analysis is performed on a per-service basis (see Figure 3).

3.3.1 Definitions

SIs are the final output generated by the method and, as stated before, they represent actual service unavailability situations affecting a geographical area and/or a defined group of customers, where each SI concerns a specific basic telecommunication service. However, SIs are not static entities: they feature a complete life-cycle, with a starting point, an ending point and an indefinite number of changes (mutations) in between. The authors have chosen the term “mutation” because the changes can be very slight or transient or can, on some occasions, lead to evolutionary leaps in the SI, mimicking mutations in living organisms. These mutations are filtered in the Stabilization stage in order to provide an ordered view of the SI evolution.

A Service Incident Mutation (SIM) may concern the incident type (for example, an incident that was originally perceived and catalogued as a “GSM cell unavailability” may later be perceived and catalogued as a “BSC function unavailability”), or it may concern the incident scope, i.e. the collection of network resources affected by the incident (throughout its life-cycle, the incident may involve different collections of network resources).

SICAEN identifies each outgoing SI by a unique numerical identifier, named “Global Fault Identifier” (GFID), and each of its constituent SIMs by the concatenation of the GFID and a second, mutation-specific numerical identifier named “Specific Fault Identifier” (SFID). Like SPIs and S-SPIs, SIMs may be firing notifications (SIM-F) or clearing notifications (SIM-C). Firing notifications denote that a new SI state has been reached, while clearing notifications indicate that the SI is no longer in the state notified by the mutation.

Figure 3: SICAEN Third Stage: Aggregation. [Diagram: Service-Specific Potential Incidents (S-SPI) are routed by Affected Service ID (“A”, “B”, …, “S”) to a separate Aggregation analysis per service; each analysis emits Service Incident Mutations (SIM) {GFID-SFID, Firing/Clearing, Incident Type, Affected Service ID, Starting Date-Time, Ending Date-Time, Affected Network Resource List, …} to the Fourth Stage, Stabilization (E).]

Each SIM captures the relevant aspects of the change undergone by the SI it belongs to. These aspects include the type of the incident (which may differ from one mutation to the next), the collection of network resources affected by the incident just after the mutation took place, and the date and time of the mutation. The affected service identity does not change throughout the service incident life-cycle.

The SI life-cycle is composed of the complete sequence of SIMs from its rise to its demise. As an example, Table 1 shows some of the thirty-six mutations generated by the SICAEN method throughout the life-cycle of an SI with GFID=214 affecting service S7, where R1, R2, R3, …, Rn are network resources. As can be seen in Table 1, the SIM with SFID = 0 is a special mutation: it flags the start and the end of the whole SI.
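The GFID/SFID numbering and the firing/clearing pairs visible in Table 1 can be reproduced with a small Python sketch. It assumes that each scope change clears the current SFID and fires the next one, and that an empty scope closes the SI with an SFID 0 clearing, as in the last row of Table 1. `ServiceIncident` and the sample times are hypothetical, and changes of incident type are omitted for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SIM:
    """Service Incident Mutation, identified by the pair GFID-SFID."""
    gfid: int
    sfid: int
    firing: bool                  # True = SIM-F, False = SIM-C
    scope: FrozenSet[str]         # affected network resources after the mutation
    fired_at: str
    cleared_at: Optional[str] = None


class ServiceIncident:
    """Hypothetical SI that emits SIMs following the pattern of Table 1."""

    def __init__(self, gfid: int, service_id: str, scope, now: str):
        self.gfid, self.service_id = gfid, service_id
        self.sfid, self.scope, self.fired_at = 1, frozenset(scope), now
        self.closed = False
        # SFID 0 flags the start of the whole SI; SFID 1 is the first real state.
        self.log: List[SIM] = [SIM(gfid, 0, True, self.scope, now),
                               SIM(gfid, 1, True, self.scope, now)]

    def mutate(self, new_scope, now: str) -> None:
        """Clear the current state and fire a new one; an empty scope closes the SI."""
        if self.closed:
            raise RuntimeError("SI already closed")
        self.log.append(SIM(self.gfid, self.sfid, False, self.scope, self.fired_at, now))
        if not new_scope:
            # Mirrors the last row of Table 1: SFID 0 with an empty scope ends the SI.
            self.log.append(SIM(self.gfid, 0, False, frozenset(), now, now))
            self.closed = True
            return
        self.sfid += 1
        self.scope, self.fired_at = frozenset(new_scope), now
        self.log.append(SIM(self.gfid, self.sfid, True, self.scope, now))


si = ServiceIncident(214, "S7", {"R7", "R19", "R24"}, "17:45:03")
si.mutate({"R7", "R19", "R24", "R51"}, "17:46:07")
si.mutate(set(), "17:47:00")
print([(m.sfid, "F" if m.firing else "C") for m in si.log])
# [(0, 'F'), (1, 'F'), (1, 'C'), (2, 'F'), (2, 'C'), (0, 'C')]
```

The service identity is fixed at creation, matching the statement that it does not change throughout the SI life-cycle.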

3.3.2 Functional Behaviour

To compute the net effect of each incoming S-SPI on an ongoing SI, the SICAEN Aggregation stage uses correlation techniques based on network topology, i.e. on functional dependencies between network resources, or, failing these (when network topology information is insufficient, inaccurate or non-existent), on spatial-temporal proximity criteria. Liu et al. (2008) present a similar proposal.

An incoming firing Service-Specific Potential Incident (S-SPI-F) may produce different results after being processed by the Aggregation stage:

• A new SI is generated.
• A new SIM for an ongoing SI is created.
• Two or more SIs are aggregated into a single SI encompassing all of the network resources from the aggregated SIs. In this case, the resultant SI retains the identity (GFID) of the oldest SI being aggregated, and the remaining aggregated SIs are closed.
• No effect on ongoing SIs.

Similarly, an incoming clearing Service-Specific Potential Incident (S-SPI-C) may also produce different results:

• An ongoing SI is closed.
• A new SIM for an ongoing SI is created.
• A new SI is created after a SIM has been emitted for an ongoing SI.
• No effect on ongoing SIs.

Table 1: Service Incident life-cycle example.

| Time Stamp | Mutation Type | Service Incident Mutation |
| | Firing | [GFID=214; SFID=0; {R7, R19, R24}; Firing: 7:45:03 04/04/2008; Clearing: -] |
| | Firing | [GFID=214; SFID=1; {R7, R19, R24}; Firing: 7:45:03 04/04/2008; Clearing: -] |
| | Clearing | [GFID=214; SFID=1; {R7, R19, R24}; Firing: 17:45:03 04/04/2008; Clearing: 17:46:07 04/04/08] |
| | Firing | [GFID=214; SFID=2; {R7, R19, R24, R51}; Firing: 17:46:07 04/04/08; Clearing: -] |
| | Clearing | [GFID=214; SFID=2; {R7, R19, R24, R51}; Firing: 17:46:07 04/04/08; Clearing: 17:46:46 04/04/08] |
| | Firing | [GFID=214; SFID=3; {R7, R19, R24, R51, R55}; Firing: 17:46:46 04/04/08; Clearing: -] |
| … | … | … |
| T34 | Clearing | [GFID=214; SFID=34; {R19, R51, R63}; Firing: 18:06:41 04/04/08; Clearing: 18:07:39 04/04/08] |
| T34 | Firing | [GFID=214; SFID=35; {R19, R51}; Firing: 18:07:39 04/04/08; Clearing: -] |
| T35 | Clearing | [GFID=214; SFID=35; {R19, R51}; Firing: 18:07:39 04/04/08; Clearing: 18:08:22 04/04/08] |
| T35 | Firing | [GFID=214; SFID=36; {R19}; Firing: 18:08:22 04/04/08; Clearing: -] |
| T36 | Clearing | [GFID=214; SFID=36; {R19}; Firing: 18:08:22 04/04/08; Clearing: 18:10:11 04/04/08] |
| T36 | Clearing | [GFID=214; SFID=0; {}; Firing: 18:10:11 04/04/08; Clearing: 18:10:11 04/04/08] |

The process that determines the effect of each incoming Service-Specific Potential Incident (S-SPI) on ongoing SIs is named “Aggregation Analysis” and, as noted before, it is performed on a per-service basis. Therefore, both the incoming S-SPI and the ongoing SIs considered in this analysis must reference the same basic service.

3.3.3 Details of the Aggregation Analysis

The Aggregation Analysis is based on a correlation process that uses inference techniques as its basic tool and network topology (the functional dependencies between network resources for the service concerned) as its main source of knowledge. In short, the behaviour of this analysis for an incoming firing Service-Specific Potential Incident (S-SPI-F) is as follows:

1. If the network resource referenced by the incoming S-SPI-F is functionally dependent on some of the network resources encompassed by one of the ongoing SIs, then that resource, and any other resource that is functionally dependent on it, is aggregated into the SI concerned.

2. If any of the network resources included in the ongoing SIs are related to the network resource referenced by the incoming S-SPI-F, then these incidents are changed to accommodate the network resource included in the S-SPI-F. Note that functional dependency is one type of relation, but not the only one: spatial-temporal relations can also be defined.

   Should there be more than one such SI, they are unified. In this unification process the oldest SI prevails, while the rest are closed if all of their resources have been subsumed into the prevailing SI. Not all relations need to be functional dependencies, and although the relations are symmetric, they may not be transitive.

3. If the network resource referenced by the incoming S-SPI-F is not functionally dependent on any of the network resources encompassed by the ongoing SIs, then a new SI is created, encompassing that resource and any other resource that is functionally dependent on it.

There is an analogous correlation process for S-SPI-C.

Figure 4: SICAEN Fourth Stage: Stabilization.

In all cases, the type of the resultant SI is re-evaluated. Every change to an ongoing SI, as well as the rise of a new SI, is notified to the next stage (Stabilization) by means of the corresponding SIMs.
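The three rules for an S-SPI-F can be sketched in Python for a single service as shown below. This is an illustration, not the IMPACTA implementation. The topology is a hypothetical dependency map, and `related` is taken to be a direct dependency in either direction (symmetric but not transitive, as noted in rule 2). An SI is also matched when it already contains the failing resource or one of its dependents, and a unified SI subsumes every matching SI entirely, so all the others are closed. Type re-evaluation and SIM emission are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical topology for one service: resource -> resources it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "CELL-1": {"BSC-1"}, "CELL-2": {"BSC-1"}, "CELL-3": {"BSC-2"},
    "BSC-1": {"MSC-1"}, "BSC-2": {"MSC-1"},
}


def dependents(resource: str, topology=DEPENDS_ON) -> Set[str]:
    """All resources that depend, directly or transitively, on `resource`."""
    found, frontier = set(), [resource]
    while frontier:
        current = frontier.pop()
        for child, parents in topology.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def related(a: str, b: str, topology=DEPENDS_ON) -> bool:
    """Direct functional dependency in either direction (symmetric, not transitive)."""
    return b in topology.get(a, set()) or a in topology.get(b, set())


@dataclass
class Incident:
    gfid: int
    opened_at: float
    scope: Set[str] = field(default_factory=set)
    is_open: bool = True


def aggregate_firing(resource: str, now: float, incidents: List[Incident],
                     new_gfid: int, topology=DEPENDS_ON) -> Incident:
    """Apply one S-SPI-F to the ongoing SIs of a single service."""
    affected = {resource} | dependents(resource, topology)
    matches = [si for si in incidents if si.is_open and
               (si.scope & affected or any(related(resource, r, topology) for r in si.scope))]
    if not matches:                       # rule 3: no relation, open a new SI
        si = Incident(new_gfid, now, set(affected))
        incidents.append(si)
        return si
    survivor = min(matches, key=lambda s: s.opened_at)  # oldest SI prevails
    survivor.scope |= affected            # rules 1-2: accommodate the resource
    for other in matches:
        if other is not survivor:         # unification: subsume and close the rest
            survivor.scope |= other.scope
            other.scope, other.is_open = set(), False
    return survivor


ongoing: List[Incident] = []
aggregate_firing("CELL-1", 1.0, ongoing, new_gfid=1)   # new SI 1: {CELL-1}
aggregate_firing("CELL-3", 2.0, ongoing, new_gfid=2)   # unrelated: new SI 2: {CELL-3}
merged = aggregate_firing("MSC-1", 3.0, ongoing, new_gfid=3)
print(merged.gfid, sorted(merged.scope))
# 1 ['BSC-1', 'BSC-2', 'CELL-1', 'CELL-2', 'CELL-3', 'MSC-1']
```

In the usage, the MSC failure relates to both open incidents through its dependents, so they are unified under the older GFID and the other SI is closed.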

As stated above, network topology information for the service concerned may be insufficiently detailed, inaccurate or even non-existent. In any of these cases, the “Aggregation Analysis” uses the concept of “spatial-temporal proximity” as the basic mechanism of its correlation process.

The spatial proximity relationship between network resources is defined as a function of both the type and the location of the network resources. As an example, in the context of the IMPACTA project, the method uses as its spatial proximity criterion the distance between cells, defined as a function of cell location (rural, urban, suburban, etc.) and cell type (macro-cell, micro-cell, pico-cell, etc.).

The temporal proximity relationship between events is simply defined as a function of the time elapsed between them.
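When topology is unavailable, the spatial-temporal criteria could be sketched in Python as below. The thresholds per (location, cell type), the default threshold and the five-minute window are purely illustrative assumptions, not values used in IMPACTA, and the haversine great-circle distance is an assumed choice of distance measure.

```python
import math

# Illustrative thresholds in km per (location, cell type); not IMPACTA's values.
SPATIAL_THRESHOLD_KM = {
    ("urban", "micro"): 0.5,
    ("urban", "macro"): 2.0,
    ("suburban", "macro"): 5.0,
    ("rural", "macro"): 15.0,
}
DEFAULT_THRESHOLD_KM = 1.0
TEMPORAL_WINDOW_S = 300.0  # illustrative: events within 5 minutes are "close"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def spatially_close(cell_a, cell_b):
    """Close if within the larger threshold of the two cells (keeps the relation symmetric)."""
    threshold = max(SPATIAL_THRESHOLD_KM.get((c["location"], c["type"]), DEFAULT_THRESHOLD_KM)
                    for c in (cell_a, cell_b))
    return haversine_km(cell_a["lat"], cell_a["lon"], cell_b["lat"], cell_b["lon"]) <= threshold


def temporally_close(t_a, t_b, window=TEMPORAL_WINDOW_S):
    """Close if the elapsed time between the two events (seconds) is within the window."""
    return abs(t_a - t_b) <= window


def proximate(cell_a, t_a, cell_b, t_b):
    """Spatial-temporal proximity used when topology information is missing."""
    return spatially_close(cell_a, cell_b) and temporally_close(t_a, t_b)
```

Taking the larger of the two thresholds keeps the relation symmetric; like the relations discussed in Section 3.3.3, it is not transitive. Such a predicate could stand in for `related` in a topology-based aggregation.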

3.4 Fourth Stage: Stabilization

The fourth stage, named “Stabilization (E)”, is aimed at reducing the amount of information the method delivers, in line with the proposal of Sterritt and Bustard (2002). This stage forwards to the next one only those SIMs that convey relevant information.

A severe network incident can cause hundreds, or even thousands, of alarms and events to be raised and cleared throughout its life-cycle, and each of them may cause several SIMs to be issued by the Aggregation stage. Most of those mutations report only very small changes in the collection of network resources affected by the incident concerned and do not alter the type of the incident in any way.

In order to select the incident mutations that must be forwarded to the following stage, the Stabilization stage makes use of a sliding, time-limited observation window on a per-incident basis (see Figure 4). For each incident, this technique allows only relevant mutations to be relayed to the final stage. Since each mutation represents a particular state of the Service Incident, the mutation located at the end of the window summarizes the previous ones.
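The sliding, time-limited window can be sketched per incident in Python as follows. The parameter values are assumptions. So is the interpretation that every new mutation pushes the deadline forward by `quiet` seconds, up to at most `max_hold` seconds after the window opened. `Stabilizer` is a hypothetical name, and a real stage would call `flush` from a timer.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class _Window:
    opened_at: float
    deadline: float
    latest: Any            # most recent mutation; it summarizes the earlier ones


class Stabilizer:
    """Per-incident sliding, time-limited window (hypothetical parameters, seconds)."""

    def __init__(self, quiet: float = 30.0, max_hold: float = 120.0):
        self.quiet = quiet          # each new mutation slides the deadline by this much
        self.max_hold = max_hold    # but a window never stays open longer than this
        self.windows: Dict[int, _Window] = {}

    def offer(self, gfid: int, mutation: Any, now: float) -> None:
        """Record a mutation for incident `gfid`, superseding earlier ones in its window."""
        w = self.windows.get(gfid)
        if w is None:
            self.windows[gfid] = _Window(now, now + self.quiet, mutation)
        else:
            w.latest = mutation
            w.deadline = min(now + self.quiet, w.opened_at + self.max_hold)

    def flush(self, now: float) -> List[Tuple[int, Any]]:
        """Relay the last mutation of every expired window to the Notification stage."""
        expired = [g for g, w in self.windows.items() if w.deadline <= now]
        return [(g, self.windows.pop(g).latest) for g in expired]


stab = Stabilizer(quiet=30, max_hold=120)
for t, sfid in [(0, 1), (10, 2), (20, 3)]:
    stab.offer(214, f"GFID=214; SFID={sfid}", t)
print(stab.flush(40))  # []
print(stab.flush(50))  # [(214, 'GFID=214; SFID=3')]
```

The `max_hold` cap prevents a long, continuously changing incident from being withheld indefinitely, while `quiet` collapses bursts of small scope changes into one forwarded mutation.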

3.5 Fifth Stage: Notification

The fifth and last stage, named “Notification (N)”, is responsible for issuing SIMs to the clients wishing to receive them. These clients can be part of any assurance process that needs to know the impact of severe network faults on basic telecommunication services.

This stage shows the typical behaviour of a standard notification service, such as the OMG Notification Service (OMG, 2004) or OASIS Web Services Notification (Graham et al., 2006), although any other type of notification schema may be used.
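The subscribe/filter/deliver behaviour of the Notification stage can be illustrated with the in-process Python sketch below. It does not implement the OMG or OASIS interfaces, which define their own subscription and delivery mechanisms. `SimNotifier`, the per-service filter and the callback signature are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Callback = Callable[[str, Any], None]


class SimNotifier:
    """Hypothetical in-process publish/subscribe relay for SIMs."""

    def __init__(self):
        self._subs: Dict[int, Tuple[Callback, Optional[str]]] = {}
        self._next_id = 0

    def subscribe(self, callback: Callback, service_id: Optional[str] = None) -> int:
        """Register a client; service_id=None means "all services"."""
        self._next_id += 1
        self._subs[self._next_id] = (callback, service_id)
        return self._next_id

    def unsubscribe(self, sub_id: int) -> None:
        """Remove a client; unknown ids are ignored."""
        self._subs.pop(sub_id, None)

    def publish(self, service_id: str, sim: Any) -> int:
        """Deliver a SIM to every matching subscriber; return the number of deliveries."""
        delivered = 0
        for callback, wanted in list(self._subs.values()):
            if wanted is None or wanted == service_id:
                callback(service_id, sim)
                delivered += 1
        return delivered


notifier = SimNotifier()
noc_feed, voice_feed = [], []
notifier.subscribe(lambda svc, sim: noc_feed.append(sim))
notifier.subscribe(lambda svc, sim: voice_feed.append(sim), service_id="GSM voice")
notifier.publish("GSM voice", "GFID=214; SFID=3")   # delivered to both clients
notifier.publish("GPRS data", "GFID=215; SFID=1")   # delivered to the first client only
```

A per-service filter suits SICAEN's output because every SIM carries exactly one affected service identity.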

4 IMPLEMENTATION RESULTS

The SICAEN method has been implemented on top of the proprietary OSS used by Telefónica (the biggest Spanish mobile operator) to manage its mobile network, which consists of several thousand cells. It also takes into account the full network topology: BSC/RNC, MSC, SGSN, GGSN, HLR, etc. For the last few years it has provided, and still provides, impact analysis of the two basic telecommunication services, voice and data, over GSM and UMTS networks. It manages over a hundred thousand different event origins, comprising cells, BSC/RNCs, MSCs, etc. On a daily basis, it generates over thirty thousand Service Incident Mutations, which, after the Stabilization stage, result in about five thousand unique Specific Fault Identifiers grouped into four thousand Global Fault Identifiers for the four basic telecommunication services managed. This information is used to prioritise network faults and evaluate their actual impact.

SICAEN is implemented in C++ and Java as a multiagent system that uses a Rete correlation engine in the Unavailability Selection stage and OASIS Web Services Notification in the Notification stage.

The method is integrated into a wider multiagent architecture (IMPACTA), in which specialised agents complement SICAEN to provide real-time maps of service unavailability zones to the Network Operator Centre and the customer care personnel. These maps take into account the fact that, in a mobile network, there is a certain degree of overlap between cells, so the scope of the failure reported by SICAEN is reduced using this overlap information.

5 CONCLUSIONS AND FUTURE WORK

A complete description of the SICAEN method has been presented in the previous sections. Stages SI and C provide a normalised, annotated input to the Aggregation stage. These first two stages make heavy use of the facilities provided by the underlying OSS and corporate repositories to obtain normalised Service-Specific Potential Incidents, taking into account not only network faults but also operator actions. The latter are not usually analysed by other methods, since their root cause is already known, but they nevertheless have a measurable service impact on the user.

The Aggregation stage computes, using topological and functional relationships among the network elements, the net effect of each Service-Specific Potential Incident (S-SPI). The last two stages, E and N, provide the means to reduce the volume of information generated by the Aggregation stage and to relay that information to external clients.

The SICAEN method has proven to be a powerful tool for determining the impact of severe network faults on basic mobile services in the context of Telefónica’s IMPACTA project, which is used to provide automated textual and graphical reports on severe network failures, as required by Spanish law (Real Decreto 424/2005). Nevertheless, some enhancements have been identified during its daily use.

The first enhancement concerns the ability to disaggregate a long-lived Service Incident into several separate Service Incidents when it is suspected of having been wrongly aggregated. In this situation, the correlation criteria should be re-evaluated for every Service-Specific Potential Incident (S-SPI) concerned.

Another enhancement concerns the use of partial network resource unavailability information as input. A network resource may be only partially affected by a fault and, as a result, may only be able to preserve part of the services it supports under regular operating conditions. Additionally, the effect of the fault on each individual service may vary significantly. To handle this kind of partial unavailability information, the use of fuzzy logic and fuzzy reasoning techniques (Zadeh, 1988; Baldwin, 1981) will be explored.

REFERENCES

3GPP. Telecommunication management; Fault Management; Part 2: Alarm Integration Reference Point (IRP): Information Service (IS), TS 32.111-2. Retrieved from http://www.3gpp.org/ftp/Specs/html-info/32111-2.htm. Last visited 28-07-2011.

Åberg, S. 2002. OSS Quality of Service API (JSR-000090), OSS through Java Initiative (OSS/J), November 2002. Retrieved from http://jcp.org/en/jsr/detail?id=090.

Baldwin, J. F. 1981. Fuzzy Logic and Fuzzy Reasoning, in E. H. Mamdani and B. R. Gaines (eds.), Fuzzy Reasoning and its Applications, N. Y., Academic Press, pp. 133-148.

Forgy, C. L. 1979. On the Efficient Implementation of Production Systems. PhD thesis, Computer Science Department, Carnegie Mellon University.

Forgy, C. L. 1982. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem, Artificial Intelligence, Vol. 19, No. 1, pp. 17-37.

Graham, S., Hull, D. and Murray, B. (eds.). 2006. Web Services Base Notification (WSN), OASIS Standard, October. Retrieved from http://www.oasis-open.org/specs/index.php#wsnv1.3. Last visited 01-07-2011.

ISO/IEC International Standard 10164-4. 1992. Also published as X.733: Information Technology - Open Systems Interconnection - Systems Management: Alarm Reporting Function. International Telecommunication Union, Geneva. Retrieved from http://www.itu.int/rec/T-REC-X.733/en.

Jakobson, G. and Weissman, M. D. 1993. Alarm Correlation, IEEE Network, pp. 52-59, November.

Li, T. Y. and Li, X. M. 2011. Preprocessing expert system for mining association rules in telecommunication networks, Expert Systems with Applications, 38, pp. 1709-1715.

Liu, Y., Zhang, J., Meng, X. and Strassner, J. 2008. Sequential Proximity-Based Clustering for Telecommunication Network Alarm Correlation, Lecture Notes in Computer Science, Vol. 5264, pp. 30-39.

OMG. 2004. OMG Notification Service, Version 1.1. Retrieved from http://www.omg.org/technology/documents/formal/notification_service.htm. Last visited 05-11-2009.

Raymer, D. 2004. Statement of alignment between OSS through Java™ Initiative and 3GPP. Motorola.

Raymer, D. and Flauw, M. 2007. OSS Fault Management API (JSR-000263), OSS through Java Initiative (OSS/J), September. Retrieved from http://jcp.org/en/jsr/detail?id=263. Last visited 05-11-2009.

Real Decreto 424/2005 “Reglamento sobre las condiciones para la prestación de servicios de comunicaciones electrónicas, el servicio universal y la protección de los usuarios” (Regulation on the conditions for the provision of electronic communication services, universal service and user protection), Boletín Oficial del Estado, No. 102, pp. 14545-14588, 15 April 2005 (in Spanish).


Sterritt, R. and Bustard, D. W. 2002. Fusing Hard and Soft Computing for Fault Management in Telecommunications Systems, IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, Vol. 32, No. 2, pp. 92-98.

TMFORUM-eTOM. GB921-Concepts and Principles - Main eTOM document. Retrieved from http://www.tmforum.org/BusinessProcessFramework/1647/home.html (membership needed). Last visited 05-11-2009.

TMFORUM-MTOSI. Multi-Technology Operation System Interface (MTOSI) Release 1.0. Retrieved from http://www.tmforum.org/browse.aspx?catID=2320&linkID=30792 (membership needed). Last visited 05-11-2009.

TMFORUM-NGOSS. NGOSS Service Assurance Charter. Retrieved from http://www.tmforum.org/NGOSSServiceAssurance/4826/home.html (membership needed). Last visited 05-11-2009.

TMFORUM-TAM. 2006. Telecom Applications Map, The BSS/OSS Systems Landscape. Release 2.0. Retrieved from http://www.tmforum.org/ (membership needed). Last visited 05-11-2009.

Zadeh, L. A. 1988. Fuzzy Logic, IEEE Computer, Vol. 21, Issue 4, pp. 83-93.

APPENDIX

Patent pending. Reference number: P200803548
