POLICY-BASED SERVICE LEVEL AGREEMENT MANAGEMENT SYSTEM

Noh-sam Park, Shin-kyung Lee, Gil-haeng Lee

Electronics and Telecommunications Research Institute

Keywords: Service Level Agreement, SLA, SLM, Policy

Abstract: A Service Level Agreement (SLA) is a negotiated agreement between a customer and the service provider on levels of service characteristics and the associated set of metrics. In this paper, we propose a policy-based SLA management system. We present an approach that reacts not only when an SLA is violated, but also before an imminent SLA violation. We provide a common generic framework whose components interwork via XML. The managed SLA metrics are classified into service opening metrics, trouble metrics, and performance metrics. We propose an architecture that provides the end-user with service level management (SLM) from service subscription to service termination. Finally, we give an example that illustrates a typical scenario for assuring customers’ SLAs in an ADSL network service.

1 INTRODUCTION

A Service Level Agreement (SLA) is part of the

contract between the service provider and its

consumers. It describes the provider’s commitments

and specifies penalties if those commitments are not

met.

Service Level Agreements (SLAs) are fundamental to business continuity. The bottom line

is that they define your minimum levels of

availability from key suppliers, and often determine

what actions will be taken in the event of serious

disruption. As a consequence, they require full

consideration and attention and must be constructed

extremely carefully. This is not an area in which to

cut corners.

The SLA will be wide in scope, covering all key aspects of the service. Typically, it will fully

embrace such issues as problem management,

compensation (often essential in terms of

motivation), warranties and remedies, resolution of

disputes and legal compliance. It essentially frames

the relationship, and determines the major

responsibilities, both in times of normal operation

and during an emergency situation. The difficulty, as ever, is usually where to start. Is it possible to start with a blank piece of paper? This is not usually a good idea, not only because of the amount of effort involved, but also because of the greater risk of missing, or perhaps not properly documenting, a major issue.

It is now widely accepted that service provision and receipt should be governed by an agreement.

This is essential to define the parameters of the

service, for the benefit of both the provider and the

recipient. It must obviously cover many other issues,

as well as defining the service itself.

An SLA is not concerned with how the service is configured or with the topology of the underlying network (Bao Hua Liu et al., 2003); it concerns only the end-to-end delivery of the service. There has been substantial progress in the management of distributed applications. Because distributed applications run on underlying networks and systems, their performance is inevitably influenced by the performance of those networks and systems.


Park N., Lee S. and Lee G. (2004).

POLICY-BASED SERVICE LEVEL AGREEMENT MANAGEMENT SYSTEM.

In Proceedings of the First International Conference on E-Business and Telecommunication Networks, pages 111-117

DOI: 10.5220/0001390801110117

 SciTePress

Figure 1: Interrelation between SLA, SLS, TCA, and TCS

According to (Brian et al., 2002), 45% of all application-level performance problems are caused by network problems. Moreover, customers want higher network speeds as multimedia content proliferates. Therefore, the need arises for an SLA management process in IP networks. The aims of this work are the definition of an architecture for the implementation of SLAs, and the design of an entity capable of monitoring SLA violations.

The remainder of this paper is structured as follows. Section 2 discusses the Quality of Service (QoS) terminology and describes earlier work on service level management systems. Section 3 explains the framework of our SLA management system, together with a detailed description of its components. A sample scenario illustrating the SLA management process over IP networks is provided in Section 4. Section 5 summarizes our work and presents the conclusions of this study.

2 SERVICE LEVEL AGREEMENT

Several approaches to QoS definition, including those of the IETF, ITU, and ETSI, are in progress in order to clarify the terminology and eliminate confusion. In this section, we describe the terminology related to SLAs and review related work.

2.1 QoS Terminology

The ITU defines a service level agreement (SLA) as

“a negotiated agreement between a customer and

the service provider on levels of service

characteristics and the associated set of metrics. The

content of SLA varies depending on the service

offering and includes the attributes required for the

negotiated agreement” (ITU-T Rec, 2001). An SLA may take the form of a document containing the names of the parties signing the contract. It should be

composed of service level objectives, service

monitoring components, and financial compensation

components. Service level objectives encompass

QoS parameters or class of the service provided,

service availability and reliability, authentication

issues, the SLA expiry date, and so on. Service

monitoring specifies the way of measuring service

quality and other parameters used to assess whether

the service complies with the SLA. It may also

include an agreement on form and frequency of

delivering the report on service usage. The financial

component may include billing options, penalties for

breaking the contract, and so forth (ITU-T Rec,

2001).

The notion of service level specification (SLS) was introduced to separate the technical part of the contract from the SLA. It is defined as “a set of

parameters and their values which together define

the service offered to a traffic” (Grossman, 2002). It

specifies a set of values of network parameters

related to a particular service. The IP transport

services are technically described by SLSs.

ICETE 2004 - GLOBAL COMMUNICATION INFORMATION SYSTEMS AND SERVICES


A traffic conditioning agreement (TCA) is an

agreement specifying packet classification rules and

traffic profiles as a description of the temporal

properties of a traffic stream, such as the rate and

burst size. In order to force a customer’s traffic to conform to the profile, particular metering, marking, discarding, and shaping rules are defined.

The treatment of out-of-profile packets is also

specified by a TCA. According to the IETF

definition, “TCA encompasses all of the traffic

conditioning rules explicitly specified within a SLA

along with all of the rules implicit from the relevant

service requirements and/or from a DiffServ

domain’s service provisioning policy” (Blake et al.,

1998).

The traffic conditioning specification (TCS) is a

set of parameters with assigned values that

unambiguously specify a set of classifier rules and a

traffic profile. A TCS is the technical part of a TCA. A TCS is also an integral element of an SLS (Grossman, 2002).

Interrelations between SLA, SLS, TCA and TCS are shown in Fig. 1 (Gozdecki et al., 2003).
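The containment relations described in this subsection can be pictured as nested records. The following is a minimal Python sketch, not a reproduction of Fig. 1 or of any cited standard: all class and field names are hypothetical, and only the relations stated in the text are modeled (a TCS is the technical part of a TCA and an element of an SLS; an SLS is the technical part of an SLA).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TCS:
    """Traffic conditioning specification: classifier rules plus a traffic profile."""
    classifier_rules: List[str]
    traffic_profile: Dict[str, float]  # hypothetical keys, e.g. rate and burst size


@dataclass
class TCA:
    """Traffic conditioning agreement; its technical part is a TCS."""
    tcs: TCS
    out_of_profile_treatment: str


@dataclass
class SLS:
    """Service level specification: network parameter values, including a TCS."""
    parameters: Dict[str, float]
    tcs: TCS


@dataclass
class SLA:
    """Service level agreement: the SLS plus the non-technical parts of the contract."""
    parties: List[str]
    sls: SLS
    tca: TCA
    financial_terms: Dict[str, str] = field(default_factory=dict)


# Illustrative values only; none of them come from the paper.
tcs = TCS(["src 10.0.0.0/24 -> AF11"], {"rate_kbps": 2000.0, "burst_kb": 64.0})
sla = SLA(
    parties=["customer", "provider"],
    sls=SLS({"max_delay_ms": 50.0}, tcs),
    tca=TCA(tcs, "drop"),
)
```

In this sketch the same TCS object is shared by the SLS and the TCA, which mirrors the statement that a TCS belongs to both.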

2.2 Related Works

The importance of SLAs has been recognized and widely accepted by ASPs, ISPs, and other providers. This section reviews the features of various SLA management systems.

Reference (Leff et al., 2003) examines the

requirements on a grid’s infrastructure to support

SLAs and describes a prototype implementation that

satisfies them. It specifically focuses on the dynamic

offload infrastructure needed to meet SLAs related

to varying workload conditions. The components have the ability to formally define an SLA, detect an SLA violation, and scale up resources dynamically in response to an SLA violation. However, the approach is limited to reacting only when an SLA is violated; it does not predict SLA violations.

In (Chakravorty et al., 2003), an architecture for

end-to-end QoS control in a wired-wireless (UMTS)

environment is proposed with dynamic SLA-based

resource provisioning. It is realized in the CUE (CADENUS-UMTS Extension) framework. The CUE architecture adds two new components, CUE-SM and CUE-RM, that can be used to provision end-to-end QoS in a wired-wireless network. It uses a combination of dynamic SLA-based and policy control schemes. The main functions include automation of the r-SLA (retail SLA) and static or dynamic negotiation of r-SLAs. By adopting QoS negotiation, it is possible to make decisions about user QoS in real time.

From the viewpoint of contract management, the T.J. Watson Research Center has developed SAM (Buco et al., 2003). The e-business SLA contract execution manager SAM enables the provider to deploy an effective means of capturing and managing contractual SLA data as well as provider-facing non-contractual SLM data. SAM helps service personnel prioritize the processing of action-demanding quality management alerts. It also automates the prioritization and execution management of approved SLM processes on behalf of the provider.

In order to share management information across domains, (Bhoj et al., 2001) elaborated a web-based architecture. The architecture can be used for automated management of SLAs for Internet services. The authors also demonstrated how a service provider could offer verifiable and meaningful pre-defined SLA behaviors to its customers.

3 FRAMEWORK OF SLMS

We propose an architecture for a policy-based SLA Management System (SLMS) using web services. It provides a common generic framework whose components interwork via XML. We also design a user interface for the system operators who use the SLMS. Operators can search SLA metrics and violation details, and can monitor SLAs in real time.

3.1 SLMS Components

We categorize the SLA metrics into service opening metrics, trouble metrics, and performance metrics. Service opening means that the end-user must be able to use the network service by the agreed date. Trouble metrics include the trouble recovery time, the total trouble time, and the number of troubles. Performance metrics are related to the QoS of the network, such as packet delay and packet loss.

In order to manage the SLA efficiently, warning messages are sent to system operators in real time while the SLA is monitored. System operators check the details and take action to prevent SLA violations.


In this context, we propose an architecture that provides the end-user with policy-based service level management (SLM). The functional blocks are:

• AM - Access Manager

• DM - Data Manager

• MM - Monitoring Manager

• PM - Policy Manager

• UM - User Interface Manager

The AM is the entity that receives the information related to service opening, trouble, and performance. It is responsible for translating the information into XML format and pushing the translated XML document into the message queue. Furthermore, the AM collects the network performance data.
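The AM's translate-and-enqueue step can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the element names (`slaEvent`, `metricsCode`, `customerId`, `value`) are hypothetical, and an in-process `queue.Queue` stands in for whatever message queue the SLMS uses.

```python
import queue
import xml.etree.ElementTree as ET

# Stand-in for the real message queue (assumption).
message_queue = queue.Queue()


def to_xml(record: dict) -> str:
    """Translate one raw record into an XML document (element names are hypothetical)."""
    root = ET.Element("slaEvent")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def access_manager_receive(record: dict) -> None:
    """AM step: translate the incoming record and push it onto the queue."""
    message_queue.put(to_xml(record))


# Example: a trouble record arriving from an external system.
access_manager_receive({"metricsCode": "ADS20", "customerId": "C001", "value": 95})
```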

The DM reads the XML data from the message queue and classifies the data according to the SLA metrics. As the DM manages the information in the database, it responds to requests from the UM to retrieve and save SLA-related data.
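A sketch of the DM's classification step is given below. It assumes, purely for illustration, that the category can be read from the first digit after the `ADS` prefix of a metrics code, which is inferred from the codes in Table 1 and is not stated in the text. The XML element names and the in-memory `store` (standing in for the database) are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping inferred from Table 1 (ADS10, ADS20/ADS21, ADS31).
CATEGORY_BY_DIGIT = {"1": "service_opening", "2": "trouble", "3": "performance"}


def classify(xml_doc: str) -> tuple:
    """Return (category, metrics_code, value) for one XML message."""
    root = ET.fromstring(xml_doc)
    code = root.findtext("metricsCode")
    value = float(root.findtext("value"))
    category = CATEGORY_BY_DIGIT.get(code[3], "unknown")
    return category, code, value


store = defaultdict(list)  # stands in for the database tables
for doc in [
    "<slaEvent><metricsCode>ADS10</metricsCode><value>3</value></slaEvent>",
    "<slaEvent><metricsCode>ADS31</metricsCode><value>1.5</value></slaEvent>",
]:
    category, code, value = classify(doc)
    store[category].append((code, value))
```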

The MM plays the important role of monitoring for violations of SLA metrics. It reads the monitored data through the DM at the defined interval and compares the current data with the SLA metric. If the MM detects a violation, it sends the violation information through the message queue.

The PM is closely related to the MM. A policy is an editable file that contains attribute-value pairs. The policy contains a flag indicating whether each metric is monitored, and the monitoring interval. The MM reads the policy file and parses the attribute-value pairs. Depending on the values, a specific metric is monitored or not.
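The paper does not give the policy file syntax, so the following sketch assumes a simple `attribute = value` line format. The attribute names (`monitor.interval.seconds`, `<code>.monitor`) are hypothetical and serve only to show how a per-metric monitoring flag and the interval could be read.

```python
def parse_policy(text: str) -> dict:
    """Parse 'attribute = value' lines; blank lines and '#' comments are skipped."""
    policy = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        policy[key.strip()] = value.strip()
    return policy


def is_monitored(policy: dict, metrics_code: str) -> bool:
    """A metric is monitored only if its flag is explicitly set to 'true'."""
    return policy.get(f"{metrics_code}.monitor", "false").lower() == "true"


# Hypothetical policy file content.
SAMPLE_POLICY = """
# monitoring policy
monitor.interval.seconds = 60
ADS20.monitor = true
ADS31.monitor = false
"""

policy = parse_policy(SAMPLE_POLICY)
interval = int(policy["monitor.interval.seconds"])
```

Treating a missing flag as "not monitored" is a design choice of this sketch; the opposite default would be equally consistent with the text.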

Operators manage the SLAs of end-users by using the SLMS. The UM interacts with the operators and provides a variety of data. Additionally, using the UM, SLA metrics can be retrieved, updated, and added according to changes in the network service. Operators can configure the SLM policy, such as whether monitoring is executed. Fig. 2 shows the architecture of our SLMS.

Figure 2: Architecture of SLMS

3.2 Monitoring SLA Metrics

Monitoring is the core function of SLM for preventing SLA violations. Our system has a monitoring component that first checks the threshold and then compares against the metric value. The threshold is the value at which the operator is alerted by a ‘warning’ message. If the operator receives the message from the system, he/she checks the details and can take action to prevent the SLA violation.

Following the aforementioned metrics classification, the MM monitors the following metrics: service opening, trouble, and performance. At the system initiation stage, the MM reads the policy file and parses the attribute-value pairs. For example, if the ‘packet delay’ metric of the ADSL service is marked as ‘not monitored’, the MM will not monitor the corresponding data. After determining the policy, the MM creates threads to monitor the categorized metrics.


Table 1: SLA Metrics

MetricsCode | Description               | Threshold | Value | Unit
ADS10       | ADSL Service Opening      | 2         | 8     | Day
ADS20       | ADSL Trouble Delay Time   | 120       | 180   | Minute
ADS21       | ADSL Monthly Trouble Time | 20        | 24    | Hour
ADS31       | Packet Loss Rate          | 3         | 5     | %

Using the UM, the operator can configure whether each metric is monitored. If a metric is set not to be monitored, the MM will not execute the monitoring function for it. However, the history of warnings and violations is still recorded in the database and can be retrieved through the UM.

Policy-based monitoring can also be tuned by configuring various preferences. The monitoring interval can be changed by using the UM. If the operator changes the interval, the UM sends a message to the message queue. The MM, which listens to the message queue, receives the interval change event. The MM then aborts the currently running threads and starts threads again with the new interval.
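The abort-and-restart behavior on an interval change can be sketched with a stoppable periodic thread. This is an illustrative sketch only: the class `PeriodicMonitor` and the function `on_interval_change` are hypothetical names, and the paper does not say how its threads are actually implemented.

```python
import threading
import time


class PeriodicMonitor:
    """Runs `task` every `interval` seconds until stopped."""

    def __init__(self, interval: float, task):
        self.interval = interval
        self.task = task
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout (run the task)
        # and True as soon as stop() has been called.
        while not self._stop_event.wait(self.interval):
            self.task()

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop_event.set()
        self._thread.join()


def on_interval_change(current: PeriodicMonitor, new_interval: float) -> PeriodicMonitor:
    """Abort the running monitor and start a fresh one with the new interval."""
    current.stop()
    return PeriodicMonitor(new_interval, current.task).start()


ticks = []
monitor = PeriodicMonitor(3600.0, lambda: ticks.append(1)).start()
monitor = on_interval_change(monitor, 0.01)  # operator shortens the interval
time.sleep(0.2)                              # let the new monitor run a few times
monitor.stop()
```

Waiting on an event instead of sleeping lets the old thread exit immediately when the interval changes, rather than after its previous (possibly long) interval has elapsed.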

At each scheduled invocation time, the MM creates threads. The threads retrieve the monitored data, the threshold, and the metric value. First, the MM thread compares the data with the threshold. If the current value is greater than the threshold, a ‘warning’ message is sent to the AM in XML format. As time passes, the MM thread detects an SLA violation by comparing the current value with the value of the metric. If a violation occurs, a ‘violation’ message is sent to the message queue, and at the same time the violation details are recorded in the database by the DM. The threads are disposed of after execution. This procedure is repeated at every monitoring interval.
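The two-stage comparison performed by each monitoring thread can be sketched as a small function. The rows below are taken from Table 1; the `Metric` class, the `check` function, and the use of a strict "greater than" test for the violation (the text states it only for the warning) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    code: str
    description: str
    threshold: float  # warning level
    value: float      # agreed SLA limit
    unit: str


# Two rows taken from Table 1.
METRICS = {
    "ADS20": Metric("ADS20", "ADSL Trouble Delay Time", 120, 180, "Minute"),
    "ADS31": Metric("ADS31", "Packet Loss Rate", 3, 5, "%"),
}


def check(metric: Metric, current: float) -> str:
    """Return 'violation', 'warning', or 'ok' for the current measurement."""
    if current > metric.value:
        return "violation"
    if current > metric.threshold:
        return "warning"
    return "ok"
```

For example, a trouble that has been open for 150 minutes yields a warning for ADS20, and one open for 181 minutes yields a violation.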

4 POLICY-BASED SLA MANAGEMENT

We manage each SLA metric as a code with a value and a threshold. If a metric needs to be added or deleted, we simply add the related metrics code to, or delete it from, the database (Table 1).
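Managing metrics as database rows can be sketched with an in-memory SQLite table. The table name `sla_metrics`, its column names, and the added metric `ADS32` with its numbers are hypothetical; the other rows are those of Table 1. The paper does not specify which database the SLMS uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sla_metrics ("
    "code TEXT PRIMARY KEY, description TEXT, threshold REAL, value REAL, unit TEXT)"
)

# Rows from Table 1.
rows = [
    ("ADS10", "ADSL Service Opening", 2, 8, "Day"),
    ("ADS20", "ADSL Trouble Delay Time", 120, 180, "Minute"),
    ("ADS21", "ADSL Monthly Trouble Time", 20, 24, "Hour"),
    ("ADS31", "Packet Loss Rate", 3, 5, "%"),
]
conn.executemany("INSERT INTO sla_metrics VALUES (?, ?, ?, ?, ?)", rows)

# Adding a metric is a single INSERT (code and numbers are made up).
conn.execute(
    "INSERT INTO sla_metrics VALUES (?, ?, ?, ?, ?)",
    ("ADS32", "Packet Delay", 40, 50, "ms"),
)

# Removing a metric is a single DELETE.
conn.execute("DELETE FROM sla_metrics WHERE code = ?", ("ADS21",))

codes = [r[0] for r in conn.execute("SELECT code FROM sla_metrics ORDER BY code")]
```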

In this section, we explain SLA management over xDSL services by illustrating a sample scenario. From the service subscription onward, our system monitors the service in order to meet the SLA. Trouble recovery and network performance must also be satisfied so as not to violate the metric values.

4.1 Service Opening Management

Service must be available by the customer’s desired date. We assume that a customer would like to subscribe to an ADSL service and wants to use the service within 10 days of the request date. Furthermore, the customer wants a high quality of service. The request is received by the service-opening system, which passes the required data to our system.

As soon as the SLMS receives the subscription data, the monitoring process begins. The threshold of the service opening metric is 2 days before the customer’s desired date. So, no event is sent from the SLMS to the system operator during the first 8 days after the request date. During that time, the operator can ask the service-opening system to check the current opening state.

Service opening may be completed successfully within 10 days. The service-opening system then passes the result together with the service quality (e.g. line speed). If the service quality does not satisfy the SLA, the system operator requests the order again. In this way, the quality of service is guaranteed.

If the service is still not available to the customer after 8 days, a warning message is sent to the system operator. The system operator can send a command to the service-opening system so that the service opening metric is not violated.

Even though the SLMS alerts the operator with the warning message, the order may still not be completed. In that case, our system shows the violation message (Fig. 3). The more time passes, the more money must be refunded to the customer, so the system operator needs to expedite the service opening process.
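The timeline of this scenario (a 10-day desired date with a warning threshold 2 days earlier) can be worked through in a short sketch. The function name, its parameters, and the exact boundary handling (warning from day 8 onward, violation only after day 10) are assumptions made for illustration; the text does not fix what happens exactly on the boundary days.

```python
from datetime import date, timedelta


def opening_status(requested: date, today: date, opened: bool,
                   desired_days: int = 10, warn_days_before: int = 2) -> str:
    """Status of a service opening order as seen on `today`."""
    if opened:
        return "ok"
    desired = requested + timedelta(days=desired_days)
    warn_from = desired - timedelta(days=warn_days_before)
    if today > desired:
        return "violation"   # desired date has passed without opening
    if today >= warn_from:
        return "warning"     # inside the 2-day warning window
    return "ok"


requested = date(2004, 3, 1)  # hypothetical request date
statuses = {
    d: opening_status(requested, requested + timedelta(days=d), opened=False)
    for d in (7, 9, 10, 11)
}
```

With these assumptions, an unopened order is still "ok" on day 7, raises a warning on days 9 and 10, and becomes a violation on day 11.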


Figure 3: A sample GUI for real-time monitoring

4.2 Service Trouble Management

During the service period, the user may be unable to use the service for reasons that are the network provider’s responsibility, for example periodic network inspection or system breakdown. The customer is assured of being able to use the service for the specified period. Otherwise, the customer receives a refund in proportion to the exceeded time.

The customer files a report to notify the provider of the network trouble if he/she cannot use the network service. The customer expects the service to be restored within the service recovery time. The trouble report is received by the service-assurance system, which passes it to the SLMS. In the same way as for opening monitoring, the SLMS monitors the service recovery time metric so that it is not violated.

Although the user may be unaware of a service outage, the network provider can detect it. The same process is executed for automatically detected trouble, but a trouble that has also been reported is not counted twice. Whether or not the customer knows about the trouble, the SLMS assures the service recovery time.

Additionally, we manage the trouble-related metrics over the specified period: the total trouble time and the number of troubles. As individual troubles are recorded in the database, we manage these metrics easily by summing the time and the count. Our system assures that the total trouble time does not exceed 24 hours in a month and that the number of troubles is less than 5 in a month.
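The monthly aggregation can be sketched as a sum over recorded troubles. The limits (24 hours, fewer than 5 troubles) come from the text; the record format of `(start, end)` pairs, the function name, the sample data, and the rule of assigning a trouble to the month in which it started are assumptions of this sketch.

```python
from datetime import datetime, timedelta

MAX_MONTHLY_TROUBLE = timedelta(hours=24)  # total trouble time limit
MAX_MONTHLY_COUNT = 5                      # the count must stay below this


def monthly_trouble_summary(troubles, year: int, month: int) -> dict:
    """Aggregate (start, end) trouble records that began in the given month."""
    in_month = [(s, e) for s, e in troubles if (s.year, s.month) == (year, month)]
    total = sum((e - s for s, e in in_month), timedelta())
    return {
        "total": total,
        "count": len(in_month),
        "time_violated": total > MAX_MONTHLY_TROUBLE,
        "count_violated": len(in_month) >= MAX_MONTHLY_COUNT,
    }


# Hypothetical trouble records.
troubles = [
    (datetime(2004, 3, 2, 9, 0), datetime(2004, 3, 2, 19, 0)),     # 10 h
    (datetime(2004, 3, 15, 0, 0), datetime(2004, 3, 15, 15, 30)),  # 15.5 h
    (datetime(2004, 4, 1, 8, 0), datetime(2004, 4, 1, 9, 0)),      # next month
]
summary = monthly_trouble_summary(troubles, 2004, 3)
```

Here the two March troubles add up to 25.5 hours, so the total trouble time metric is violated even though the trouble count is well below its limit.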

4.3 Service Performance Management

If the network provider does not deliver the agreed network quality, the customer is disappointed and may switch to another provider. In that respect, service performance is the most important matter for both customer and provider. Our system manages the following metrics: packet loss rate, packet delay, and availability.

SLAs can be classified into retail-SLAs and wholesale-SLAs (D’Arienzo et al., 2003). A retail-SLA refers to an agreement between an end-user and a service provider. Conversely, a wholesale-SLA is an agreement between network operators, and takes into account traffic aggregates flowing from one domain to another. As we rely on retail-SLA, the managed section of network performance is limited to the part between the end-user and the backbone network. We are currently investigating how to overcome this limitation by using a user-side agent that collects the network performance information.

The SLMS collects the performance data by polling the equipment from a centralized server. The server is located in the backbone network, and a number of end-users are attached to the target equipment. As the ADSL service uses the dynamic host configuration protocol (DHCP), end-users do not have fixed IP addresses, so it is impossible to collect performance data per user by IP address. Therefore, the performance data of a piece of equipment are applied to all end-users attached to it.

In contrast to other metrics, a violation of service performance is determined from the average value within the specified period. Service performance may decline temporarily, and a warning event may be sent. If the system operator takes action to prevent a violation every time, this is a burden, because a violation may never occur. So we provide the trend of network performance to the system operator. Seeing the trend, the operator determines whether action is required.
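The difference between a momentary spike and a period-average violation can be illustrated with a small sketch. The sample values are invented, the packet loss limit of 5% is the ADS31 value from Table 1, and the use of a simple moving average as the "trend" is an assumption; the paper does not say how the trend is computed.

```python
from statistics import mean


def period_average(samples):
    """Average of all samples in the evaluation period."""
    return mean(samples)


def moving_average(samples, window: int):
    """Simple moving average, one value per full window."""
    return [mean(samples[i - window:i]) for i in range(window, len(samples) + 1)]


def is_rising(trend) -> bool:
    """True if every trend point is strictly higher than the one before it."""
    return all(b > a for a, b in zip(trend, trend[1:]))


# Hypothetical packet loss samples (%), one per polling interval.
loss = [1.0, 1.0, 6.0, 1.0, 1.0, 2.0]
SLA_VALUE = 5.0  # packet loss limit from Table 1 (ADS31)

violated = period_average(loss) > SLA_VALUE
trend = moving_average(loss, 3)
```

In this example the single 6% sample would trigger a warning, but the period average is 2%, so no violation occurs, and the trend is not rising, which suggests that no operator action is needed.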

5 CONCLUSIONS

We have proposed an architecture for a policy-based SLA Management System. We first described the QoS terminology, including SLA and SLS. The policy-based SLMS was then introduced with a detailed description of its components. The metrics to be monitored can be configured according to the policy. Our system is capable of managing the SLA from service subscription to service termination. It monitors in real time so that the metrics are not violated. We also designed a user interface for system operators who use the SLMS.

As we rely on retail-SLA, the managed section of network performance is limited to the part between the end-user and the backbone network. Work is under way to address this limitation by using a user-side agent that collects the network performance information. In line with the implications of this research, work is also being conducted to interwork with Operations Support Systems (OSSs) such as the refund system and the NMS.

REFERENCES

Bao Hua Liu, P. Ray, S. Jha, Mapping distributed

application SLA to network QoS parameters,

Telecommunications, 2003 (ICT 2003), pp.1230-1235,

2003

Brian L. Tierney, End-to-End Application Monitoring

using the Distributed Monitoring Framework,

Lawrence Berkeley National Laboratory, 2002

ITU-T Rec. Y.1241, Support of IP-based Services Using

IP Transfer Capabilities, Mar. 2001

D. Grossman, New Terminology and Clarifications for

Diffserv, IETF RFC 3260, Apr. 2002

S. Blake et al., An Architecture for Differentiated Services,

IETF RFC 2475, Dec. 1998

J. Gozdecki, A. Jajszczyk, R. Stankiewicz, Quality of

service terminology in IP networks, Communications

Magazine, IEEE, pp.153-159, 2003

A. Leff, J.T. Rayfield, D.M. Dias, Service-level

agreements and commercial grids, Internet

Computing, IEEE, pp.44-50, 2003

R. Chakravorty, I. Pratt, J. Crowcroft, A framework for

dynamic SLA-based QoS control for UMTS, Wireless

Communications, IEEE, pp.30-37, Oct, 2003

M. Buco, Rong Chang, L. Luan, C. Ward, J. Wolf, P. Yu,

Managing eBusiness on demand SLA contracts in

business terms using the cross-SLA execution manager

SAM, ISADS, pp.157-164, Apr. 2003

P. Bhoj, S. Singhal, S. Chutani, SLA Management in Federated Environments, Comp. Nets., vol. 35, no. 1, pp. 5-24, Jan. 2001

M. D’Arienzo, M. Esposito, S.P. Romano, G. Ventre,

Automatic SLA Management in SLA-aware

architecture, Telecommunications, 2003 (ICT 2003),

pp.1402-1406, 2003

G. Cortese, R. Fiutem, P. Cremonese, S. D'antonio, M.

Esposito, S.P. Romano, A. Diaconescu, CADENUS:

creation and deployment of end-user services in

premium IP networks, Communications Magazine,

IEEE, pp.54-60, Volume: 41 , Issue: 1 , Jan. 2003
