OPTIMIZING REVENUE

Service Provisioning Systems with QoS Contracts

J. Palmer, I. Mitrani, M. Mazzucco

School of Computing Science, Newcastle University, NE1 7RU, UK

P. McKee, M. Fisher

BT Group, Adastral Park, Ipswich, IP5 3RE, UK

Keywords:

Quality of service, Revenue maximization, Server allocation, Admission policies, M/M/N/K queue.

Abstract:

We consider the problem of how best to structure and control a distributed computer system containing many

processors, subject to Quality of Service contracts. Services of different types are offered, with different

charges for running jobs and penalties for failing to meet the QoS requirements. The aim is to choose the

number of servers allocated to each service type, and the admission criteria for jobs of that type, so as to

maximize the total average revenue per unit time. The performance of a fast allocation heuristic is evaluated.

1 INTRODUCTION

The context for this work is a service provisioning

system where a cluster of resources (servers) is em-

ployed to offer different services to a community

of users. The immediate motivation came from the

world of web services, but other multi-class hosting

environments would fall in the same framework. With

each service type is associated a service level agree-

ment (SLA), formalizing the obligations of the users

and the provider. In particular, a user agrees to pay a

certain amount for each accepted and completed job,

while the provider agrees to pay a penalty whenever

the response time (or waiting time) of a job exceeds a

certain bound. It is then the provider’s responsibility

to decide how to allocate the available resources, and

when to accept jobs, in order to make the system as

proﬁtable as possible. That, in general terms, is the

problem that we wish to address.

In order to do that, it is necessary to have a quan-

titative model of user demand, service provision and

admission policy. We use, as a basic building block,

the M/M/N/K queueing model, augmented with the

economic parameters of charges and penalties. The

aim of this paper is to propose and evaluate efﬁcient

and easily implementable policies for resource allo-

cation and job admission.

The approach we have adopted includes mathe-

matical analysis, numerical solutions and also experi-

mentation with a real hosting environment where dif-

ferent web services are requested and deployed.

The economic issues arising in multi-server,

multi-class systems with QoS contracts based on re-

sponse times or waiting times do not appear to have

been studied before. Huberman et al (Huberman

et al., 2005) describe a pricing structure for service

provision which ensures truthful reporting of QoS by

providers. However, that paper assumes that the prob-

ability of providing a particular level of QoS is ﬁxed

and known, without being speciﬁc about how QoS is

measured. The distribution of response or waiting

times is not considered. Rajkumar et al (Rajkumar

et al., 1997) consider a resource allocation model for

QoS management, where application needs may in-

clude timeliness, reliability, security and other appli-

cation speciﬁc requirements. The model is described

in terms of a utility function to be maximised. This

model is extended in (Ghosh et al., 2003) and (Hansen

et al., 2004). Such multi-dimensional QoS is beyond

the scope of this paper. However, although the model

described by Rajkumar et al allows for variation in

job computation time and frequency of application

requests, once again the distribution of the response

times and/or waiting times is not considered.

The model assumptions are described in section 2. The revenue analysis is carried out in section 3. Numerical results and observations gathered from a working web service hosting system are presented in section 4. Section 5 contains a summary and conclusions.

Palmer J., Mitrani I., Mazzucco M., McKee P. and Fisher M. (2007). OPTIMIZING REVENUE - Service Provisioning Systems with QoS Contracts. In Proceedings of the Second International Conference on e-Business, pages 187-191. DOI: 10.5220/0002111501870191. SciTePress.

2 THE MODEL

The system consists of N identical servers, which may

be used to serve jobs belonging to m different types.

However, once allocated to a type of service, a server

remains dedicated to jobs of that type only. In other words, a static and non-sharing server allocation policy is employed: $n_i$ servers are assigned to jobs of type $i$ ($n_1 + \ldots + n_m = N$). Such a policy may deliberately take the decision to deny service to one or more

job types (this will certainly happen if the number of

services exceeds the number of servers).

Jobs of type $i$ arrive according to an independent Poisson process with rate $\lambda_i$, and join a separate queue. Their required service times are distributed exponentially with mean $1/\mu_i$. An admission policy controlled by a set of thresholds is in operation: if there are $K_i$ jobs of type $i$ present in the system (waiting and in service), then incoming type $i$ jobs are not accepted and are lost ($i = 1, 2, \ldots, m$).

For the purposes of this model, the quality of

service experienced by an accepted job is measured

either in terms of its response time, W (the inter-

val between the job’s arrival and completion), or in

terms of its waiting time, w (excluding the service

time). Whichever the chosen measure, it is mentioned

explicitly in a service level agreement between the

provider and the users. We assume that each such

contract would include the following three clauses:

1. Charge: For each accepted and completed job of type $i$ a user shall pay a charge of $c_i$ (in practice this may be proportional to the average length of type $i$ jobs).

2. Obligation: The response time, $W_i$ (or waiting time, $w_i$), of an accepted job of type $i$ shall not exceed $q_i$.

3. Penalty: For each accepted job of type $i$ whose response time (or waiting time) exceeds $q_i$, the provider shall pay to the user a penalty of $r_i$.

Thus, in this model, service type $i$ is characterized by its 'demand parameters' ($\lambda_i$, $\mu_i$), and its 'economic parameters', namely the triple

$$(c_i, q_i, r_i) = (\text{charge}, \text{obligation}, \text{penalty}) \qquad (1)$$

Within the control of the provider are the server allocations, $n_i$, and the admission thresholds, $K_i$. The objective is to choose those allocations and thresholds so as to maximize the total average revenue earned per unit time in the steady state. A stationary regime always exists for a bounded queue, but if $K_i = \infty$ for some $i$, then the corresponding demand parameters must satisfy $\lambda_i < n_i \mu_i$ in order that the queue be stable.

Note that, although we make no assumptions

about the relative magnitudes of the charge and

penalty parameters, the more interesting case is where

the latter is at least as large as the former: $c_i \le r_i$.

Otherwise one could guarantee a positive revenue by

accepting all jobs of type i, regardless of the load and

of the obligation made.

3 REVENUE EVALUATION

We concentrate first on the subsystem associated with service $i$, for a given set of demand and economic parameters, and fixed allocation $n_i$ and threshold $K_i$. That subsystem behaves like an $M/M/n_i/K_i$ queue (see, for example, (Mitrani, 1998)).

Denote by $V_i$ the average revenue earned from type $i$ jobs per unit time in the steady state. If the QoS measure is the response time, then $V_i$ is given by

$$V_i = \lambda_i \sum_{j=0}^{K_i - 1} p_{i,j} \left[ c_i - r_i P(W_{i,j} > q_i) \right] , \qquad (2)$$

where $p_{i,j}$ is the stationary probability that there are $j$ jobs of type $i$ in the $M/M/n_i/K_i$ queue, and $W_{i,j}$ is the response time of a type $i$ job which finds, on arrival, $j$ other type $i$ jobs present.

If the QoS measure is the waiting time, then $V_i$ is given by a similar expression, with $P(W_{i,j} > q_i)$ being replaced by $P(w_{i,j} > q_i)$ (where $w_{i,j}$ is the waiting time of a type $i$ job which finds, on arrival, $j$ other type $i$ jobs present).

The stationary distribution of the number of type $i$ jobs present is found by solving the balance and normalizing equations. Similarly, the probabilities $P(W_{i,j} > q_i)$ can be evaluated by computing the distribution function of a convolution of the right number of exponential distributions. This can be done in closed form.
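As an illustration of what such a closed form can look like, the following sketch assumes first-come-first-served service within each queue (an assumption not stated explicitly above). The shorthand $k = j - n_i + 1$, $a = n_i \mu_i$ and $b = (n_i - 1)\mu_i$ is introduced here only for compactness. A job that finds $j < n_i$ others present starts service at once; otherwise it waits for $k$ departures at rate $a$ and is then served at rate $\mu_i$:

$$P(W_{i,j} > q_i) = e^{-\mu_i q_i} , \qquad j < n_i ,$$

$$P(W_{i,j} > q_i) = e^{-a q_i} \sum_{l=0}^{k-1} \frac{(a q_i)^l}{l!} + e^{-\mu_i q_i} \left( \frac{a}{b} \right)^{k} \left[ 1 - e^{-b q_i} \sum_{l=0}^{k-1} \frac{(b q_i)^l}{l!} \right] , \qquad j \ge n_i ,\ n_i \ge 2 .$$

For $n_i = 1$ the waiting and service stages all have rate $\mu_i$, so the response time is Erlang with $k+1$ stages and $P(W_{i,j} > q_i) = e^{-\mu_i q_i} \sum_{l=0}^{k} (\mu_i q_i)^l / l!$. Under the same assumption, the waiting-time version is $P(w_{i,j} > q_i) = 0$ for $j < n_i$ and $e^{-a q_i} \sum_{l=0}^{k-1} (a q_i)^l / l!$ for $j \ge n_i$.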

When the computation of $V_i$ is done for different sets of parameter values, it becomes clear that it is a unimodal function of $K_i$. That is, it has a single maximum, which may be at $K_i = \infty$ for lightly loaded systems. We do not have a mathematical proof of this proposition, but have verified it in numerous numerical experiments. That observation implies that one can search for the optimal admission threshold by evaluating $V_i$ for consecutive values of $K_i$, stopping either when $V_i$ starts decreasing or, if that does not happen, when the increase becomes smaller than some $\varepsilon$. Such searches are typically very fast.
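The revenue evaluation and threshold search for a single service type can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes first-come-first-served service and the response-time QoS measure, and the function names (`stationary_probs`, `response_tail`, `revenue`, `best_threshold`) and the defaults `eps` and `k_max` are hypothetical choices made here.

```python
import math

def stationary_probs(lam, mu, n, K):
    """Stationary distribution p_0..p_K of an M/M/n/K queue (balance equations)."""
    weights = [1.0]
    for j in range(1, K + 1):
        weights.append(weights[-1] * lam / (min(j, n) * mu))
    total = sum(weights)
    return [w / total for w in weights]

def poisson_lower(k, x):
    """P(Poisson(x) < k), which equals P(Erlang(k, rate) > t) when x = rate * t."""
    term, acc = math.exp(-x), 0.0
    for l in range(k):
        acc += term
        term *= x / (l + 1)
    return acc

def poisson_upper(k, x):
    """P(Poisson(x) >= k), summed directly to avoid cancellation in 1 - lower."""
    term = math.exp(-x)
    for l in range(1, k + 1):
        term *= x / l
    acc, l = 0.0, k
    while term > 1e-17 * acc or l <= x:
        acc += term
        l += 1
        term *= x / l
    return acc

def response_tail(j, n, mu, q):
    """P(W_j > q) for a job finding j others present (FCFS is assumed here)."""
    if j < n:
        return math.exp(-mu * q)              # service starts immediately
    k = j - n + 1                             # departures needed before service
    if n == 1:
        return poisson_lower(k + 1, mu * q)   # wait + service is Erlang(k+1, mu)
    a, b = n * mu, (n - 1) * mu
    return (poisson_lower(k, a * q)
            + math.exp(-mu * q) * (a / b) ** k * poisson_upper(k, b * q))

def revenue(lam, mu, n, K, c, r, q):
    """Equation (2): V = lam * sum_{j<K} p_j * (c - r * P(W_j > q))."""
    if n == 0 or K == 0:
        return 0.0                            # nothing is admitted or served
    p = stationary_probs(lam, mu, n, K)
    return lam * sum(p[j] * (c - r * response_tail(j, n, mu, q)) for j in range(K))

def best_threshold(lam, mu, n, c, r, q, eps=1e-6, k_max=1000):
    """Scan K = 1, 2, ... and stop when V decreases or its gain drops below eps."""
    best_k, best_v = 0, 0.0
    if n == 0:
        return best_k, best_v
    for K in range(1, k_max + 1):
        v = revenue(lam, mu, n, K, c, r, q)
        if v < best_v:
            break                             # V has started decreasing
        gain, best_k, best_v = v - best_v, K, v
        if gain < eps:
            break                             # V has effectively levelled off
    return best_k, best_v
```

A call such as `best_threshold(7.5, 1.0, 10, 100.0, 100.0, 2.0)` returns a pair (threshold, revenue) for one service type. The early stop relies on the unimodality observed numerically above, so the result is only as reliable as that observation.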

ICE-B 2007 - International Conference on e-Business


The second problem under consideration is to maximize the total average revenue per unit time, $V$, earned from all the $m$ different service types:

$$V = V_1 + \ldots + V_m \qquad (3)$$

The server allocations vector, $(n_1, \ldots, n_m)$ (satisfying $n_1 + \ldots + n_m = N$), and the admission thresholds vector, $(K_1, \ldots, K_m)$, must be chosen so as to maximize $V$.

A simple and intuitively sensible policy is to allocate the servers roughly in proportion to the offered load, $\rho_i = \lambda_i/\mu_i$, and to the service charge, $c_i$, for each type. In other words, set

$$n_i = \left\lfloor N \frac{\rho_i c_i}{\sum_{j=1}^{m} \rho_j c_j} + 0.5 \right\rfloor \quad (i = 1, \ldots, m-1) ; \qquad n_m = N - \sum_{i=1}^{m-1} n_i \qquad (4)$$

(adding 0.5 and truncating is the round-off operation).

The corresponding vector of optimal admission thresholds is determined as described above, at a computational cost of order $O(m)$ (one threshold search per service type). This will be referred to as the 'heuristic' policy. Note that it may yield an allocation of $n_i = 0$ and an admission threshold $K_i = 0$ for some service types. If that is undesirable for reasons other than revenue, the allocations can be adjusted appropriately.
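The allocation rule of equation (4) can be sketched in a few lines of Python. The function name `heuristic_allocation` and its argument layout are hypothetical, chosen here for illustration; the rounding follows the "add 0.5 and truncate" description above, with the last service type receiving the remaining servers.

```python
import math

def heuristic_allocation(N, lams, mus, charges):
    """Split N servers in proportion to rho_i * c_i, following equation (4)."""
    # Weight of each service type: offered load times charge.
    weights = [lam / mu * c for lam, mu, c in zip(lams, mus, charges)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one service type needs positive load and charge")
    # Round to nearest for the first m-1 types (add 0.5 and truncate).
    alloc = [math.floor(N * w / total + 0.5) for w in weights[:-1]]
    # The last type takes whatever is left.
    last = N - sum(alloc)
    if last < 0:
        # Can happen when N is small relative to m; the source does not say
        # how to resolve this case, so the sketch simply reports it.
        raise ValueError("rounding over-allocated the servers")
    return alloc + [last]

# Two service types with equal charges and service rates on 20 servers.
print(heuristic_allocation(20, [5.0, 10.0], [1.0, 1.0], [100.0, 100.0]))
```

With arrival rates 5 and 10, unit service times and equal charges, the first share is $20 \cdot 500 / 1500 \approx 6.67$, which rounds to 7, leaving 13 servers for the second type.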

4 NUMERICAL AND EMPIRICAL RESULTS

Several experiments were carried out, aiming to eval-

uate the beneﬁts of determining the optimal system

conﬁguration. To reduce the number of variables, the

following features were held ﬁxed:

• The QoS measure is the response time, W.

• The obligations undertaken by the provider are that jobs will complete within twice their average required service times, i.e. $q_i = 2/\mu_i$.

• All penalties are equal to the corresponding charges: $r_i = c_i$ (i.e., if the response time exceeds the obligation, users get their money back).

The first experiment concerns a 20-server system offering two services ($N = 20$, $m = 2$). The 21 possible server allocations $(n_1, n_2)$ are evaluated and compared, for three different pairs of arrival rates. In each case, the total offered load is $\rho_1 + \rho_2 = 15.0$, which means that the 20-server system is 75% loaded.

The average service times and job charges are $1/\mu_1 = 1/\mu_2 = 1.0$ and $c_1 = c_2 = 100.0$, respectively. For each server allocation, the optimal pair of admission thresholds is determined and used.

Figure 1: Maximum revenue earned for different server allocations: $N = 20$, $m = 2$, $c_i = r_i = 100$, $\mu_i = 0.1$. (Plot of revenue, $V$, against servers allocated to job type 2, with one curve for each of $\lambda_1 = 0.75$, $\lambda_2 = 0.75$; $\lambda_1 = 0.5$, $\lambda_2 = 1.0$; and $\lambda_1 = 0.2$, $\lambda_2 = 1.3$.)

Figure 1 shows the total revenue earned, $V = V_1 + V_2$, as a function of $n_2$. When the two arrival rates are equal, $\lambda_1 = \lambda_2 = 7.5$, the demand is symmetric and so the optimal allocation is $n_1 = n_2 = 10$. The optimal admission thresholds are $K_1 = K_2 = 19$. For asymmetric demands, as one might expect, it is better to allocate more servers to the more heavily loaded service. Thus, if $\lambda_1 = 5$ and $\lambda_2 = 10$, the optimal allocation is $n_1 = 7$, $n_2 = 13$, with admission thresholds $K_1 = 14$, $K_2 = 24$. Lastly, when $\lambda_1 = 2$ and $\lambda_2 = 13$, the optimal allocation is $n_1 = 4$, $n_2 = 16$, with admission thresholds $K_1 = 9$, $K_2 = 28$.

It is worth noting that the optimal server alloca-

tions are quite close to those suggested by the heuris-

tic described in the last section.

The aim of the next experiment is to evaluate the

quality of the heuristic allocation policy. A 20-server

system with 2 types of service was subjected to ﬂuc-

tuating demand controlled by a single parameter, λ.

During a period of time of length 1000, jobs of type 1 and 2 arrive at rates $\lambda_1 = \lambda$ and $\lambda_2 = 10\lambda$, respectively. Then, during the next period of length 1000, the arrival rates are $\lambda_1 = 10\lambda$ and $\lambda_2 = \lambda$, respectively; and so on. The average service times for the two types are equal, $1/\mu_1 = 1/\mu_2 = 0.8$, as are the charges, $c_1 = c_2 = 100$. In addition to the optimal and heuristic policies, a third policy which uses the same server allocations as the heuristic, but does not restrict admissions (i.e., $K_1 = K_2 = \infty$), is included

in the comparison. In all cases, it is assumed that, at

the beginning of every new period, the demand pa-

rameters become known instantaneously, so that the

server allocations and admission thresholds can be

computed and applied during that period. In order

to avoid the question of whether the system reaches

steady state during each period, the comparisons were

done by simulation.

In figure 2, the total revenue earned per unit time by the three policies is plotted against the offered load (which is equal to $11\lambda/\mu_i$). The near-optimality of the heuristic is rather remarkable. In contrast, the revenues earned by the unrestricted admission policy increase more slowly, and then drop sharply as the load becomes heavy. This example demonstrates that, by itself, a sensible server allocation is not enough; to yield good results, it should be accompanied by a sensible admission policy.

Figure 2: Policy comparisons: revenue as function of load: $N = 20$, $\mu_i = 0.8$, $c_i = r_i = 100$. (Plot of revenue, $V$, against load, with one curve for each of the optimal, heuristic and admit-all policies.)

Realization of a Hosting System

A middleware platform for the deployment and use

of web services was designed and implemented, with

the aim of providing a real-life environment in which

to study various QoS policies. The architecture is

message-based and asynchronous.

Client requests for services arrive at a controller

which collects statistics, estimates parameters and im-

plements the server allocation and job admission poli-

cies. The controller makes reconﬁguration decisions

(e.g., server reallocations or changes of admission

thresholds) at intervals of speciﬁed length. There is

a handler associated with each service type, whose

responsibilities include (a) deployment of the service

(fetched from the code store) if not already deployed,

(b) queueing of accepted jobs, if necessary, and (c)

passing to the controller all necessary statistics.

An experiment similar to the one illustrated in ﬁg-

ure 1 was carried out using a cluster of 20 comput-

ers. Two service types were deployed, and streams

of requests were generated. Note that this was not an

emulation of the model, but a real implementation. In

the real system, messages passed between client, con-

troller, handler and server are subject to network de-

lays and processing overheads, which cannot be con-

trolled. Also, it could not be guaranteed that the com-

puters were dedicated to these tasks; there could be

random demands from other users.

The three pairs of arrival rates used in this experiment were the same as in figure 1: $\lambda_1 = \lambda_2 = 0.75$; $\lambda_1 = 0.5$, $\lambda_2 = 1.0$; and $\lambda_1 = 0.2$, $\lambda_2 = 1.3$. The average service times were also the same: $1/\mu_1 = 1/\mu_2 = 10$. However, because of the factors mentioned above, one should not expect a precise match between the numerical predictions and the observations of the real system. In figure 3, the total revenues earned are plotted against $n_2$, the number of servers allocated to service 2. Each point in the figure corresponds to a separate run of the system, during which about 1000 jobs of each type arrived and were completed. These plots have the same general characteristics as the ones in figure 1. The maximum achievable revenue is again about 120 per unit time, and the server allocations that achieve it are the same, with one exception. In the case of $\lambda_1 = 0.5$, $\lambda_2 = 1.0$, the real system earned its highest revenue for $n_1 = 10$, $n_2 = 10$, whereas the numerical calculations suggested $n_1 = 7$, $n_2 = 13$. However, the differences in revenues are not large.

Figure 3: Observed revenues for different server allocations: $N = 20$, $m = 2$, $c_i = r_i = 100$. (Plot of revenue, $V$, against servers allocated to type 2, with one curve for each of $\lambda_1 = 0.75$, $\lambda_2 = 0.75$; $\lambda_1 = 0.5$, $\lambda_2 = 1.0$; and $\lambda_1 = 0.2$, $\lambda_2 = 1.3$.)

5 CONCLUSIONS

The contribution of this paper is to introduce quantita-

tive methods to a previously unexplored area, namely

the market in computer services. We have demon-

strated that policy decisions such as server allocations

and admission thresholds can have a signiﬁcant ef-

fect on the revenue earned. Moreover, those decisions

are affected by the contractual obligations between

clients and provider in relation to quality of service.

When the numbers of services offered and servers

available are too large for an exhaustive search of the

optimal policy, a simple heuristic is proposed. Exper-

imentation suggests that it is close to optimal.

ACKNOWLEDGEMENTS

This work was carried out as part of the research

project QOSP (Quality Of Service Provisioning),


funded by British Telecom. It was also supported by

the European Union Network of Excellence EuroNGI

(Next Generation Internet).

REFERENCES

Ghosh, S. et al. (2003). Scalable resource allocation for multi-processor QoS optimization. In The International Conference on Distributed Computing Systems.

Hansen, J. et al. (2004). Resource management of highly configurable tasks. In IPDPS '04, The 18th International Parallel and Distributed Processing Symposium, pages 116–123.

Huberman, B. A. et al. (2005). Ensuring trust in one time exchanges: solving the QoS problem. Netnomics, 7:27–37.

Mitrani, I. (1998). Probabilistic Modelling. Cambridge University Press.

Rajkumar, R. et al. (1997). A resource allocation model for QoS management. In RTSS '97, The 18th IEEE Real-Time Systems Symposium, pages 298–307.
