Conceptual Approach for Performance Isolation in Multi-tenant Systems

Manuel Loesch

and Rouven Krebs

FZI Research Center for Information Technology, Karlsruhe, Germany

SAP AG, Global Research and Business Incubation Karlsruhe, Karlsruhe, Germany

Keywords:

Performance, Isolation, Architecture, Multi-tenancy, PaaS.

Abstract:

Multi-tenant applications (MTAs) share one application instance among several customers to increase
efficiency. Due to this tight coupling, customers may influence each other with regard to the performance they
observe. Existing research focuses on methods and concrete algorithms to performance-isolate the tenants.
In this paper, we present conceptual concerns that arise when serving a large number of users. Based on a load
balancing cluster of multiple MTAs, we identify potential positions in an architecture where performance
isolation can be enforced based on request admission control. Our discussion shows that the different positions
come with specific pros and cons that influence the ability to performance-isolate tenants.

1 INTRODUCTION

Cloud computing is a model that enables ubiquitous
and convenient on-demand access to computing resources
(Armbrust et al., 2009) via the Internet, offered
by a central provider. Economies of scale reduce
the costs of such systems. In addition, sharing of
resources increases the overall utilization rate and
allows static overheads to be distributed among all
consumers.

The NIST defines three service models for cloud
computing (Mell and Grance, 2011). Infrastructure
as a Service (IaaS) provides access to hardware
resources, usually by leveraging virtualization. Platform
as a Service (PaaS) provides a complete runtime
environment for applications following a well-defined
programming model. Software as a Service (SaaS) offers
on-demand access to pre-installed applications used remotely.

Multi-tenancy is used in SaaS offerings to share
one application instance between different tenants,
including all underlying layers, in order to maximize
cost savings. Here, a tenant is defined
as a group of users sharing the same view on an
application. A view includes the data they access, the
configuration, the user management, particular
functionality, and non-functional properties (Krebs et al.,
2012a). Typically, a tenant is one customer such as
a company. Multi-tenancy is thus an approach
to share an application instance between multiple
tenants by providing every tenant with a dedicated share of
the instance that is isolated from the other shares.

1.1 Challenges

Since MTAs share the hardware, operating system,
middleware and application instance, tenants can
influence each other's performance.
For potential cloud customers, performance problems
are a major obstacle (IBM, 2010; Bitcurrent, 2011).
Consequently, it is one of the primary goals of cloud
service providers to isolate different customers as
much as possible in terms of performance.

Performance isolation exists if for customers

working within their quotas, the performance is not

affected when aggressive customers exceed their quo-

tas (Krebs et al., 2012b). Relating this deﬁnition

to Service Level Agreements (SLAs) means that a

decreased performance for the customers working

within their quotas is acceptable as long as their per-

formance is within their SLA guarantees. Within this

paper we assume SLAs where the quota is deﬁned by

the request rate and the guarantees by the response

time.

In order to fully leverage the benefits of multi-tenancy,
the goal is to realize efficient performance
isolation, which means that a tenant's performance
should only be throttled when (1) its quota is
exceeded, and (2) it is responsible for the performance
degradation of other tenants. If violating the quota
were the only criterion, free resources would be
wasted unnecessarily.
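The two throttling conditions can be illustrated with a minimal sketch. It assumes an SLA with a request-rate quota and a response-time guarantee, as used in this paper. The names `TenantStats` and `tenants_to_throttle` are hypothetical, and condition (2) is only approximated here by checking whether any tenant that abides by its quota sees its guarantee violated; attributing responsibility precisely requires the allocation information discussed in Section 4.

```python
from dataclasses import dataclass


@dataclass
class TenantStats:
    """Observed values for one tenant in the current measurement window."""
    request_rate: float     # requests per second sent by the tenant
    quota: float            # agreed maximum request rate (SLA quota)
    response_time: float    # observed mean response time in seconds
    guaranteed_time: float  # response time guaranteed by the SLA


def tenants_to_throttle(stats: dict[str, TenantStats]) -> set[str]:
    """Return tenants that exceed their quota while an abiding tenant suffers."""
    # Tenants within their quota whose response-time guarantee is violated.
    victims = {
        name for name, s in stats.items()
        if s.request_rate <= s.quota and s.response_time > s.guaranteed_time
    }
    if not victims:
        # No abiding tenant is harmed, so free capacity may be used freely.
        return set()
    # Condition (1): quota exceeded. Condition (2) is approximated by the
    # existence of at least one victim (an assumption of this sketch).
    return {name for name, s in stats.items() if s.request_rate > s.quota}
```

With this rule, a tenant that exceeds its quota is left alone as long as all abiding tenants stay within their guarantees, which avoids wasting free resources.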

Since customers have a divergent willingness to

pay for performance, SaaS providers are furthermore

Loesch M. and Krebs R..

Conceptual Approach for Performance Isolation in Multi-tenant Systems.

DOI: 10.5220/0004399102970302

In Proceedings of the 3rd International Conference on Cloud Computing and Services Science (CLOSER-2013), pages 297-302

ISBN: 978-989-8565-52-5

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

interested in product diversiﬁcation and providing dif-

ferent Quality of Service (QoS) levels when sharing

application instances. This is only possible when hav-

ing a mechanism to isolate the performance.

On the IaaS layer mutual performance inﬂuences

can be handled by virtualization. However, on the

SaaS layer where different tenants share one single

application instance, the layer discrepancy between

the operating system that handles resource manage-

ment and the application that serves multiple ten-

ants makes performance isolation harder to achieve.

Multi-tenant aware PaaS solutions handle issues
related to multi-tenancy transparently for the application
developer in order to increase the efficiency of the
development process. However, today's PaaS
solutions do not address the performance issues
introduced by multi-tenancy.

When solving the problem of mutual performance

inﬂuences in practice, it has to be considered that

multi-tenant aware applications have to be highly

scalable since typical use cases aim at serving a very

large customer base with a huge number of simul-

taneous connections. Hence, one single application

instance running on a dedicated server may not be

enough and it is likely that more processing power

is needed than a single server can offer.

1.2 Contribution

Introducing a load balancing cluster, in which
a load balancer acts as a single endpoint for the
tenants and forwards incoming requests to one of several
MTA instances, calls for an architectural
discussion. In addition to the development of
algorithms that ensure performance isolation, it is also
necessary to provide solutions that show how they can
be applied in real-world environments. Hence, this
paper identifies two essential conceptual concerns for
performance isolation in multi-tenant systems with
regard to the request distribution in a load balancing
cluster. We define three positions in an architecture
where performance isolation can be enforced based
on admission control. The discussion of their pros and
cons with respect to the elaborated concerns helps to
apply existing solutions in real-world environments.

The remainder of the paper is structured as follows.
Section 2 presents an overview of existing
isolation mechanisms as well as the current
architectural discussions, and outlines the missing
points in the ongoing research. Section 3 introduces
the conceptual concerns related to the distribution of
requests. Section 4 evaluates various positions to
enforce performance isolation in a load balancing
cluster, and Section 5 concludes the paper.

2 RELATED WORK

The related work is twofold. The first part focuses
on concrete methods and algorithms to isolate tenants
with regard to the performance they observe, and the
second part discusses conceptual issues. We begin
with an overview of the first part.

Li et al. (Li et al., 2008) focus on predicting
performance anomalies and identifying aggressive
tenants in order to apply an adaptation strategy to ensure
isolation. The adaptation strategy itself is not addressed
in detail, but it reduces the influence of the aggressive
tenant on the others.

Lin et al. (Lin et al., 2009) regulate response times
in order to provide different QoS for different tenants.
To achieve this, they make use of a regulator which
is based on feedback control theory. The proposed
regulator is composed of two controllers. The first
uses the average response times to apply admission
control and regulate the request rate per tenant; the
second uses the difference in service levels between
tenants to regulate resource allocation through
different thread priorities.

Wang et al. (Wang et al., 2012) developed a

tenant-based resource demand estimation technique

using Kalman ﬁlters. By predicting the current re-

source usage of a tenant, they were able to control the

admission of incoming requests. Based on resource-

related quotas, they achieved performance isolation.

In (Krebs et al., 2012b) four static mechanisms

to realize performance isolation between tenants were

identiﬁed and evaluated. Three of them leverage ad-

mission control, and one of them uses thread pool

management mechanisms.

None of the above approaches discusses the
architectural issues that become relevant when they have to
be implemented. Furthermore, no solution discusses
scenarios where more than one instance of the
application is running as a result of horizontal scaling.
After this overview of concrete methods, the second
part of the related work addresses MTAs and isolation
on a conceptual level.

Guo et al. (Guo et al., 2007) discuss multiple

isolation aspects relevant for MTAs on a conceptual

level. Concerning performance isolation they pro-

pose Resource Partitioning, Request-based Admis-

sion Control and Resource Reservation as mecha-

nisms to overcome the existing challenges. However,

the paper does not focus on situations with several ap-

plication instances.

Koziolek (Koziolek, 2011) evaluated several
existing MTAs and derived a common architectural
style. This architectural style follows the web
application pattern with an additional data storage for
tenant-related metadata (e.g., customization) and a
metadata manager. The latter uses the data stored in
the metadata storage to adapt the application to the
tenants' specific needs once a request arrives at the
system. However, Koziolek's architectural style does
not support performance isolation.

In (Krebs et al., 2012a) various architectural
concerns such as isolation, persistence, or the distribution
of users in a load balancing cluster are presented and
defined. Furthermore, an overview of their mutual
influences is presented. The paper defines
various aspects relevant for the following section.
However, it does not discuss in detail the information that
is needed to ensure performance isolation. Further,
the position of a potential admission control in a
load-balanced cluster is not addressed.

3 CONCEPTUAL CONCERNS IN

MULTI-TENANT SYSTEMS

In this section two major conceptual concerns are pre-

sented that are of interest for performance isolation

in the context of a load balancing cluster of multiple

MTA instances.

3.1 Tenant Afﬁnity

The need to scale out horizontally by using multiple
processing nodes (i.e., real servers or virtual
machines) to run instances of the same
application leads to different ways to couple tenants and
application instances. For this purpose, the term
affinity is used. It describes how requests of a tenant
are bound to an application instance. Various types
of affinity might be introduced because of technical
limitations, or to increase the performance, since the
cache hit rate is likely to increase when the users of
one tenant use the same instance. However, sharing
a tenant's context among application instances that
are running on different processing nodes requires a
shared database or the use of synchronization
mechanisms. Since this might be inefficient, tenants may be
bound to certain application instances only. In (Krebs
et al., 2012a), four different ways of coupling tenants
and application instances are described:

1. Non-afﬁne. Requests from each tenant can be han-

dled by any application instance.

2. Server-afﬁne. All requests from one tenant must

be handled by the same application instance.

3. Cluster-affine. Requests from one tenant can be
served by a fixed subgroup of all application
instances, and each application instance is part of
exactly one subgroup.

4. Inter-cluster Afﬁne. Same as cluster-afﬁne, but

one application instance can be part of several

subgroups.
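The four kinds of tenant affinity can be sketched as a small data model that returns the instances allowed to serve a tenant. This is a minimal illustration; the names `Affinity`, `eligible_instances`, `subgroups_are_disjoint` and the `binding` mapping are assumptions of this sketch and are not taken from (Krebs et al., 2012a).

```python
from enum import Enum


class Affinity(Enum):
    NON_AFFINE = "non-affine"
    SERVER_AFFINE = "server-affine"
    CLUSTER_AFFINE = "cluster-affine"
    INTER_CLUSTER_AFFINE = "inter-cluster affine"


def eligible_instances(affinity, tenant, all_instances, binding):
    """Return the set of instances allowed to serve requests of `tenant`.

    `binding` maps a tenant to its single instance (server-affine) or to
    its subgroup of instances (cluster-affine, inter-cluster affine).
    """
    if affinity is Affinity.NON_AFFINE:
        return set(all_instances)
    if affinity is Affinity.SERVER_AFFINE:
        return {binding[tenant]}
    return set(binding[tenant])


def subgroups_are_disjoint(binding):
    """Check that no instance belongs to two different subgroups.

    This must hold for cluster-affine setups; inter-cluster affine
    setups are allowed to violate it.
    """
    seen = set()
    for group in {frozenset(g) for g in binding.values()}:
        if seen & group:
            return False
        seen |= group
    return True
```

Several tenants may share the same subgroup; only overlaps between different subgroups distinguish the inter-cluster affine case from the cluster-affine one.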

3.2 Session Stickiness

Independent of tenant affinity, requests can be
stateful or stateless. Stateless requests can always be
handled by any available application instance. However,
maintaining a user's temporary state over various
requests may be required, especially in enterprise
applications. This is described by the term session. A
session is a sequence of interactions between a tenant's
user and an application in which information from
previous requests is tracked. For load balancing
reasons, it makes sense that requests of one session can
still be handled by different application instances
depending on the processing nodes' load. Hence, when
dealing with stateful requests, two kinds of sessions
can be distinguished:

1. Non-sticky sessions are sessions where each
single request of a tenant's user can be handled by
any available (depending on the tenant affinity)
application instance. Single requests are not bound
to a certain server.

2. Sticky sessions are sessions where the ﬁrst request

and following requests of a tenant’s user within a

session have to be handled by the same applica-

tion instance.

When using non-sticky sessions, the session con-

text must be shared among relevant application in-

stances. This results in an additional overhead. Con-

sequently, it might be beneﬁcial to use sticky sessions

to avoid sharing of session information.
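The difference between the two kinds of sessions can be sketched with a small router. The class `SessionRouter` is hypothetical, and round-robin selection is an assumption standing in for whatever load-dependent policy a real load balancer applies.

```python
import itertools


class SessionRouter:
    """Route requests to instances with or without session stickiness."""

    def __init__(self, instances, sticky):
        self._round_robin = itertools.cycle(instances)
        self._sticky = sticky
        self._session_to_instance = {}  # only filled for sticky sessions

    def route(self, session_id):
        if not self._sticky:
            # Non-sticky: any instance may serve the request, so the session
            # context must be shared among instances (not modelled here).
            return next(self._round_robin)
        # Sticky: the first request picks an instance, later ones reuse it.
        if session_id not in self._session_to_instance:
            self._session_to_instance[session_id] = next(self._round_robin)
        return self._session_to_instance[session_id]
```

In the sticky case the router has to keep a session-to-instance mapping, whereas in the non-sticky case the cost moves to sharing the session context among the instances.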

4 TOWARDS PERFORMANCE

ISOLATION

In this section two aspects of performance isolation in
a load balancing cluster of multiple instances are
discussed: first, the information available at different
positions of the request processing flow, and second,
the consequences of tenant affinity and session stickiness.

4.1 Possible Positions

An intermediary component such as a proxy will get

different information at different positions in the pro-

ConceptualApproachforPerformanceIsolationinMulti-tenantSystems

299

cess ﬂow of a request. In Figure 1, three possible

positions to enforce performance isolation based on

request admission control are depicted.

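Figure 1: Positions to enforce performance isolation.
(Diagram layers, from top to bottom: tenants sending requests (R), load balancer,
application instances, database. Position 1 lies in front of the load balancer,
Position 2 directly after or inside it, and Position 3 in front of each
application instance.)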

In front of the load balancer (Position 1), an
intermediary has access to requests from all tenants. It
can determine the response times and also whether a
tenant is within its quota. The latter is relevant since
performance isolation is based on the overall amount
of requests from a tenant. However, it is not known
which request is executed by which application
instance, since this is decided by the load balancer. The
independence from the request distribution is a
motivation for this position, since it can allow for easier
admission decisions. If the isolation algorithm is to use
fine-grained information about a processing node's
internal state (e.g., resource utilization),
access to this data is only possible with a
notable communication overhead.

Directly after or included in the load balancer
(Position 2), the information available is a superset
of the information available at Position 1. In addition
to all requests and their response times,
their distribution is accessible at this position.
It is known which application instance is
responsible for which request, and the overall amount
of requests from each tenant is known as well. Again,
the use of fine-grained information about a processing
node's state comes with a notable
communication overhead.

In front of the application (Position 3), an
intermediary has no information about other processing
nodes, such as the number of requests they processed
for a given tenant or their utilization. However,
information about the response times of the respective
processing node is available. Compared to the other
positions, fine-grained access to a processing node's
internal state is possible with significantly less overhead,
since the component can be placed directly on the
respective processing node. Since no global knowledge
of the other instances exists at this position, it would
offer the same information as Position 2 only if the
information of all intermediaries were shared.
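The information available at the three positions can be summarized as sets, which makes the superset relation between Position 2 and Position 1 explicit. The labels below are hypothetical names for the kinds of information mentioned in the text, and the node-internal state is listed only where it is available without notable communication overhead.

```python
# Hypothetical labels for the kinds of information discussed above.
POSITION_INFO = {
    1: frozenset({"all_requests", "response_times", "tenant_request_totals"}),
    2: frozenset({"all_requests", "response_times", "tenant_request_totals",
                  "request_allocation"}),
    # Position 3 only sees its own processing node, but cheaply.
    3: frozenset({"local_requests", "local_response_times",
                  "node_internal_state"}),
}


def missing_information(position, required):
    """Return the required kinds of information not available at `position`."""
    return set(required) - POSITION_INFO[position]
```

For example, an algorithm that needs both the per-tenant request totals and the request allocation lacks only the allocation at Position 1, but lacks the totals at Position 3.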

4.2 Comparison of Different Positions

In this section, the suitability of performance isolation
at the three introduced positions is discussed with
respect to tenant affinity and session stickiness. It is shown
that the kind of tenant affinity and the support of sticky
sessions is a major decision for horizontally scalable MTAs.
Besides load balancing, synchronization of data and
support for session migration, it has a big impact on
performance isolation.

We assume that requests from each tenant are
always homogeneously distributed over all available
application instances if possible. Hence,
accumulations of requests from a tenant at a single
application instance are avoided and a clear separation of
server-affinity and the other cases of affinity is given.
From an information-centric point of view, it has to be
noted that the required information for performance
isolation and QoS differentiation is the same.
Whenever it is possible to performance-isolate tenants, it is
also possible to give precedence to certain tenants by
adding weighting factors when isolating them.
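One way such weighting factors could enter an isolation decision is sketched below. The paper does not prescribe how weights are applied; the function `throttling_order` and the rule of dividing the relative quota overuse by a tenant's weight are assumptions made purely for illustration.

```python
def throttling_order(request_rates, quotas, weights):
    """Sort quota-exceeding tenants, most throttle-worthy first.

    A higher weight means a higher QoS level, so the tenant's relative
    quota overuse counts less. Tenants without a weight default to 1.0.
    """
    overuse = {
        tenant: (request_rates[tenant] / quotas[tenant]) / weights.get(tenant, 1.0)
        for tenant in request_rates
        if request_rates[tenant] > quotas[tenant]
    }
    return sorted(overuse, key=overuse.get, reverse=True)
```

The same inputs are used as for plain isolation (request rates and quotas); only the ranking changes, which reflects that QoS differentiation needs no additional information.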

4.2.1 In Front of Load Balancer

Server-affine. In this scenario, performance isolation
is not possible. An increase in response times and
request rates can be measured. However, it cannot be
determined which request will be processed at which
application instance since this information is
maintained in the load balancer. Although it is known that
requests from a tenant are always served by the same
instance, the tenants that influence each other's
performance by being bound to the same instance are not
known. This makes it impossible to efficiently
separate tenants. Sticky sessions do not influence this
since they do not reveal which tenants are
influencing each other.

Non-affine. In this scenario, it depends on the session
stickiness whether performance isolation is possible.
Using non-sticky sessions, performance isolation is
possible. In front of the load balancer it can be
determined whether a tenant is within its quota since
the full number of requests from a tenant is known.
Furthermore, in case of non-sticky sessions, the load
balancer can homogeneously distribute the requests.
Hence, the more aggressive a tenant is, the more it
contributes to a bad performance of any tenant. With
this knowledge, it is possible to performance-isolate
tenants. However, when using sticky sessions,
requests are bound to an unknown instance. In this case,
interfering when one or more tenants experience a bad
performance is not possible since requests are not
uniformly distributed to the different instances, and hence
the most aggressive tenant is not necessarily
responsible for bad performance. While initial requests will
be distributed homogeneously, a significant number of
sessions may last longer than others. Thus, it is
possible that the most aggressive tenant is bound to a
processing node with no further load, whereas a less
aggressive tenant has to share a processing node's
capacity and thus is responsible
for the bad performance of other requests.

Cluster-affine. In this scenario, performance isolation
is not possible. The behavior in terms of request
allocation is the same as described in the non-affine case
with sticky sessions: the underlying problem is that the
request allocation information is missing and the
uniform distribution of the request workload can no longer
be assumed, since the available instances are limited to
a subset which is not known at this position. This is
not changed by sticky sessions since they only fix
existing request-to-instance allocations.

4.2.2 Directly After/Included in Load Balancer

Server-affine, Non-affine, Cluster-affine. At this point,
performance isolation is possible in all three cases of
affinity. The load balancer maintains state to enforce
tenant affinity and the stickiness of sessions in order
to allocate requests to instances. Hence, at this
position the available information about tenant affinity
and session stickiness is a superset of the information
available at the two other positions. The information
about the request allocation and the ability to measure
response times make it possible to interfere and
performance-isolate tenants. However, as already stated,
access to a processing node's state, which may increase
the quality of performance isolation, is complicated and
comes with communication overhead.

4.2.3 In Front of Application

Server-affine. In this scenario, performance isolation
is possible. Given server-affinity, requests are always
processed by the same instance and at this point we
are directly in front of the respective processing node.
Thus, information about other processing nodes
does not provide any benefit. Since requests of
a tenant are not spread over multiple instances, other
processing nodes do not influence this tenant and it
is possible to completely measure all information
related to the specific tenant's performance. Since
requests are already bound to a specific instance, it is
irrelevant whether sticky sessions are used or not.

Non-affine. In this scenario, performance isolation is
not possible without further information. Since
requests of tenants can be served by all instances, the
load balancer is free to distribute the requests of all
tenants. Hence, it can be assumed that requests of
each tenant are homogeneously distributed over all
instances. However, performance isolation is not
possible as the information about the total number of
requests sent by each tenant is not available. Thus,
it cannot be determined whether a tenant's quota is
exceeded. The use of sticky or non-sticky sessions
does not change this since requests from a single
tenant are still distributed over various instances.
However, in the case of non-sticky sessions, performance
isolation is possible when the processing capacity of
the processing nodes is equal and the total number of
instances is taken into account. Then, the overall request rate
can be determined since a homogeneous distribution
of the requests can be assumed. Hence, it is possible to
determine whether a tenant's quota is exceeded. But
in the case of sticky sessions, performance isolation
is still not possible since a homogeneous distribution
of requests cannot be assumed any more.

Cluster-affine. Again, in this scenario, it is not
possible to realize performance isolation without further
information. The behavior in terms of request
allocation is the same as in the non-affine case with the
limitation that the available set of instances is a smaller
subset. As in the former case, the problem is
missing information about requests that are processed
at other instances, which makes it impossible to
determine quota violations. Again, there is no difference
whether non-sticky or sticky sessions are used since the
latter only fix the request-to-instance allocation.
However, as in the non-affine case, performance
isolation is possible in the case of non-sticky sessions
when all processing nodes have the same processing
capacity and the cluster size is known. Then,
information can be projected from one processing node to
another by assuming a homogeneous distribution by
the load balancer. This makes it possible to determine whether
a tenant is within its quota, and thus performance can
be isolated since access to response times is given as
well.
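The projection used at Position 3 in the non-affine and cluster-affine cases with non-sticky sessions can be written down in a few lines. This is a sketch under the stated assumptions (equal processing capacity, homogeneous distribution by the load balancer, known number of instances); the function names are hypothetical.

```python
def projected_total_rate(local_rate, instance_count):
    """Estimate a tenant's overall request rate from one node's local view.

    Assumes the load balancer spreads the tenant's requests evenly over
    `instance_count` instances of equal capacity. For the cluster-affine
    case, `instance_count` is the size of the tenant's subgroup.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return local_rate * instance_count


def quota_exceeded(local_rate, instance_count, quota):
    """Decide locally whether the tenant's overall quota is exceeded."""
    return projected_total_rate(local_rate, instance_count) > quota
```

For example, a node that observes 12.5 requests per second from a tenant in a cluster of four instances would project an overall rate of 50 requests per second. With sticky sessions the even-spread assumption breaks, so the projection is no longer valid.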

4.3 Summary and Implications

Table 1 summarizes the above discussion and shows

the elaborated differences based on different kinds of

tenant and session afﬁnity. The stickiness of sessions

is only inﬂuential in some cases. In the presence of

a non-afﬁne behavior and session afﬁnity, a central

management of request processing information with

access to the allocation of requests to instances as well

ConceptualApproachforPerformanceIsolationinMulti-tenantSystems

301

Table 1: Positions and feasibility of performance isolation.

Tenant Affinity   Session Stickiness   Pos. 1   Pos. 2   Pos. 3
affine            no                   no       yes      yes
affine            yes                  no       yes      yes
non               no                   yes      yes      yes
non               yes                  no       yes      no
cluster           no                   no       yes      yes
cluster           yes                  no       yes      no

as the overall amount of requests is required in order
to guarantee performance isolation. It was explained
why, in many scenarios, performance isolation is not
possible without information about the request
distribution (Position 1), or directly in front of the
application instance (Position 3). Offering a superset of
the information available at the two other positions,
Position 2 is the only one that allows performance
isolation to be realized for all affinity combinations.
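The content of Table 1 can also be encoded as a lookup structure, which allows the claim about Position 2 to be checked mechanically. The names `FEASIBLE`, `isolation_feasible` and `positions_for_all_combinations` are hypothetical; the values are copied from Table 1.

```python
# (tenant affinity, sticky sessions) -> feasibility at (Pos. 1, Pos. 2, Pos. 3).
# The "yes" entries at Position 3 for non-sticky non-affine and cluster-affine
# setups rely on equal node capacity and a known number of instances.
FEASIBLE = {
    ("server-affine", False): (False, True, True),
    ("server-affine", True): (False, True, True),
    ("non-affine", False): (True, True, True),
    ("non-affine", True): (False, True, False),
    ("cluster-affine", False): (False, True, True),
    ("cluster-affine", True): (False, True, False),
}


def isolation_feasible(affinity, sticky, position):
    """Look up whether isolation is feasible at `position` (1, 2 or 3)."""
    return FEASIBLE[(affinity, sticky)][position - 1]


def positions_for_all_combinations():
    """Return the positions where isolation is feasible in every row."""
    return [
        position for position in (1, 2, 3)
        if all(row[position - 1] for row in FEASIBLE.values())
    ]
```

Evaluating `positions_for_all_combinations()` on this encoding yields only Position 2, in line with the discussion above.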

5 CONCLUSIONS

It was shown that performance isolation between
tenants is an important aspect in multi-tenant systems,
and that serving a large number of tenants requires
several application instances and a
load balancer that distributes requests among them.
While existing work focuses on concrete algorithms
and techniques to enforce performance isolation, this
paper focuses on a conceptual realization of
performance isolation in a load-balanced multi-tenant
system.

We were able to outline that, from an
information-centric point of view, the best placement for
a performance isolation component that leverages
request admission control is directly after the load
balancer. At this position, information about the
allocation of requests to processing nodes as well as the
overall amount of requests from a tenant is given. It
was shown that the positions before the load balancer
or directly before the applications have disadvantages
which make it impossible to realize performance
isolation in every scenario. However, the use of
fine-grained information about a processing node's state
may increase the quality of performance isolation, and
this is most easily achieved when the component is placed at
the respective processing node. In the other cases,
data has to be transmitted via the network,
which leads to a trade-off decision depending
on the concrete scenario.

Our future research focuses on providing a
complete architecture to enforce and evaluate
performance isolation based on the results presented here.

ACKNOWLEDGEMENTS

The research leading to these results has
received funding from the European Union's Seventh
Framework Programme (FP7/2007-2013) under grant
agreement No. 258862.

REFERENCES

Armbrust, M., Fox, A., Grifﬁth, R., Joseph, A. D., Katz,

R. H., Konwinski, A., Lee, G., Patterson, D. A.,

Rabkin, A., Stoica, I., and Zaharia, M. (2009). Above

the Clouds: A Berkeley View of Cloud Computing.

Technical report, EECS Department, University of

California, Berkeley.

Bitcurrent (2011). Bitcurrent Cloud Computing Survey

2011. Technical report, Bitcurrent.

Guo, C. J., Sun, W., Huang, Y., Wang, Z. H., and Gao,
B. (2007). A Framework for Native Multi-Tenancy
Application Development and Management. In
Proceedings of the 4th IEEE International Conference on
Enterprise Computing, E-Commerce, and E-Services.

IBM (2010). Dispelling the vapor around cloud computing.

Whitepaper, IBM Corp.

Koziolek, H. (2011). The SPOSAD Architectural Style for
Multi-tenant Software Applications. In Proceedings
of the 9th Working IEEE/IFIP Conference on Software
Architecture (WICSA 2011).

Krebs, R., Momm, C., and Kounev, S. (2012a). Architec-

tural Concerns in Multi-Tenant SaaS Applications. In

Proc. of the 2nd International Conference on Cloud

Computing and Services Science (CLOSER 2012).

Krebs, R., Momm, C., and Kounev, S. (2012b). Metrics and

Techniques for Quantifying Performance Isolation in

Cloud Environments. In Proceedings of the 8th ACM

SIGSOFT International Conference on the Quality of

Software Architectures (QoSA 2012).

Li, X. H., Liu, T., Li, Y., and Chen, Y. (2008). SPIN: Service

Performance Isolation Infrastructure in Multi-tenancy

Environment. In Proc. of the 6th International Confer-

ence on Service-Oriented Computing (ICSOC 2008).

Lin, H., Sun, K., Zhao, S., and Han, Y. (2009).
Feedback-Control-Based Performance Regulation for
Multi-Tenant Applications. In Proc. of the 15th
International Conf. on Parallel and Distributed Systems.

Mell, P. and Grance, T. (2011). The NIST deﬁnition of

cloud computing (Special Publication 800-145). Rec-

ommendations of the National Institute of Standards

and Technology.

Wang, W., Huang, X., Qin, X., Zhang, W., Wei, J., and

Zhong, H. (2012). Application-Level CPU Consump-

tion Estimation: Towards Performance Isolation of

Multi-tenancy Web Applications. In Proc. of the 2012

IEEE 5th International Conf. on Cloud Computing.

CLOSER2013-3rdInternationalConferenceonCloudComputingandServicesScience

302