main purpose is to find an executor for a query and
route the query to it as fast as possible. It makes this
decision based on the map of the cluster.
Figure 1: Multi-tenant database cluster architecture.
It is important to note that a query routing server
has only a small choice of executors for each query. If the
query modifies data, there is no
alternative but to route it to the master database of the
tenant, because data modification is permitted
only there. If the query is read-only, it can also be
routed to a slave server, but in the general case there
are just one or two slaves for a given master,
so even in this case the choice is very limited.
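The routing rule described above can be sketched as follows. This is a minimal illustration, not the routing server's actual implementation: the `TenantPlacement` structure, the `route_query` function, the server names, and the random choice among read candidates are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TenantPlacement:
    """Hypothetical cluster-map entry: where one tenant's data lives."""
    master: str
    slaves: List[str] = field(default_factory=list)


def route_query(cluster_map: Dict[str, TenantPlacement],
                tenant_id: str,
                read_only: bool,
                rng: random.Random) -> str:
    """Return the name of the server that should execute the query."""
    placement = cluster_map[tenant_id]
    if not read_only:
        # Data modification is permitted only on the tenant's master.
        return placement.master
    # A read-only query may go to the master or to any of its slaves.
    # Picking uniformly at random is an assumption of this sketch.
    candidates = [placement.master] + placement.slaves
    return rng.choice(candidates)


# Hypothetical cluster map with one tenant, one master and one slave.
cluster_map = {"tenant_a": TenantPlacement(master="db1", slaves=["db2"])}
rng = random.Random(0)
write_target = route_query(cluster_map, "tenant_a", read_only=False, rng=rng)
read_target = route_query(cluster_map, "tenant_a", read_only=True, rng=rng)
```

With one or two slaves per master, the candidate list for a read-only query never holds more than two or three servers, which is why the routing decision itself stays cheap.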
The data distribution and load balancing server is
the most important and complicated component of
the system. Its main functions are:
- initial distribution of tenants' data among the servers
  of a cluster during system deployment or when
  new servers or tenants are added;
- management of tenant data distribution based on
  the collected statistics, including the creation of
  additional data copies and moving data to
  another server;
- diagnosing whether the system needs
  new computing nodes and storage devices;
- managing replication.
This component is the most critical one in the system,
since the performance of an application depends on
how well it does its work.
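To make the first of these functions concrete, the sketch below places tenants on servers with a simple greedy heuristic: the largest tenant goes first, each onto the currently least-loaded server. The text does not say which algorithm the component uses, so this heuristic, the `distribute_tenants` function, and the sample sizes are assumptions chosen only for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def distribute_tenants(tenant_sizes: Dict[str, float],
                       servers: List[str]) -> Dict[str, str]:
    """Greedy placement: largest tenant first, onto the least-loaded server."""
    # Min-heap of (current_load, server_name); the lightest server pops first.
    heap: List[Tuple[float, str]] = [(0.0, s) for s in servers]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    by_size_desc = sorted(tenant_sizes.items(),
                          key=lambda kv: kv[1], reverse=True)
    for tenant, size in by_size_desc:
        load, server = heapq.heappop(heap)
        placement[tenant] = server
        heapq.heappush(heap, (load + size, server))
    return placement


# Hypothetical tenant data sizes (arbitrary units) and server names.
sizes = {"t1": 50.0, "t2": 30.0, "t3": 20.0, "t4": 10.0}
placement = distribute_tenants(sizes, ["db1", "db2"])
```

Here data size stands in for load. A real balancer would more likely weight tenants by the collected query statistics rather than by size alone.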
3 ANALYSIS OF EXISTING APPLICATION
Analysis of existing applications and their mode of
operation is the first thing to study when designing
a simulation model. For a multi-tenant
cluster, the most interesting question is the
characteristics of the query flow, since this
component has the greatest impact on the results
obtained during modelling. As the multi-tenant
cluster is a queuing system, the Poisson flow of
events is a good basic model of a query flow. The
key points to explore are:
1. intensity distribution of incoming query flows
among clients;
2. presence or absence of dependency between an
average time of query execution and
characteristics of the client which this query
belongs to;
3. characteristics of a customer base;
4. characteristics of customer base changes over
time.
Questions 1 and 2 are very important, since they have
a significant impact on the distribution of queries
between servers and thus make a decisive contribution
to the assessment of the efficiency of load balancing
across the cluster as a whole. The answer to the
fourth question will allow us to adequately simulate
the dynamism inherent in all cloud systems and
therefore offer an effective long-term data
management strategy.
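The Poisson flow proposed above as the basic model of a query flow can be generated by drawing exponentially distributed gaps between consecutive queries. The sketch below illustrates this for a single client; the `poisson_arrivals` function, the rate of 5 queries per time unit, and the horizon are hypothetical values, not measurements from the studied application.

```python
import random
from typing import List


def poisson_arrivals(rate: float, horizon: float,
                     rng: random.Random) -> List[float]:
    """Arrival times of a Poisson flow with the given rate on [0, horizon)."""
    times: List[float] = []
    # Inter-arrival times of a Poisson flow are exponential with mean 1/rate.
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(42)
arrivals = poisson_arrivals(rate=5.0, horizon=1000.0, rng=rng)
# Over a long horizon the observed rate should be close to the requested one.
empirical_rate = len(arrivals) / 1000.0
```

In a cluster model each client would get its own rate, which is exactly what questions 1 and 2 are meant to determine.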
There are many factors that can affect the
parameters of a client query flow. At the initial stage
of the study it was decided to take the size of the
data that the client stores in the cloud as its key
characteristic. The relationship between this
parameter and the intensity of the query flow or the
average query execution time has been studied.
The following assumptions seemed reasonable:
1. most client schemas are approximately
the same size, but there are also significant (but
rare) variations in both directions;
2. client query flow intensity depends directly
on the size of client data (the more data the
client has, the more often they are accessed);
3. the query execution time depends directly on
the size of client data (the more data the client
has, the more data are accessed by the average
query, and thus its execution time increases);
4. client data size and activity change smoothly
over time.
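A synthetic client population that follows assumptions 1 to 3 could be generated as in the sketch below. Several modelling choices here are assumptions of the example and are not stated in the text: the log-normal distribution of data sizes, the strictly proportional form of the "direct dependence", and every numeric parameter. Assumption 4 (smooth change over time) is not modelled.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Tenant:
    size_mb: float         # amount of data the client stores
    query_rate: float      # intensity of the client's query flow, queries/s
    mean_exec_time: float  # average query execution time, seconds


def generate_tenants(n: int, rng: random.Random,
                     median_size_mb: float = 100.0,
                     sigma: float = 0.5,
                     rate_per_mb: float = 0.01,
                     exec_time_per_mb: float = 0.0005) -> List[Tenant]:
    """Generate n hypothetical tenants under assumptions 1 to 3."""
    tenants: List[Tenant] = []
    for _ in range(n):
        # Assumption 1: sizes cluster around a typical value, with rare
        # large deviations in both directions (log-normal is our choice).
        size = rng.lognormvariate(math.log(median_size_mb), sigma)
        # Assumptions 2 and 3: intensity and execution time grow with size
        # (modelled here as simple proportionality).
        tenants.append(Tenant(size_mb=size,
                              query_rate=rate_per_mb * size,
                              mean_exec_time=exec_time_per_mb * size))
    return tenants


tenants = generate_tenants(1000, random.Random(1))
```

Comparing such a synthetic population with real statistics is one way to see which of the assumptions hold and which need to be revised.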
The above assumptions were verified
on the basis of statistics and logs of an
existing multi-tenant cloud application. This
application is an online service that provides
electronic document flow and accounting. The
diversity of the services offered leads to a diversity of
possible scenarios of interaction between a client
and the application, and thus to a complicated
query flow. The application uses PostgreSQL
as its primary data storage. All management
tasks are performed by a set of specialized services
and routers. Currently, the cluster consists of about
Third International Symposium on Business Modeling and Software Design