Evaluating Data Integrity in the Cloud using the UPPAAL
Sachi Nishida¹ and Yoshiyuki Shinkawa²
¹ Fujitsu FSAS Inc., 13-2 Nakamaruko, Nakahara-ku, Kawasaki, Kanagawa, Japan
² Graduate School of Science and Technology, Ryukoku University, 1-5 Seta Oe-cho Yokotani, Otsu, Shiga, Japan
Keywords:
Cloud Computing, UPPAAL, Transaction Processing, Data Integrity.
Abstract:
There are several considerations when implementing a transaction processing system in cloud environments
like Google App Engine (GAE). One of the most critical ones is the data integrity, since the cloud provides us
with limited capability for it. Therefore we need to evaluate the applications and the cloud platform carefully
from the data integrity viewpoint. This paper presents a model-based data integrity evaluation method using the
UPPAAL model checker. In order to make the model reusable, we built it as a set of application-independent
functional modules. The application-specific functionalities, on the other hand, are included in the model
as UPPAAL functions written in the C-like UPPAAL language. The data integrity evaluation is performed
in two different ways. One is a simulation based method in which the model is executed by the UPPAAL
simulator to obtain the resultant variable values. The other is a verification based method in which the given
integrity constraints are examined by the UPPAAL verifier using full state space search of the model.
1 INTRODUCTION
Data integrity is one of the most critical concerns
for distributed and concurrent systems, especially for
those in cloud environments, e.g. Google App Engine
(GAE) (Sanderson, 2009), Amazon Web Services (AWS)
(van Vliet and Paganelli, 2011), or IBM Bluemix
(IBM, 2015). A typical example of such systems is
“database transaction processing”, where data integrity
becomes a crucial issue in making the systems
robust (Nishida and Shinkawa, 2014).
Therefore, the evaluation of the data integrity,
from both application and platform viewpoints, for
transaction processing in the cloud seems important
to the spread of cloud computing. However, there are
several difficulties in evaluating and validating this
data integrity for transaction processing. These
difficulties arise mainly because the cloud adopts a
principle of data integrity that differs from the one
used in traditional transaction processing.
This new and different principle is referred to as
“BASE”, standing for Basically Available, Soft state,
and Eventually consistent (Pritchett, 2008). The
traditional principle is “ACID”, standing for Atomicity,
Consistency, Isolation, and Durability (Gray and
Reuter, 1993). Both are sets of properties to be
satisfied in order to guarantee the data integrity in
transaction processing, and the basic differences
between them are
1. While the ACID restricts the concurrent database
accesses within a critical section, the BASE al-
lows arbitrary concurrent database accesses from
any transaction by Basically Available property.
2. While the ACID postulates the transparent repli-
cation of the databases, the BASE tolerates the
non-transparent replication by Soft state property.
3. While the ACID aims at the data integrity at every
instant, the BASE tries to achieve the data
integrity within some duration by the Eventually
consistent property.
According to the above differences between these two
principles, namely, BASE and ACID, we need a dif-
ferent approach to evaluating the data integrity in the
cloud.
Since this evaluation must be performed before
system implementation, we need a precise model that
reflects the cloud platform mechanism implementing
the BASE principle, along with the detailed application
logic that determines the data values. This is because
the data integrity of transaction processing
is affected by both of them.
However, most modeling tools are specialized to
a specific aspect of a system, e.g. software specification
languages like “Z” (Spivey, 2008), “VDM”
(Fitzgerald et al., 2004), and so on, which are specialized
to the functional aspect, system modeling tools
like “finite state machines”, “Petri Nets”² (Reisig,
1985), “SDL” (Thiel, 2001) and so on, which are
specialized to the behavioral aspect, and architecture
oriented modeling tools like “UML class diagrams”,
“block diagrams”, and so on, which are specialized to
the structural aspect.
Nishida, S. and Shinkawa, Y.
Evaluating Data Integrity in the Cloud using the UPPAAL.
DOI: 10.5220/0006005103040309
In Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 1: ICSOFT-EA, pages 304-309
ISBN: 978-989-758-194-6
Copyright © 2016 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
On the other hand, it is desirable to express mul-
tiple aspects of a system simultaneously in a single
model for accurate evaluation of the data integrity.
For this purpose, we use the UPPAAL model checker
(David et al., 2015) as a modeling and evaluation tool,
since it can express the behavior of a system as a set
of timed automata connected through communication
channels, along with the functional and data structure
specifications using a C-like language provided by the
tool.
The rest of the paper is organized as follows. In
section 2, we introduce the basic concepts of the
data integrity in the cloud, along with the transac-
tion behavior following the BASE principle. Section
3 shows how transaction processing in the cloud is
modelled using the UPPAAL. Section 4 discusses an
evaluation and validation method for the data integrity
using the UPPAAL.
2 TRANSACTION PROCESSING IN THE CLOUD
Data integrity in transaction processing has hitherto
relied on the ACID principle, which is guaranteed
by the transaction processing monitor (TPM) under
which the transactions run. One of the premises
of the ACID is that the serialized execution of
transactions always maintains the data integrity. Therefore,
at the implementation level, the TPM isolates and
serializes the critical sections of each transaction by
a locking mechanism. In addition, the ACID implicitly
presumes the transparent, or synchronous, replication
of databases to realize its “C” (Consistency)
property.
This approach could reduce database availability
and degrade the performance of transaction
processing. In cloud computing, the ACID principle
becomes too heavy a burden to guarantee the high
availability, scalability, and stable performance of a
system. Therefore, a more lightweight mechanism to
maintain the data integrity is desired in the cloud.
² Except for higher order Petri Nets like Coloured Petri Nets (Jensen and Kristensen, 2009).
The “BASE” principle is a newly introduced principle
to reconcile the conflicting requirements of
availability and integrity in the cloud. In order to
improve the availability, the BASE principle does not
serialize the critical sections of each transaction, and
allows the non-transparent or asynchronous replica-
tion. For maintaining the data integrity in such an
environment, a TPM following the BASE principle
provides us with “version” information instead of a
locking mechanism, in order to determine whether the
referred data are valid. If some of the referred data are
invalid, the relevant transaction aborts the database
updates. This mechanism is known as “optimistic
locking”.
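The version check described above can be illustrated with a minimal Python sketch. It is not taken from GAE or from the model in this paper: the `Record` class and the `read` and `commit` helpers are hypothetical names introduced only to show how a stale version causes an update to be aborted.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: int
    version: int = 0


def read(record):
    """Return a snapshot (value, version) without taking any lock."""
    return record.value, record.version


def commit(record, new_value, read_version):
    """Apply the update only if the record is unchanged since it was read."""
    if record.version != read_version:
        return False          # stale read: the transaction aborts
    record.value = new_value
    record.version += 1       # a successful commit produces a new version
    return True


stock = Record(value=10)

# Two transactions read the same version concurrently.
v1, ver1 = read(stock)
v2, ver2 = read(stock)

first = commit(stock, v1 - 1, ver1)    # accepted: the version still matches
second = commit(stock, v2 - 1, ver2)   # rejected: the version has changed
```

In this sketch the second transaction would have to re-read the record and retry, which is the usual cost of avoiding locks.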
Before discussing the data integrity of “BASE
transactions”³, we need to define the concept of “data
integrity” rigorously, in order to evaluate it effectively.
The term “integrity” or “data integrity” is
used differently in various contexts. For example, it
focuses on the relationships between directories and
file allocation information (e.g. i-node in the case of
UNIX) at the operating system level, while it means
the referential integrity that requires the existence of
specific key values at the DBMS (Database Manage-
ment System) level.
On the other hand, at the application level, there
are no commonly recognized definitions, since it de-
pends on the semantics of the data rather than their
structure. Therefore, it seems more difficult to ex-
press the data integrity at this level than the former
two levels. In order to determine whether an applica-
tion can be performed in the cloud in the form of a
transaction, we have to evaluate the data integrity at
the application level in this circumstance.
Consequently, we first need to define rigorously
the concept of “data integrity” at the application level
using a unified notation. The data integrity at the ap-
plication level can be defined as a set of constraints
or rules on database occurrences. One of the ways to
express these constraints is to use predicate logic for-
mulae (Shinkawa, 2012). In order to compose these
logic formulae, we first have to define the language L
and the structure S to provide the syntax and seman-
tics of the formulae.
The language L stipulates the usage of symbols
regarding constants, variables, functions, predicates,
and logical operators. In the data integrity evaluation,
the L deals with database related matters. Therefore
each symbol for a variable or constant represents an
entity or its value in the databases. As for functions
and predicates, there are two kinds of them, that is,
database oriented and application oriented. The former
ones are the functions or predicates defined in
a database manipulation language like SQL. On the
other hand, the latter ones are those used in a specific
application domain, e.g. production control, product
management, or customer management applications.
³ Transactions to be run under the control of a TPM implementing the BASE principle.
Therefore, we prepare the L as composed
of two parts, namely the application independent part
and the application dependent part. While the former
part can be reused among different application
domains, the latter needs to be built every time a new
application is dealt with. On the other hand, the
structure S consists of the domain of discourse D and the
interpretation I. All the objects that are referred to
from the functions and predicates, or assigned to
variables and constants, must be elements of the above
D. In our case, this D includes
1. all the database instances $DB_i$,
2. all the database records $r^{(i)}_j$ in each $DB_i$, and
3. all the attribute values $a^{(ij)}_k$ in each $r^{(i)}_j$.
The interpretation I maps each symbol in the L to
an actual entity defined over the D. Some of the functions
and predicates are predefined in a database manipulation
language, e.g. SQL. Other symbols in the L are
defined during the modeling process discussed in the
succeeding sections.
Using the above language L and the structure S,
each constraint to express an integrity rule is
represented by a standardized logic formula in PCNF
(Prenex Conjunctive Normal Form)
$$Q_1 \cdots Q_n \left( \bigwedge_{i} \bigvee_{j} P_{ij}\left(t^{(ij)}_1, \ldots, t^{(ij)}_{m_{ij}}\right) \right)$$
where $Q_i$ is a variable with a quantifier “$\forall$” or “$\exists$”, e.g. $\forall x_i$,
$P_{ij}$ is a predicate, and $t^{(ij)}_k$ is a term composed of
variables, constants, and functions (Schoening, 2008).
There are several kinds of constraints regarding
data integrity, e.g. restrictions on data values, ex-
istence of a record with some specific key, or con-
straints on the values derived from a set of records.
However, any kinds of those constraints can be ex-
pressed by the above predicate logic formulae in the
form of PCNF.
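As an illustration of how such constraints might be evaluated over database occurrences, the following Python sketch checks two rules on a toy library database: a value restriction of the form ∀b P1(b) and a key existence rule of the form ∀l ∃b P2(l, b). The tables, attribute layouts, and function names are assumptions made for this example and do not come from the model in this paper.

```python
# Hypothetical single-replica tables; each record is a list of integer attributes.
books = [[101, 3], [102, 0], [103, 5]]      # [book key, copies in stock]
loans = [[101, 7], [103, 9]]                # [book key, account key]


def stock_is_valid(book):
    """Predicate P1: the number of copies in stock is never negative."""
    return book[1] >= 0


def refers_to(loan, book):
    """Predicate P2: the loan record refers to this book record."""
    return loan[0] == book[0]


def integrity_holds(books, loans):
    # forall b: P1(b)
    values_ok = all(stock_is_valid(b) for b in books)
    # forall l exists b: P2(l, b)
    references_ok = all(any(refers_to(l, b) for b in books) for l in loans)
    return values_ok and references_ok


ok_before = integrity_holds(books, loans)
loans.append([999, 7])                      # a loan for a book that does not exist
ok_after = integrity_holds(books, loans)
```

The universal and existential quantifiers map onto `all` and `any` because the domain of discourse is a finite set of records.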
3 MODELING THE TRANSACTION PROCESSING WITH THE BASE PRINCIPLE
Once the rules or constraints for data integrity are ex-
pressed in the form of predicate logic formulae, the
next step is to model the transaction processing with
the BASE principle, which updates the databases in
the cloud. For this modeling, we use the UPPAAL
model checker, or the UPPAAL in short. The UP-
PAAL expresses a system as a set of finite timed au-
tomata with variables, along with the functions that
manipulate them.
Each timed automaton consists of states (locations
in terms of the UPPAAL) and arcs (edges in terms
of the UPPAAL) that represent the state transitions.
Boolean expressions with clock type variables can be
used as time constraints, which are associated with
any above stated location or edge. These timed au-
tomata are defined as parameterizable templates, and
must be instantiated by the system definition.
In order to make the models reusable, these tem-
plates should be appropriately modularized. In our
approach, the behavior of the transaction processing
in the cloud is categorized into five types, namely
“Initialization”, “Scheduling”, “Thread”, “Database”,
and “Replication”. Each module works as follows.
1. The “Initialization” module sets up the databases
to be used during the simulation. The databases
are expressed as three-dimensional integer ar-
rays. The first dimension represents the replica-
tion number, the second represents the record or
row number, and the third represents the attributes
in the database schema.
2. The “Scheduling” module sends a transaction to
one of the “Thread” instances to process it. A
transaction is expressed in the form of an integer
array, each element of which represents an argument
(or parameter) to the transaction. These integer
arrays compose a two dimensional “transaction
list”⁴.
3. The “Thread” module performs the functionality
of each transaction. The functionality is deter-
mined by the transaction type and the specified
arguments in the “transaction list”. The database
update requests from a transaction are routed to
the “Database” module through a UPPAAL chan-
nel.
4. The “Database” module is instantiated as many
times as there are database replications. Each instance
reads and updates a specific replication of a
database expressed in the form of an integer array.
5. The “Replication” module tries to keep the replicated
databases identical in an asynchronous way,
implementing the Soft state property. This module
is instantiated only once and deals with all the
databases and their replications.
⁴ Since the UPPAAL allows only fixed size for arrays, and each transaction type could require a different number of arguments, an individual two-dimensional array is defined for each transaction type independently.
ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications
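The data layout and the asynchronous replication described in the list above can be sketched in Python as follows. This is only an illustration under assumed sizes and names (`REPLICAS`, `deposit_list`, `apply_deposit`, and `replicate` are hypothetical); the actual model uses UPPAAL integer arrays and channels.

```python
REPLICAS, RECORDS, ATTRS = 3, 4, 2

# db[replica][record][attribute], mirroring the three-dimensional integer array.
db = [[[0] * ATTRS for _ in range(RECORDS)] for _ in range(REPLICAS)]

# One fixed-width transaction list per transaction type: here [record, amount].
deposit_list = [[0, 50], [2, 20]]

pending = []   # replication requests not yet propagated (Soft state)


def apply_deposit(replica, args):
    """Update one replica only and queue the change for later replication."""
    record, amount = args
    db[replica][record][0] += amount
    pending.append((replica, record))


def replicate():
    """Copy every queued record from its source replica to the others."""
    while pending:
        source, record = pending.pop(0)
        for r in range(REPLICAS):
            db[r][record] = list(db[source][record])


apply_deposit(0, deposit_list[0])
stale = db[1][0][0]        # replica 1 has not seen the update yet
replicate()
fresh = db[1][0][0]        # after replication the replicas agree
```

Between `apply_deposit` and `replicate` the replicas disagree, which is the window in which integrity constraints may be temporarily violated.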
In addition to the above modules, we have to prepare
several functions to make the model executable
and verifiable. These functions are written in the
C-like language specific to UPPAAL. While the model
structure is common among application domains, these
functions are application specific and must be built for
each application domain.
Figure 1 through Figure 5 show an example of
the above UPPAAL modules. As stated above, the
structure of these five modules can be commonly used
among different application domains, including the
function names and channels associated with edges and
locations in the model. However, the implementations
of these functions and other supplemental functions
differ among application domains.
Figure 1: Initialization Module. (Diagram labels: Start, Prepare, End, dbLoad(), initS!)
Figure 2: Scheduler Module. (Diagram labels: INIT, Schedule, Continue, Next, initS?, S2T!, tranPos < tranMax, tranPos >= tranMax, tranPos++)
For example, “dbLoad()” in Figure 1 is a function
that initializes all the databases in the system, and
the name is common for all applications. However,
its implementation usually differs among them,
depending on the structure and usage of the databases.
Figure 6 shows a sample implementation of the
“dbLoad()” function for a simplified library application.
When executing the model, these modules are in-
stantiated through the system definition as shown in
Figure 7. In this example, three concurrent threads
and three database replications are assumed.
These modules operate as follows.
Figure 3: Thread Module. (Diagram labels: Select, Wait, x : int[0,2], S2T?, setRn(x), selectTran(), T2D[rn]!, D2T[m]?, step>0, step<0, step == 0)
Figure 4: Database Module. (Diagram labels: INIT, k: int[0, 2], T2D[r]?, selectTh(k), performRequest(), repReq[r][0] = 1, repReq[r][1] = th, replication!, D2T[th]!)
1. Firstly, the “dbLoad” function of the Initialization
module is invoked to prepare all the databases. At
this time, only the associated edge is eligible for
transition, since other modules are waiting for sig-
nals through the UPPAAL channels.
2. After the completion of the “dbLoad” function,
the Scheduler module is activated through the
“initS” channel.
3. The Scheduler module sends a signal to the Thread
module through the channel “S2T”.
4. The Thread module selects a transaction from
the predefined transaction list by the “selectTran”
function, and sends a signal to the Database module
through the channel “T2D[m]”, where
“m” is a replication number.
5. The Database module accesses and updates the
databases.
Figure 5: Replication Module. (Diagram labels: INIT, replication?, replicate())
void dbLoad() {
    // Copy the book records found by key, for each replication i.
    for (i : int[0, 2]) {
        for (j : int[0, 14]) {
            int x = searchBook(i, bookKey[j]);
            if (x > 0) {
                for (k : int[0, 3]) bookM[i][j][k] = book[i][x][k];
            }
        }
    }
    // Copy the account records found by key, for each replication i.
    for (i : int[0, 2]) {
        for (j : int[0, 8]) {
            int x = searchAccount(i, accountKey[j]);
            if (x > 0) {
                for (k : int[0, 3]) accountM[i][j][k] = account[i][x][k];
            }
        }
    }
    // Copy the loan records found by their two-part key, for each replication i.
    for (i : int[0, 2]) {
        for (j : int[0, 4]) {
            int x = searchLoan(i, loanKey[j][0], loanKey[j][1]);
            if (x > 0) {
                for (k : int[0, 3]) loanM[i][j][k] = loan[i][x][k];
            }
        }
    }
}
Figure 6: “dbLoad” Function.
// Place template instantiations here.
In = Initiator();
Sc = Scheduler();
T1 = Thread(0);
T2 = Thread(1);
T3 = Thread(2);
DB1 = DBM(0);
DB2 = DBM(1);
DB3 = DBM(2);
REP = Replication();
// List one or more processes to be composed into a system.
system In, Sc, T1, T2, T3, DB1, DB2, DB3, REP;
Figure 7: System Definition for Module Instantiation.
6. Steps 2 to 5 are repeated until all the predefined
transactions are processed.
The version control and commit/abort processes are
embedded in the Database module as functions.
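The operating steps 1 to 6 can be traced with a small sequential Python sketch. It is a deliberate simplification: the real model runs the modules concurrently as timed automata, whereas this sketch processes one transaction at a time. The channel names are taken from the figures, while the `run` function and the `(replica, amount)` transaction format are assumptions introduced for illustration.

```python
def run(transactions, replicas=3):
    """Walk through steps 1 to 6 sequentially and record a trace of signals."""
    trace = []
    db = [[0] for _ in range(replicas)]          # step 1: dbLoad()
    trace.append("dbLoad")
    trace.append("initS")                        # step 2: activate the scheduler
    for replica, amount in transactions:         # step 6: repeat steps 2 to 5
        trace.append("S2T")                      # step 3: scheduler -> thread
        trace.append(f"T2D[{replica}]")          # step 4: thread -> database
        db[replica][0] += amount                 # step 5: database update
        trace.append("D2T")                      # reply from database to thread
    return db, trace


db, trace = run([(0, 5), (1, 7)])
```

Printing `trace` shows the order in which the channels would fire for one particular schedule; other interleavings are possible in the concurrent model.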
4 DATA INTEGRITY EVALUATION USING THE UPPAAL
The UPPAAL model checker provides us with three
major functionalities. The first is a graphical model
editor with programming capability that we have used
in the previous section. The second is a model simulator
that executes the model we build to show an
instance of its behavior. The third is a model verifier
that examines all the possible behaviors to determine
whether the model satisfies the given properties written
in the form of CTL (Computation Tree Logic) formulae.
Therefore, two alternative ways are available to
evaluate the data integrity of transaction processing.
The first is to execute the model to obtain the val-
ues of the variables for the database records at each
state transition. As discussed in the previous sec-
tion, the data integrity is expressed as a set of pred-
icate logic formulae in the form of PCNF. In the UP-
PAAL model, these logic formulae refer to the vari-
ables associated with the database records and at-
tributes. Therefore, we can determine whether the
data integrity is maintained in the transaction process-
ing by examining the above variables using a function
implementing each constraint logic formula. Since
this method can evaluate only one instance of the
system behavior selected by the simulation, we would
have to perform the simulation for every possible behavior.
However, the number of possible behaviors could be
too large to enumerate. Therefore, this method is in
practice a sampling-based evaluation.
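The sampling nature of the simulation-based method can be seen in a toy Python sketch. Two unprotected transactions each read a counter and then write back the incremented value, and the constraint is checked on randomly chosen interleavings. The transaction shape, the `execute`, `constraint`, and `sample` functions, and the constraint itself are assumptions made for this example, not part of the UPPAAL model.

```python
import random


def execute(schedule):
    """Run one interleaving of two unprotected read-then-write transactions.

    Each name appears twice in the schedule: the first occurrence is that
    transaction's read step, the second is its write step.
    """
    counter = 0
    local = {}
    for name in schedule:
        if name not in local:
            local[name] = counter          # read step
        else:
            counter = local[name] + 1      # write step
    return counter


def constraint(counter):
    """Integrity rule for this toy case: both increments must be visible."""
    return counter == 2


def sample(runs, seed=0):
    """Evaluate the constraint on randomly chosen interleavings."""
    rng = random.Random(seed)
    verdicts = []
    for _ in range(runs):
        schedule = ["A", "A", "B", "B"]
        rng.shuffle(schedule)
        verdicts.append(constraint(execute(schedule)))
    return verdicts


serial = execute(["A", "A", "B", "B"])       # no overlap between A and B
interleaved = execute(["A", "B", "A", "B"])  # both read before either writes
verdicts = sample(20)
```

Whether a violation shows up in `verdicts` depends on which interleavings happen to be drawn, which is exactly the limitation of a sampling-based evaluation.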
On the other hand, the UPPAAL verifier provides
us with a capability of full state space search against
a set of CTL formulae. In order to evaluate the data
integrity in this way, we have to transform a set of
predicate logic formulae into a set of CTL formulae.
Unlike the predicate logic formulae, CTL formulae
can include the path operators “A” and “E”, which deal
with state transition paths of a system, and the temporal
operators “□” and “◇”, which define the validation
points of the formulae. In addition, there are
no quantifiers “∀” and “∃” in CTL. Therefore, several
considerations should be taken into account in the
above transformation from predicate logic formulae
into CTL formulae. These considerations include
1. If a property “P” must always hold in a predicate
logic formula, the CTL formula is “A□P”.
2. If a property “P” always implies a property “Q”,
then the CTL formula is “A□(P → Q)”.
3. If a property “P” eventually implies a property
“Q”, the CTL formula is “A◇(P → Q)”.
4. If a property “P” must hold at a specific point, we
introduce a boolean variable to express the point,
and set it true at the point in the model. In this
case we need to modify the model.
5. If the original predicate logic formula includes the
quantifiers “∀” and “∃”, we introduce a boolean
function into the model to examine whether all of
or some of the variables in the model satisfy the
formula. A model modification is required in this
case again.
After the above transformation is completed, we can
evaluate the data integrity by running the verifier that
the UPPAAL provides.
This CTL based evaluation seems simpler than
the simulator based one. However, it performs a full
state space search and consumes huge computing
resources, so it takes a long time to obtain the
result. In such cases, we need to reduce the model
by decreasing the number of variables or the range of
values to be assigned.
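To make the idea of full state space search concrete, the following Python sketch explores every interleaving of two unprotected read-then-write transactions on a shared counter and checks a property of the “always” kind on each reachable state. It is a plain breadth-first search written for illustration and is not the algorithm used by the UPPAAL verifier; the state encoding `(counter, program counters, local copies)` and all function names are assumptions.

```python
from collections import deque


def successors(state):
    """Yield every state reachable by one step of transaction 0 or 1."""
    counter, pcs, locals_ = state
    for t in (0, 1):
        if pcs[t] == 0:                       # read step
            new_locals = locals_[:t] + (counter,) + locals_[t + 1:]
            new_pcs = pcs[:t] + (1,) + pcs[t + 1:]
            yield (counter, new_pcs, new_locals)
        elif pcs[t] == 1:                     # write step
            new_pcs = pcs[:t] + (2,) + pcs[t + 1:]
            yield (locals_[t] + 1, new_pcs, locals_)


def check_always(initial, prop):
    """Return a reachable state violating prop, or None if prop always holds."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not prop(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


def both_done_implies_two(state):
    """Once both transactions have finished, both increments must be visible."""
    counter, pcs, _ = state
    return pcs != (2, 2) or counter == 2


def counter_bounded(state):
    """The counter never leaves the range 0 to 2."""
    return 0 <= state[0] <= 2


initial = (0, (0, 0), (0, 0))
violation = check_always(initial, both_done_implies_two)
no_violation = check_always(initial, counter_bounded)
```

Unlike sampling, this search is guaranteed to find the lost update, but the set `seen` grows with the number of variables and values, which is why reducing the model matters.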
5 CONCLUSIONS
In cloud environments, the behavior of transaction
processing is considerably different from that in
traditional environments. One of the major reasons is that
the cloud introduces a new principle for the data integrity
called “BASE”, instead of the traditional “ACID”. In order
to make the transaction processing stable in the cloud,
we need to reveal its behavior clearly and evaluate
the data integrity rigorously.
This paper proposed a model-based data integrity
evaluation using the UPPAAL model checker.
In order to make the model easily understandable
and reusable, we composed it using five functional
modules, namely, “Initialization”, “Scheduling”,
“Thread”, “Database”, and “Replication”, following
the BASE principle. While the model structure
can be reused among different application domains,
we need to build application-specific functions
for the model.
The UPPAAL provides us with two different ways
to evaluate the data integrity. One is a simulation-based
evaluation, which examines only one instance
of the behavior of transaction processing. The other is
a verifier-based evaluation, which performs a full state
space search to determine whether the given constraints
are satisfied. While the latter way can evaluate
the integrity more precisely, we need to transform
the original predicate logic formulae into CTL formulae.
In addition, the full state space search consumes
huge computing resources and takes a long time
to obtain the evaluation results.
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Number 25330094.
REFERENCES
David, A., Larsen, K. G., and Legay, A. (2015). UPPAAL SMC tutorial. In International Journal on Software Tools for Technology Transfer Volume 17 Issue 4, pages 397–415. Springer.
Fitzgerald, J., Larsen, P., Mukherjee, P., Plat, N., and Verhoef, M. (2004). Validated Designs for Object-oriented Systems. Springer.
Gray, J. and Reuter, A. (1993). Transaction Processing: Concepts and Techniques. Morgan Kaufmann.
IBM (2015). Bluemix. http://www.ibm.com/cloud-computing/bluemix/.
Jensen, K. and Kristensen, L. (2009). Coloured Petri Nets: Modeling and Validation of Concurrent Systems. Springer-Verlag.
Nishida, S. and Shinkawa, Y. (2014). Data Integrity in Cloud Transactions. In Proc. 4th International Conference on Cloud Computing and Services Science, pages 457–462.
Pritchett, D. (2008). BASE: An ACID alternative. In ACM QUEUE Volume 6 Issue 3, pages 48–55. ACM.
Reisig, W. (1985). Petri Nets: An Introduction. Springer.
Sanderson, D. (2009). Programming Google App Engine. O'Reilly & Associates Inc.
Schoening, U. (2008). Logic for Computer Scientists (Modern Birkhaeuser Classics). Birkhaeuser Boston.
Shinkawa, Y. (2012). CPN Based Data Integrity Evaluation for Cloud Transactions. In Proc. 6rd International Conference on Software Paradigm Trends, pages 267–272.
Spivey, J. M. (2008). Understanding Z: A Specification Language and its Formal Semantics. Cambridge University Press.
Thiel, A. M. (2001). Systems Engineering with SDL: Developing Performance-Critical Communication Systems. Wiley.
van Vliet, J. and Paganelli, F. (2011). Programming Amazon EC2. O'Reilly & Associates Inc.