Evaluating Data Integrity in the Cloud using the UPPAAL

Sachi Nishida

and Yoshiyuki Shinkawa

Fujitsu FSAS Inc., 13-2 Nakamaruko, Nakahara-ku, Kawasaki, Kanagawa, Japan

Graduate School of Science and Technology, Ryukoku University, 1-5 Seta Oe-cho Yokotani, Otsu, Shiga, Japan

Keywords:

Cloud Computing, UPPAAL, Transaction Processing, Data Integrity.

Abstract:

There are several considerations when implementing a transaction processing system in cloud environments

like Google App Engine (GAE). One of the most critical ones is the data integrity, since the cloud provides us

with limited capability for it. Therefore we need to evaluate the applications and the cloud platform carefully

from the data integrity viewpoint. This paper presents a model based data integrity evaluation method using the

UPPAAL model checker. In order to make the model reusable, we built it as a set of application independent

functional modules. On the other hand, the application unique functionalities are to be included in the model

as UPPAAL functions written in the C-like UPPAAL language. The data integrity evaluation is performed

in two different ways. One is a simulation based method in which the model is executed by the UPPAAL

simulator to obtain the resultant variable values. The other is a veriﬁcation based method in which the given

integrity constraints are examined by the UPPAAL veriﬁer using full state space search of the model.

1 INTRODUCTION

Data integrity is one of the most critical concerns for distributed and concurrent systems, especially for those in cloud environments, e.g. Google App Engine (GAE) (Sanderson, 2009), Amazon Web Services (AWS) (van Vliet and Paganelli, 2011), or IBM Bluemix (IBM, 2015). One of the typical systems is “database transaction processing”, and the data integrity becomes a crucial issue in making such systems robust (Nishida and Shinkawa, 2014).

Therefore, the evaluation of the data integrity,

from both application and platform viewpoints, for

transaction processing in the cloud seems important

to the spread of cloud computing. However, there are several difficulties in evaluating and validating this data integrity for transaction processing. These difficulties arise mainly because the cloud adopts a principle of data integrity that differs from that of traditional transaction processing.

This new and different principle is referred to as

“BASE” standing for Basically Available, Soft state,

and Eventually consistent (Pritchett, 2008). The basic

differences between the “BASE” and the traditional principle “ACID” (Atomicity, Consistency, Isolation, and Durability) (Gray and Reuter, 1993), both of which are sets of properties to be satisfied in order to guarantee the data integrity in transaction processing, are as follows.

1. While the ACID restricts the concurrent database accesses within a critical section, the BASE allows arbitrary concurrent database accesses from any transaction by the Basically Available property.

2. While the ACID postulates the transparent replication of the databases, the BASE tolerates the non-transparent replication by the Soft state property.

3. While the ACID aims at the data integrity at every instant, the BASE tries to achieve the data integrity within some duration by the Eventually consistent property.

Because of the above differences between these two principles, namely, BASE and ACID, we need a different approach to evaluating the data integrity in the cloud.

Since this evaluation must be performed before

system implementation, we need a precise model that

reﬂects the cloud platform mechanism implementing

the BASE principle, along with the detailed application logic that determines the data values. The reason is that the data integrity of transaction processing is affected by both of them.

However, most modeling tools are specialized to a specific aspect of a system. Software specification languages like “Z” (Spivey, 2008) and “VDM” (Fitzgerald et al., 2004) are specialized to the functional aspect; system modeling tools like “finite state machines”, “Petri Nets” (Reisig, 1985), and “SDL” (Thiel, 2001) are specialized to the behavioral aspect; and architecture oriented modeling tools like “UML class diagrams” and “block diagrams” are specialized to the structural aspect.

Nishida, S. and Shinkawa, Y.
Evaluating Data Integrity in the Cloud using the UPPAAL.
DOI: 10.5220/0006005103040309
In Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 1: ICSOFT-EA, pages 304-309
ISBN: 978-989-758-194-6

On the other hand, it is desirable to express mul-

tiple aspects of a system simultaneously in a single

model for accurate evaluation of the data integrity.

For this purpose, we use the UPPAAL model checker

(David et al., 2015) as a modeling and evaluation tool,

since it can express the behavior of a system as a set

of timed automata connected through communication

channels, along with the functional and data structure

speciﬁcations using a C-like language provided by the

tool.

The rest of the paper is organized as follows. In

section 2, we introduce the basic concepts of the

data integrity in the cloud, along with the transac-

tion behavior following the BASE principle. Section

3 shows how transaction processing in the cloud is

modelled using the UPPAAL. Section 4 discusses an

evaluation and validation method for the data integrity

using the UPPAAL.

2 TRANSACTION PROCESSING

IN THE CLOUD

Data integrity in transaction processing has hitherto relied on the ACID principle, which is guaranteed by the transaction processing monitor (TPM) under which the transactions run. One of the premises of the ACID is that the serialized execution of transactions always maintains the data integrity. Therefore, at the implementation level, the TPM isolates and serializes the critical sections of each transaction by a locking mechanism. In addition, the ACID implicitly presumes the transparent replication, or synchronous replication of databases, to realize its “C” (Consistency) property.

This approach could reduce database availability and degrade the performance of transaction processing. In cloud computing, the ACID principle becomes too heavy a burden for guaranteeing the high availability, scalability, and stable performance of a system. Therefore, a more lightweight mechanism to maintain the data integrity is desired in the cloud.

Footnote to “Petri Nets” in Section 1: except for higher order Petri Nets like Coloured Petri Nets (Jensen and Kristensen, 2009).

The “BASE” principle is a newly introduced principle to reconcile the conflicting requirements, that

is, availability and integrity in the cloud. In order to

improve the availability, the BASE principle does not

serialize the critical sections of each transaction, and

allows the non-transparent or asynchronous replica-

tion. For maintaining the data integrity in such an

environment, a TPM following the BASE principle

provides us with “version” information instead of a

locking mechanism, in order to determine whether the

referred data are valid. If some of the referred data are

invalid, the relevant transaction aborts the database

updates. This mechanism is known as “optimistic

locking”.
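The version-based check described above can be sketched as follows. This is a minimal illustration in Python, not part of the UPPAAL model and not the mechanism of any particular cloud platform; the names `VersionedStore`, `read`, and `commit` are hypothetical.

```python
class VersionedStore:
    """Key-value store that keeps a version number per record."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        # Return the value together with the version seen by the reader.
        return self._data.get(key, (None, 0))

    def commit(self, key, new_value, seen_version):
        # Commit only if nobody updated the record since it was read.
        _, current_version = self._data.get(key, (None, 0))
        if current_version != seen_version:
            return False  # the referred data are invalid: abort the update
        self._data[key] = (new_value, current_version + 1)
        return True


store = VersionedStore()
store.commit("stock", 10, 0)

# Two transactions read the same record before either of them writes.
value_a, version_a = store.read("stock")
value_b, version_b = store.read("stock")

ok_a = store.commit("stock", value_a - 1, version_a)  # first writer commits
ok_b = store.commit("stock", value_b - 1, version_b)  # stale version: aborts
```

The second transaction is rejected because the version it read is no longer current, so no lock is needed to prevent the lost update.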

Before discussing the data integrity of “BASE transactions”, we need to define the concept of “data

integrity” rigorously, in order to evaluate it effec-

tively. The term “integrity” or “data integrity” is

used differently in various contexts. For example, it

focuses on the relationships between directories and

ﬁle allocation information (e.g. i-node in the case of

UNIX) at the operating system level, while it means

the referential integrity that requires the existence of

speciﬁc key values at the DBMS (Database Manage-

ment System) level.

On the other hand, at the application level, there

are no commonly recognized deﬁnitions, since it de-

pends on the semantics of the data rather than their

structure. Therefore, it seems more difﬁcult to ex-

press the data integrity at this level than the former

two levels. In order to determine whether an applica-

tion can be performed in the cloud in the form of a

transaction, we have to evaluate the data integrity at

the application level in this circumstance.

Consequently, we ﬁrst need to deﬁne rigorously

the concept of “data integrity” at the application level

using a uniﬁed notation. The data integrity at the ap-

plication level can be deﬁned as a set of constraints

or rules on database occurrences. One of the ways to

express these constraints is to use predicate logic for-

mulae (Shinkawa, 2012). In order to compose these

logic formulae, we ﬁrst have to deﬁne the language L

and the structure S to provide the syntax and seman-

tics of the formulae.

The language L stipulates the usage of symbols

regarding constants, variables, functions, predicates,

and logical operators. In the data integrity evaluation,

the L deals with database related matters. Therefore

each symbol for a variable or constant represents an

entity or its value in the databases. As for functions

and predicates, there are two kinds of them, that is,

database oriented and application oriented. The former ones are the functions or predicates defined in a database manipulation language like SQL. On the other hand, the latter ones are those used in a specific application domain, e.g. production control, product management, or customer management applications.

Footnote to “BASE transactions”: transactions to be run under the control of a TPM implementing the BASE principle.

Therefore, we are to prepare the L as composed

of two parts, namely the application independent part

and application dependent part. While the former part can be reused among different application domains, the latter needs to be built every time a new application is dealt with. On the other hand, the structure S consists of the domain of discourse D and the

interpretation I. All the objects that are referred to

from the functions and predicates, or assigned to vari-

ables and constants, must be the elements of the above

D. In our case, this D includes

1. all the database instances $DB_i$

2. all the database records $r^{(i)}_j$ in each $DB_i$, and

3. all the attribute values $a^{(ij)}_k$ in each $r^{(i)}_j$

The interpretation I maps each symbol in the L to

an actual entity defined over the D. Some of the functions and predicates are predefined in a database manipulation language, e.g. SQL. Other symbols in the L are

deﬁned during the modeling process discussed in the

succeeding sections.

Using the above language L and the structure S,

each constraint to express an integrity rule is rep-

resented by a standardized logic formula (PCNF –

Prenex Conjunctive Normal Form)

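$$Q_1 \cdots Q_n \left( \bigwedge_{i} \bigvee_{j} p_{ij}\left(t^{(ij)}_1, \cdots, t^{(ij)}_{m}\right) \right)$$

where $Q_k$ is a variable with the quantifier “∀”, e.g. $\forall x$, $p_{ij}$ is a predicate, and $t^{(ij)}_l$ is a term composed of variables, constants, and functions (Schoening, 2008).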

There are several kinds of constraints regarding

data integrity, e.g. restrictions on data values, ex-

istence of a record with some speciﬁc key, or con-

straints on the values derived from a set of records.

However, any of these kinds of constraints can be expressed by the above predicate logic formulae in the form of PCNF.
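As an illustration, a universally quantified constraint whose matrix is a conjunction of disjunctions can be evaluated over finite database contents as sketched below. This is a minimal Python sketch under assumptions of our own: the function `holds_forall_cnf`, the tables `books` and `loans`, and the two clauses are hypothetical and only mimic a library application.

```python
from itertools import product


def holds_forall_cnf(domains, clauses):
    """Check that every clause holds for every assignment of the variables.

    domains: one finite list of values per universally quantified variable.
    clauses: list of clauses; a clause is a list of predicates joined by OR,
             and the clauses themselves are joined by AND.
    """
    for assignment in product(*domains):
        for clause in clauses:
            if not any(pred(*assignment) for pred in clause):
                return False
    return True


# Hypothetical data: book key -> copies in stock, and (member, book key) loans.
books = {101: 2, 102: 0}
loans = [(1, 101), (2, 102)]

clauses = [
    # Every loan refers to an existing book.
    [lambda loan: loan[1] in books],
    # The book is unknown OR its stock is not negative.
    [lambda loan: loan[1] not in books, lambda loan: books[loan[1]] >= 0],
]

consistent = holds_forall_cnf([loans], clauses)
broken = holds_forall_cnf([loans + [(3, 999)]], clauses)
```

Here `consistent` is true, while `broken` is false because the added loan refers to a book key that does not exist.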

3 MODELING THE

TRANSACTION PROCESSING

WITH THE BASE PRINCIPLE

Once the rules or constraints for data integrity are ex-

pressed in the form of predicate logic formulae, the

next step is to model the transaction processing with

the BASE principle, which updates the databases in

the cloud. For this modeling, we use the UPPAAL

model checker, or the UPPAAL in short. The UP-

PAAL expresses a system as a set of ﬁnite timed au-

tomata with variables, along with the functions that

manipulate them.

Each timed automaton consists of states (locations

in terms of the UPPAAL) and arcs (edges in terms

of the UPPAAL) that represent the state transitions.

Boolean expressions over clock variables can be used as time constraints and attached to any of the above locations or edges. These timed automata are defined as parameterizable templates, and must be instantiated in the system definition.

In order to make the models reusable, these tem-

plates should be appropriately modularized. In our

approach, the behavior of the transaction processing

in the cloud is categorized into ﬁve types, namely

“Initialization”, “Scheduling”, “Thread”, “Database”,

and “Replication”. Each module works as follows.

1. The “Initialization” module sets up the databases

to be used during the simulation. The databases

are expressed as three-dimensional integer ar-

rays. The ﬁrst dimension represents the replica-

tion number, the second represents the record or

row number, and the third represents the attributes

in the database schema.

2. The “Scheduling” module sends a transaction to one of the “Thread” instances to process it. A transaction is expressed in the form of an integer array, each element of which represents an argument (or parameter) to the transaction. These integer arrays compose a two-dimensional “transaction list”.

3. The “Thread” module performs the functionality

of each transaction. The functionality is deter-

mined by the transaction type and the speciﬁed

arguments in the “transaction list”. The database

update requests from a transaction are routed to

the “Database” module through a UPPAAL chan-

nel.

4. The “Database” module is instantiated once per database replication. Each instance reads and updates a specific replication of a database expressed in the form of an integer array.

5. The “Replication” module tries to keep the replicated databases identical in an asynchronous way, implementing the Soft state property. This module is instantiated only once and deals with all the databases and their replications.

Footnote to the “transaction list”: since the UPPAAL allows only fixed-size arrays, and each transaction type could require a different number of arguments, an individual two-dimensional array is defined for each transaction type independently.

ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications

In addition to the above modules, we have to pre-

pare several functions to make the model executable

and verifiable. These functions are written in a C-like language unique to the UPPAAL. While the model structure is common among application domains, these functions are application unique and must be built for each application domain.
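The data layout used by these modules, that is, three-dimensional integer arrays indexed by replication, record, and attribute, together with a fixed-size transaction list, can be sketched as follows. This is an illustrative Python sketch rather than the UPPAAL functions of the model; `perform_request` and `replicate` only borrow the names that appear in the figures, and their bodies, the array sizes, and the sample transactions are assumptions.

```python
REPLICAS, RECORDS, ATTRS = 3, 4, 2

# db[replica][record][attribute], all integers.
db = [[[0] * ATTRS for _ in range(RECORDS)] for _ in range(REPLICAS)]

# One fixed-size "transaction list" per transaction type: row = transaction,
# column = argument. Here: (record number, amount to add to attribute 1).
add_tran_list = [[0, 5], [2, 7]]

pending = []  # queued replication requests: (source replica, record number)


def perform_request(replica, args):
    """Apply one transaction to a single replica and queue its replication."""
    record, amount = args
    db[replica][record][1] += amount
    pending.append((replica, record))


def replicate():
    """Propagate queued updates to the other replicas (asynchronous step)."""
    while pending:
        source, record = pending.pop(0)
        for target in range(REPLICAS):
            if target != source:
                db[target][record] = list(db[source][record])


perform_request(0, add_tran_list[0])
diverged = db[0] != db[1]            # soft state: replicas differ for a while
replicate()
converged = db[0] == db[1] == db[2]  # replicas agree after replication
```

Between the update and the call to `replicate`, the replicas hold different values, which is the window in which a transaction reading another replica can observe stale data.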

Figure 1 through Figure 5 show an example of the above UPPAAL modules. As stated above, the structure of these five modules can be commonly used among different application domains, including function names and channels associated with edges and locations in the model. However, the implementations of these functions and other supplemental functions differ among application domains.

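Figure 1: Initialization Module. (Diagram not reproduced. Recovered labels: locations Start, Prepare, End; update dbLoad(); synchronization initS!.)

Figure 2: Scheduler Module. (Diagram not reproduced. Recovered labels: locations INIT, Schedule, Continue; guards tranPos < tranMax, tranPos >= tranMax; update tranPos++; synchronizations initS?, S2T!.)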

For example, the function “dbLoad()” in Figure 1 represents a function that initializes all the databases in the system, and the name is common for all applications. However, its implementation is usually different among them, depending on the structure and usage of the databases. Figure 6 shows a sample implementation of the “dbLoad()” function for a simplified library application.

When executing the model, these modules are in-

stantiated through the system deﬁnition as shown in

Figure 7. In this example, three concurrent threads

and three database replications are assumed.

These modules operate as follows.

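Figure 3: Thread Module. (Diagram not reproduced. Recovered labels: locations Select, Wait; select x : int[0,2]; guards step>0, step<0, step == 0; updates setRn(x), selectTran(); synchronizations S2T?, T2D[rn]!, D2T[m]?.)

Figure 4: Database Module. (Diagram not reproduced. Recovered labels: location INIT; select k: int[0, 2]; updates selectTh(k), performRequest(), repReq[r][0] = 1, repReq[r][1] = th; synchronizations T2D[r]?, D2T[th]!, replication!.)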

1. Firstly, the “dbLoad” function of the Initialization

module is invoked to prepare all the databases. At

this time, only the associated edge is eligible for

transition, since other modules are waiting for sig-

nals through the UPPAAL channels.

2. After the completion of the “dbLoad” function,

the Scheduler module is activated through the

“initS” channel.

3. The Scheduler module sends a signal to the Thread module through the channel “S2T”.

4. The Thread module selects a transaction from the predefined transaction list by the “selectTran” function, and sends a signal to the Database module through the channel “T2D[m]”, where “m” is a replication number.

5. The Database module accesses and updates the

databases.

Figure 5: Replication Module. (Diagram not reproduced. Recovered labels: location INIT; synchronization replication?; update replicate().)


void dbLoad() {
    // Copy the book records into bookM for each replication i.
    for (i : int[0,2]) {
        for (j : int[0,14]) {
            int x = searchBook(i, bookKey[j]);
            if (x > 0) {
                for (k : int[0,3]) bookM[i][j][k] = book[i][x][k];
            }
        }
    }
    // Copy the account records into accountM for each replication i.
    for (i : int[0,2]) {
        for (j : int[0,8]) {
            int x = searchAccount(i, accountKey[j]);
            if (x > 0) {
                for (k : int[0,3]) accountM[i][j][k] = account[i][x][k];
            }
        }
    }
    // Copy the loan records (two-part key) into loanM for each replication i.
    for (i : int[0,2]) {
        for (j : int[0,4]) {
            int x = searchLoan(i, loanKey[j][0], loanKey[j][1]);
            if (x > 0) {
                for (k : int[0,3]) loanM[i][j][k] = loan[i][x][k];
            }
        }
    }
}

Figure 6: “dbLoad” Function.

// Place template instantiations here.
In = Initiator();
Sc = Scheduler();
T1 = Thread(0);
T2 = Thread(1);
T3 = Thread(2);
DB1 = DBM(0);
DB2 = DBM(1);
DB3 = DBM(2);
REP = Replication();
// List one or more processes to be composed into a system.
system In, Sc, T1, T2, T3, DB1, DB2, DB3, REP;

Figure 7: System Definition for Module Instantiation.

6. Steps 2 to 5 are repeated until all the predefined transactions are processed.

The version control and commit/abort processes are

embedded in the Database module as functions.
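The repetition of the dispatching steps can be pictured with the following sketch. It is a simplified Python rendering under assumptions of our own, not a translation of the UPPAAL templates: the function `run_model` is hypothetical, the counters are named after the labels in Figure 2, and the nondeterministic choice of thread and replication is replaced by a seeded random generator.

```python
import random


def run_model(tran_list, n_threads=3, n_replicas=3, seed=0):
    """Return a trace of (transaction index, thread, replica) dispatches."""
    rng = random.Random(seed)
    trace = []
    tran_pos, tran_max = 0, len(tran_list)
    while tran_pos < tran_max:               # Scheduler guard
        thread = rng.randrange(n_threads)    # a Thread instance takes the signal
        replica = rng.randrange(n_replicas)  # the Thread picks a replication
        trace.append((tran_pos, thread, replica))  # request sent to a Database
        tran_pos += 1                        # Scheduler moves to the next one
    return trace


trace = run_model([[0, 5], [2, 7], [1, 3]])
```

Each entry of `trace` records which thread handled which transaction and which replication it addressed; a different seed yields a different but equally valid behavior of the model.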

4 DATA INTEGRITY

EVALUATION USING THE

UPPAAL

The UPPAAL model checker provides us with three

major functionalities. The ﬁrst is a graphical model

editor with programming capability that we have used

in the previous section. The second is a model simulator that executes the model we build to show an instance of its behavior. The third is a model verifier that examines all the possible behaviors to determine whether the model satisfies the given properties written in the form of CTL (Computation Tree Logic) formulae.

Therefore, two alternative ways are available to

evaluate the data integrity of transaction processing.

The ﬁrst is to execute the model to obtain the val-

ues of the variables for the database records at each

state transition. As discussed in the previous sec-

tion, the data integrity is expressed as a set of pred-

icate logic formulae in the form of PCNF. In the UP-

PAAL model, these logic formulae refer to the vari-

ables associated with the database records and at-

tributes. Therefore, we can determine whether the

data integrity is maintained in the transaction process-

ing by examining the above variables using a function

implementing each constraint logic formula. Since this method can evaluate only one instance of the system behavior selected by the simulation, we would have to perform the simulation for every possible behavior. However, the number of possible behaviors could be too large to enumerate. Therefore this method is in practice a sampling based evaluation.
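The sampling nature of this method can be illustrated with the sketch below. It is a minimal Python example under assumptions of our own, unrelated to the UPPAAL simulator itself: two hypothetical transactions each withdraw 10 from a shared balance without any protection, random interleavings are sampled, and a function implementing the constraint is applied to the result of each run.

```python
import random


def run_once(rng):
    """Interleave two unprotected read-modify-write transactions at random."""
    balance = 100
    local = {}
    queues = {"A": ["read", "write"], "B": ["read", "write"]}
    while queues["A"] or queues["B"]:
        # Pick any transaction that still has a step left.
        name = rng.choice([n for n in queues if queues[n]])
        op = queues[name].pop(0)
        if op == "read":
            local[name] = balance
        else:
            balance = local[name] - 10
    return balance


def integrity_holds(balance):
    # Constraint: both withdrawals of 10 must be reflected in the balance.
    return balance == 80


rng = random.Random(1)
results = [integrity_holds(run_once(rng)) for _ in range(200)]
violations = results.count(False)
```

Some sampled runs satisfy the constraint and others do not, so a small number of simulation runs may miss a violation that exists in the model.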

On the other hand, the UPPAAL veriﬁer provides

us with a capability of full state space search against

a set of CTL formulae. In order to evaluate the data

integrity in this way, we have to transform a set of

predicate logic formulae into a set of CTL formulae.

Unlike the predicate logic formulae, CTL formulae can include the path operators “A” and “E”, which deal with state transition paths of a system, and the temporal operators “□” and “♢”, which define the validation points of the formulae. In addition, there are

no quantiﬁers “∀” and “∃” in CTL. Therefore, sev-

eral considerations should be taken into account in the

above transformation from predicate logic formulae

into CTL formulae. These considerations include

1. If a property “P” must always hold in a predicate logic formula, the CTL formula is “A□P”.

2. If a property “P” always implies a property “Q”, then the CTL formula is “A□(P → Q)”.

3. If a property “P” eventually implies a property “Q”, the CTL formula is “A□(P → ♢Q)”.


4. If a property “P” must hold at a specific point, we introduce a boolean variable to express the point, and set it true at the point in the model. In this case we need to modify the model.

5. If the original predicate logic formula includes the

quantiﬁers “∀” and “∃”, we introduce a boolean

function into the model to examine whether all of

or some of the variables in the model satisfy the

formula. A model modiﬁcation is required in this

case again.

After the above transformation is completed, we can

evaluate the data integrity by running the veriﬁer that

the UPPAAL provides.
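What a full state space search for an “always” property amounts to can be sketched as follows. This is a simplified Python illustration under assumptions of our own and not the algorithm of the UPPAAL verifier: the function `check_invariant`, the state encoding, and the two withdrawing transactions are hypothetical. The flag `finished` plays the role of the boolean variable that marks the point at which the property must hold.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Explore every reachable state; return a violating state or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# State: (balance, local copy of A, local copy of B, pc of A, pc of B)
# pc: 0 = before read, 1 = after read, 2 = after write.
def successors(state):
    balance, loc_a, loc_b, pc_a, pc_b = state
    if pc_a == 0:
        yield (balance, balance, loc_b, 1, pc_b)    # A reads the balance
    elif pc_a == 1:
        yield (loc_a - 10, loc_a, loc_b, 2, pc_b)   # A writes its result
    if pc_b == 0:
        yield (balance, loc_a, balance, pc_a, 1)    # B reads the balance
    elif pc_b == 1:
        yield (loc_b - 10, loc_a, loc_b, pc_a, 2)   # B writes its result


def invariant(state):
    balance, _, _, pc_a, pc_b = state
    finished = pc_a == 2 and pc_b == 2
    return (not finished) or balance == 80  # finished implies balance == 80


counterexample = check_invariant((100, 0, 0, 0, 0), successors, invariant)
```

Unlike sampling, the search visits every interleaving, so it is guaranteed to find the state in which both transactions have finished but only one withdrawal is reflected.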

This CTL based evaluation seems simpler than the simulator based one. However, it performs a full state space search and consumes huge computing resources. As a result, it takes a long time to obtain the result. In such cases, we need to reduce the model by decreasing the number of variables or values to be assigned.

5 CONCLUSIONS

In cloud environments, the behavior of transaction

processing is considerably different from the tradi-

tional ones. One of the major reasons is that the cloud

introduces a new principle for the data integrity called

“BASE”, instead of the traditional “ACID”. In order

to make the transaction processing stable in the cloud,

we need to reveal its behavior clearly, and evaluate the data integrity rigorously.

This paper proposed a model based data in-

tegrity evaluation using the UPPAAL model checker.

In order to make the model easily understand-

able and reusable, we composed it using ﬁve func-

tional modules, namely, “Initialization”, “Schedul-

ing”, “Thread”, “Database”, and “Replication”, fol-

lowing the BASE principle. While the model struc-

ture can be reused among different application do-

mains, we need to build application unique functions

for the model.

The UPPAAL provides us with two different ways

to evaluate the data integrity. One is a simulation-

based evaluation which examines only one instance

of the behavior of transaction processing. The other is

a verifier-based evaluation which performs a full state space search to determine whether the given constraints are satisfied. While the latter way can evaluate the integrity more precisely, we need to transform the original predicate logic formulae into the CTL formulae. In addition, it consumes huge computing resources for the full state space search, and takes a long time to obtain the evaluation results.

ACKNOWLEDGEMENTS

This work was supported by JSPS KAKENHI Grant

Number 25330094.

REFERENCES

David, A., Larsen, K. G., and Legay, A. (2015). UP-

PAAL SMC tutorial. In International Journal on Soft-

ware Tools for Technology Transfer Volume 17 Issue

4, pages 397–415. Springer.

Fitzgerald, J., Larsen, P., Mukherjee, P., Plat, N., and

Verhoef, M. (2004). Validated Designs for Object-

oriented Systems. Springer.

Gray, J. and Reuter, A. (1993). Transaction Processing:

Concepts and Techniques. Morgan Kaufmann.

IBM (2015). Bluemix. http://www.ibm.com/cloud-

computing/bluemix/.

Jensen, K. and Kristensen, L. (2009). Coloured Petri

Nets: Modeling and Validation of Concurrent Sys-

tems. Springer-Verlag.

Nishida, S. and Shinkawa, Y. (2014). Data Integrity in

Cloud Transactions. In Proc. 4th International Con-

ference on Cloud Computing and Services Science,

pages 457–462.

Pritchett, D. (2008). BASE: An ACID alternative. In ACM

QUEUE Volume 6 Issue 3, pages 48–55. ACM.

Reisig, W. (1985). Petri Nets: An Introduction. Springer.

Sanderson, D. (2009). Programming Google App Engine.

O'Reilly & Associates Inc.

Schoening, U. (2008). Logic for Computer Scientists (Mod-

ern Birkhaeuser Classics). Birkhaeuser Boston.

Shinkawa, Y. (2012). CPN Based Data Integrity Evalua-

tion for Cloud Transactions. In Proc. 6rd International

Conference on Software Paradigm Trends, pages 267–

272.

Spivey, J. M. (2008). Understanding Z: A Speciﬁcation

Language and its Formal Semantics. Cambridge Uni-

versity Press.

Thiel, A. M. (2001). Systems Engineering with SDL: Devel-

oping Performance-Critical Communication Systems.

Wiley.

van Vliet, J. and Paganelli, F. (2011). Programming Amazon

EC2. O'Reilly & Associates Inc.
