Logical Unified Modeling for NoSQL Databases

Fatma Abdelhedi (1,2), Amal Ait Brahim, Faten Atigui and Gilles Zurfluh

(1) Toulouse Institute of Computer Science Research (IRIT), Toulouse Capitole University, Toulouse, France
(2) CBI – TRIMANE, Paris, France
(3) CEDRIC-CNAM, Paris, France

Keywords: Big Data, NoSQL, UML Conceptual Model, MDA, QVT.

Abstract: NoSQL data stores are becoming widely used to handle Big Data; these systems operate on a schema-less data model, enabling users to incorporate new data into their applications without using a predefined schema. However, there is still a need for a conceptual model to define how data will be structured in the database. In this paper, we show how to store Big Data described by a conceptual model within NoSQL systems. For this, we use the Model Driven Architecture (MDA), which provides a framework for automatic model transformation. Starting from a conceptual model describing a set of complex objects, we propose transformation rules formalized with QVT to generate NoSQL physical models. To ensure efficient automatic transformation and to limit the impact of the technical aspects of NoSQL systems, we propose a generic logical model that is compatible with the three types of NoSQL systems (column, document and graph). We evaluate our approach experimentally using a case study from the healthcare field. The results of our experiments show that the proposed logical model can be effectively transformed into different NoSQL physical models independently of their specific details.

1 INTRODUCTION

The digital transformation of companies is accompanied by exponential growth in the data collected, known as Big Data. Generally, Big Data is described according to three vectors (Gartner, 2001): Volume (many terabytes of data that need to be processed), Variety (different data types, including factors such as format, structure and sources) and Velocity (speed of data loading and processing). Relational systems, which represent the majority of DBMSs, are not adequate for every application, especially those involving Big Data (Abelló, 2015). As a result, a new kind of DBMS, known as "NoSQL" (Cattell, 2011), has appeared. These systems, with flexible schemas, are well suited to managing large volumes of data. They also offer good performance when scaling up (Angadi, 2013). NoSQL encompasses a wide variety of systems that were developed to meet specific needs. They can be classified into four basic types: key-value, column-oriented, document-oriented and graph-oriented. In this paper, we exclude key-value systems because column-oriented, document-oriented and graph-oriented systems extend their concepts (Abadi, 2008).

Developers of Big Data applications face the problem of storing data in NoSQL systems. To address this problem, some solutions based on model transformation have been proposed. Li et al. (Li, 2014) propose an MDA-based process to transform a UML class diagram into a column-oriented HBase physical model. Daniel et al. (Daniel, 2016) describe a mapping between a UML conceptual model and a NoSQL physical model compatible only with graph-oriented systems. In these works, the adopted process targets a single type of NoSQL system (column-oriented in (Li, 2014) and graph-oriented in (Daniel, 2016)). However, users need to choose the system type best suited to their needs. For example, when processing operations require access to hierarchically structured data, a document-oriented system is the most suitable solution.

The main purpose of our work is to assist developers in implementing Big Data on NoSQL systems. For this, we propose a new MDA-based process that transforms a conceptual data model describing Big Data into several NoSQL physical models. This automatic process allows developers to choose the system type they want to use.

The rest of the paper is structured as follows: Section 2 motivates our work using a case study in the healthcare field; Section 3 introduces our MDA-based approach; Section 4 presents a first transformation that creates a NoSQL logical model from a UML class diagram; Section 5 presents a second transformation that generates NoSQL physical models from the logical model; Section 6 details our experiments; and Section 7 reviews previous work on model transformation. Finally, Section 8 concludes the paper and outlines future work.

Abdelhedi, F., Brahim, A., Atigui, F. and Zurfluh, G.
Logical Unified Modeling for NoSQL Databases.
DOI: 10.5220/0006311702490256
In Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS 2017) - Volume 1, pages 249-256
ISBN: 978-989-758-247-9

2 MOTIVATION

To motivate and illustrate our work, we present a case study in the healthcare field. This case study concerns national or international scientific programs for monitoring patients with serious diseases. The main goals of such a program are (1) to collect data about disease development over time, (2) to study interactions between different diseases and (3) to evaluate the short- and medium-term effects of their treatments. A medical program can last up to 3 years. Data collected from the establishments involved in such a program have the characteristics of Big Data (the 3 Vs):

Volume: the amount of data collected from all the establishments over three years can reach several terabytes.

Variety: data created while monitoring patients come in different types; they can be (1) structured, such as a patient's vital signs (respiratory rate, blood pressure, temperature, etc.), patient name and diagnosis codes, (2) unstructured, such as patient histories, consultation summaries, paper prescriptions and radiology reports, and (3) semi-structured, such as the package leaflets of medicinal products, which provide comprehensible information enabling the product to be used safely and appropriately.

Velocity: some data are produced in a continuous flow by sensors; they must be processed in near real time because they can be integrated into time-sensitive processes (for example, some measurements, such as temperature, require emergency medical treatment if they cross a given threshold).

3 UMLtoNoSQL APPROACH

Our purpose in this paper is to define, specify and automate a process for storing Big Data in NoSQL systems. For this, we propose a process called UMLtoNoSQL that automatically transforms a conceptual model (UML class diagram) provided by the developer into the physical model of the NoSQL system they want to use. In the UMLtoNoSQL process, we introduce a logical level between the conceptual (business description) and physical (technical description) levels, at which a generic model is developed. This generic logical model has two benefits: (1) it is compatible with the three types of NoSQL systems, which allows developers to choose the NoSQL system type that best meets their needs; and (2) it is independent of the technical aspects of NoSQL systems, which can evolve as new versions are released. To formalize and automate our process, we use the Model Driven Architecture proposed by OMG.

One of the main aims of MDA is to separate the functional specification of a system from the details of its implementation on a specific platform (Hutchinson, 2011). This architecture defines a hierarchy of models from three points of view: Computation Independent Model (CIM), Platform Independent Model (PIM) and Platform Specific Model (PSM) (Bézivin, 2001). Among these models, we use PIM and PSM.

Figure 1: Overview of UMLtoNoSQL process.

In our scenario, the UML and generic models belong to the PIM level. The UMLtoNoSQL process generates the generic model (logical PIM) from the UML class diagram (conceptual PIM). At the PSM level, we consider three physical models that correspond to Cassandra (column-oriented system), MongoDB (document-oriented system) and Neo4j (graph-oriented system).

Figure 1 presents the different components of the UMLtoNoSQL process. UMLtoGenericModel (1) is the first transformation (Section 4) in the UMLtoNoSQL process. It converts the input UML class diagram (conceptual PIM) into the generic logical model (2), which conforms to the generic logical metamodel proposed in Section 4; this metamodel describes a data structure compatible with the three types of NoSQL systems.

GenericModeltoPhysicalModel (3) is the second transformation (Section 5) in UMLtoNoSQL. It transforms the generic logical model into NoSQL physical models (PSMs) (4).

We note that the UMLtoNoSQL process generates several NoSQL physical models from a UML class diagram. To do this, it is necessary to supply parameters (transformation rules) for each target system. To illustrate our work, we have taken as examples three physical models that correspond to the Cassandra, MongoDB and Neo4j systems. If the developer chooses to use another system, the process must be completed by adding new parameters specific to that system.

4 UML TO GENERIC MODEL TRANSFORMATION

In this section we present the UMLtoGenericModel transformation, which is the first transformation in our approach (Figure 1). We first define the source (UML Class Diagram) and the target (Generic Logical Model), and then we focus on the transformation itself.

4.1 Source: UML Class Diagram (Conceptual PIM)

UML is widely accepted as a standard modelling language for describing data. Therefore, we model Big Data using a UML class diagram. A Class Diagram (CD) is defined as a tuple (N, C, L), where:

N is the CD name,

C is a set of classes. Classes are composed of structural and behavioural constituents. In this paper, we consider only the structural part; since operations are linked to behaviour, we do not take them into account. The schema of each class c ∈ C is a tuple (N, A, IdentO), where:
• c.N is the class name,
• c.A = {$a_1^c$, …, $a_q^c$} is a set of q attributes. The schema of each attribute $a^c$ ∈ c.A is a pair (N, C), where $a^c$.N is the attribute name and $a^c$.C the attribute type; C can be a predefined class, i.e. a standard data type (String, Integer, Date, etc.), or a business class (a class defined by the user),
• c.IdentO is a special attribute of c; it has a name $IdentO^c$.N and a type called "Oid". In this paper, an attribute whose type is "Oid" represents a unique object identifier, i.e. an attribute whose value distinguishes an object from all other objects of the same class,

L is a set of links. Each link l between n classes, with n ≥ 2, is defined as a tuple (N, Ty, $Pr^l$), where:
• l.N is the link name,
• l.Ty is the link type: Association, Composition or Generalization,
• l.$Pr^l$ = {$pr_1^l$, …, $pr_n^l$} is a set of n pairs. ∀ i ∈ {1, …, n}, $pr_i^l$ = (c, cr), where $pr_i^l$.c is a linked class and $pr_i^l$.cr is the cardinality placed next to c. Note that $pr_i^l$.cr can contain a null value if no cardinality is indicated next to c (as in a generalization link).

The class diagram metamodel is shown in Figure 2. This metamodel is adapted from the one proposed by OMG.

Figure 2: Source Metamodel.
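The tuple definitions of Section 4.1 can be mirrored by a few data classes. The following Python sketch is only an illustration under our own assumptions: it is not the Ecore metamodel used in the experiments, and all class names, field names and the healthcare values (Patient, Doctor, Follows) are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

LINK_TYPES = {"Association", "Composition", "Generalization"}

@dataclass
class Attribute:
    name: str  # a.N
    type: str  # a.C: a standard data type or the name of a business class

@dataclass
class UmlClass:
    name: str                    # c.N
    attributes: list[Attribute]  # c.A
    ident: Attribute             # c.IdentO, whose type is "Oid"

@dataclass
class Link:
    name: str                                   # l.N
    type: str                                   # l.Ty
    ends: list[tuple[UmlClass, Optional[str]]]  # l.Pr: (class, cardinality or None)

    def __post_init__(self) -> None:
        if self.type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {self.type}")
        if len(self.ends) < 2:
            raise ValueError("a link connects at least two classes")

@dataclass
class ClassDiagram:
    name: str                                              # CD.N
    classes: list[UmlClass] = field(default_factory=list)  # CD.C
    links: list[Link] = field(default_factory=list)        # CD.L

# Hypothetical excerpt of a healthcare diagram (illustrative values only).
patient = UmlClass("Patient", [Attribute("name", "String")], Attribute("patientId", "Oid"))
doctor = UmlClass("Doctor", [Attribute("speciality", "String")], Attribute("doctorId", "Oid"))
follows = Link("Follows", "Association", [(doctor, "1"), (patient, "*")])
cd = ClassDiagram("MedicalProgram", [patient, doctor], [follows])
```

The validation in `Link.__post_init__` encodes the two constraints stated in the definition of L: a known link type and n ≥ 2 linked classes.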

4.2 Target: Generic Logical Model (Logical PIM)

This section defines a generic logical model that describes data according to the characteristics common to the three types of NoSQL systems: column-oriented, document-oriented and graph-oriented. In the generic logical model, a DataBase (DB) is defined as a tuple (N, T, R), where:

N is the database name,

T is a set of tables. The schema of each table t ∈ T is a tuple (N, A, IdentL), where:
• t.N is the table name,
• t.A = {$a_1^t$, …, $a_q^t$} is a set of q attributes that will be used to define rows of t; each row can have a variable number of attributes. The schema of each attribute $a^t$ ∈ t.A is a pair (N, Ty), where $a^t$.N is the attribute name and $a^t$.Ty the attribute type,
• t.IdentL is a special attribute of t; it has a name $IdentL^t$.N and a type called "row-key". In this paper, an attribute whose type is "row-key" represents a unique row identifier, i.e. an attribute whose value distinguishes a row from all other rows of the same table,

R is a set of relationships. A relationship is a link between two tables. In the generic logical model there are only binary relationships between tables. Each relationship r ∈ R between $t_1$ and $t_2$ is defined as a tuple (N, Ty, $Pr^r$), where:
• r.N is the relationship name,
• r.Ty is the relationship type: Association, Composition or Generalization,
• r.$Pr^r$ = {$pr_1^r$, $pr_2^r$} is a set of two pairs. ∀ i ∈ {1, 2}, $pr_i^r$ = (t, cr), where $pr_i^r$.t is a related table and $pr_i^r$.cr is the cardinality placed next to t.

The metamodel of the proposed generic logical model is shown in Figure 3.

Figure 3: Target Metamodel.
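To make the definition of the generic logical model concrete, the sketch below expresses tables and binary relationships as Python data classes. It is a minimal illustration under our own assumptions: the names `Table`, `Relationship` and `accepts`, as well as the example values, are hypothetical and do not come from the Ecore metamodel of Figure 3.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Attribute:
    name: str  # a.N
    type: str  # a.Ty

@dataclass
class Table:
    name: str                    # t.N
    attributes: list[Attribute]  # t.A
    ident: Attribute             # t.IdentL, whose type is "row-key"

    def accepts(self, row: dict) -> bool:
        """A row must carry the row key and may use any subset of t.A."""
        allowed = {a.name for a in self.attributes} | {self.ident.name}
        return self.ident.name in row and set(row) <= allowed

@dataclass
class Relationship:
    name: str                            # r.N
    type: str                            # r.Ty
    ends: list[tuple[Table, str | None]]  # r.Pr: exactly two (table, cardinality) pairs

    def __post_init__(self) -> None:
        if len(self.ends) != 2:
            raise ValueError("the generic logical model only has binary relationships")

# Hypothetical tables and rows (illustrative values only).
patient = Table("Patient",
                [Attribute("name", "String"), Attribute("temperature", "Float")],
                Attribute("patientId", "row-key"))
doctor = Table("Doctor", [Attribute("speciality", "String")], Attribute("doctorId", "row-key"))
follows = Relationship("Follows", "Association", [(doctor, "1"), (patient, "*")])
```

The `accepts` method illustrates the statement that each row can have a variable number of attributes: `{"patientId": 1}` and `{"patientId": 2, "name": "Ann"}` are both valid rows of `patient`, whereas a row without the row key is not.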

4.3 Transformation Rules

R1: each CD is transformed into a database DB, where DB.N = CD.N.

R2: each class c ∈ C is transformed into a table t ∈ DB, where t.N = c.N and $IdentL^t$.N = $IdentO^c$.N.

R3: each attribute $a^c$ ∈ c.A is transformed into an attribute $a^t$, where $a^t$.N = $a^c$.N and $a^t$.Ty = $a^c$.C, and is added to the attribute list of its transformed container t such that $a^t$ ∈ t.A.

R4: each link l ∈ L between two classes $c_1$ and $c_2$ is transformed into a relationship r ∈ R between two tables $t_1$ and $t_2$, where r.N = l.N, r.Ty = l.Ty and r.$Pr^r$ = l.$Pr^l$, and where $t_1$ and $t_2$ are the tables representing $c_1$ and $c_2$.

R5: each link l ∈ L between n classes {$c_1$, …, $c_n$} (n ≥ 3) is transformed into (1) a new table $t^l$, where $t^l$.N = l.N, and (2) n relationships {$r_1$, …, $r_n$}; ∀ i ∈ {1, …, n}, $r_i$ links $t^l$ to another table $t_i$ representing a related class $c_i$, where $r_i$.N = ($t^l$.N)_($t_i$.N), $r_i$.Ty = Association and $r_i$.$Pr^{r_i}$ = {($t^l$, cr), ($t_i$, cr)}.

R6: each association class $c^l$ between n classes {$c_1$, …, $c_n$} (n ≥ 2) is transformed like a link between multiple classes (R5), using (1) a new table $t^l$, where $t^l$.N = l.N, and (2) n relationships {$r_1$, …, $r_n$}; ∀ i ∈ {1, …, n}, $r_i$ links $t^l$ to another table $t_i$ representing a related class $c_i$, where $r_i$.N = ($t^l$.N)_($t_i$.N), $r_i$.Ty = Association and $r_i$.$Pr^{r_i}$ = {($t^l$, cr), ($t_i$, cr)}. Like any other table, $t^l$ also contains a set of attributes A, where $t^l$.A = $c^l$.A.

These transformation rules have been formalized with QVT (Figure 4.b), which is a standard defined by OMG for expressing model transformations.
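Rules R1 to R6 can be read as a single pass over the class diagram. The following Python sketch illustrates that reading on plain dictionaries; it is not the QVT script of Figure 4.b. The dictionary layout, the function name `uml_to_generic` and the example classes are our own assumptions. Two choices in the sketch are also assumptions, because the rules do not fix them: the table created for an n-ary link gets no row key, and in each generated relationship the cardinality next to the new table is left empty.

```python
def uml_to_generic(cd: dict) -> dict:
    """Apply R1-R6 to a class diagram given as plain dictionaries."""
    db = {"name": cd["name"], "tables": {}, "relationships": []}  # R1
    for c in cd["classes"]:
        db["tables"][c["name"]] = {                               # R2
            "name": c["name"],
            "ident": {"name": c["ident"], "type": "row-key"},
            "attributes": [{"name": n, "type": ty} for n, ty in c["attributes"]],  # R3
        }
    for link in cd["links"]:
        ends = link["ends"]                    # list of (class name, cardinality)
        is_association_class = "attributes" in link
        if len(ends) == 2 and not is_association_class:           # R4
            db["relationships"].append(
                {"name": link["name"], "type": link["type"], "ends": list(ends)})
            continue
        # R5 (n-ary link) and R6 (association class): one new table ...
        db["tables"][link["name"]] = {
            "name": link["name"],
            "ident": None,  # assumption: the rules do not define a row key here
            "attributes": [{"name": n, "type": ty}
                           for n, ty in link.get("attributes", [])],
        }
        # ... plus one binary association per linked class.
        for class_name, cr in ends:
            db["relationships"].append({
                "name": f"{link['name']}_{class_name}",
                "type": "Association",
                "ends": [(link["name"], None), (class_name, cr)],
            })
    return db

# Hypothetical healthcare excerpt (illustrative values only).
cd = {
    "name": "MedicalProgram",
    "classes": [
        {"name": "Patient", "ident": "patientId", "attributes": [("name", "String")]},
        {"name": "Doctor", "ident": "doctorId", "attributes": [("speciality", "String")]},
        {"name": "Drug", "ident": "drugId", "attributes": [("label", "String")]},
    ],
    "links": [
        {"name": "Follows", "type": "Association",
         "ends": [("Doctor", "1"), ("Patient", "*")]},
        {"name": "Prescription", "type": "Association",
         "ends": [("Doctor", "*"), ("Patient", "*"), ("Drug", "*")]},
    ],
}
db = uml_to_generic(cd)
```

In this example the binary link Follows is kept as a relationship (R4), while the ternary link Prescription becomes a table plus three binary associations named Prescription_Doctor, Prescription_Patient and Prescription_Drug (R5).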

5 GENERIC MODEL TO PHYSICAL MODEL TRANSFORMATION

In this section we present the second transformation in our UMLtoNoSQL approach (Figure 1). It creates NoSQL physical models from the proposed generic logical model.

5.1 Source: Generic Logical Model (Logical PIM)

The source of this transformation is the target of the previous UMLtoGenericModel transformation.

5.2 Target: NoSQL Physical Models (PSMs)

To illustrate our approach, we have chosen three well-known NoSQL systems: Cassandra, MongoDB and Neo4j.

5.2.1 Cassandra Physical Model

In the Cassandra physical model, KeySpace (KS) is the top-level container that owns all the elements. It is defined as a tuple (N, F), where:

N is the keyspace name,

F is a set of columns-families. The schema of each columns-family f ∈ F is a tuple (N, Cl, PrimaryKey), where:
• f.N is the columns-family name,
• f.Cl = {$cl_1$, …, $cl_q$} is a set of q columns that will be used to define rows of f; each row can have a variable number of columns. The schema of each column cl ∈ Cl is a pair (N, Ty), where cl.N is the column name and cl.Ty the column type,
• f.PrimaryKey is a special column of f; it has a name $PrimaryKey^f$.N and a type $PrimaryKey^f$.Ty (standard data type). $PrimaryKey^f$ identifies each row of f.

5.2.2 MongoDB Physical Model

In the MongoDB physical model, DataBase ($DB^{MD}$) is the top-level container that owns all the elements. It is defined as a tuple (N, Cll), where:

N is the database name,

Cll is a set of collections. The schema of each collection cll ∈ Cll is a tuple (N, Fl, Id), where:
• cll.N is the collection name,
• cll.Fl = $Fl^A$ ∪ $Fl^C$ is a set of atomic and complex fields that will be used to define the rows, called documents, of cll; each document can have a variable number of fields. The schema of each atomic field $fl^A$ ∈ $Fl^A$ is a tuple (N, Ty), where $fl^A$.N is the field name and $fl^A$.Ty is the field type. The schema of each complex field $fl^C$ ∈ $Fl^C$ is also a tuple (N, Fl'), where $fl^C$.N is the field name and $fl^C$.Fl' is a set of fields with Fl' ⊂ Fl,
• cll.Id is a special field of cll; it has a name $Id^{cll}$.N and a type $Id^{cll}$.Ty (standard data type). $Id^{cll}$ uniquely identifies each document of cll.

5.2.3 Neo4j Physical Model

In the Neo4j physical model, Graph (GR) is the top-level container that owns all the elements. It is defined as a tuple (V, E), where:

V is a set of vertices. The schema of each vertex v ∈ V is a tuple (L, Pro, Id), where:
• v.L is the vertex label,
• v.Pro = {$pro_1$, …, $pro_q$} is a set of q properties. The schema of each property pro ∈ Pro is a pair (N, Ty), where pro.N is the property name and pro.Ty the property type,
• v.Id is a special property of v; it has a name $Id^v$.N, a type $Id^v$.Ty and the constraint "Is Unique". It uniquely identifies v in the graph.

E is a set of edges. The schema of each edge e ∈ E is a tuple (L, $H_1$, $H_2$), where:
• e.L is the edge label,
• e.$H_1$ and e.$H_2$ are the vertices related by e.

5.3 Transformation Rules

Several solutions can ensure the transformation of the generic logical model into a NoSQL physical model. We present the main transformation options; the developer chooses the one that best meets their needs. We note that the set of solutions proposed in this section is not exhaustive. More marginal solutions may be considered.

5.3.1 To Cassandra Physical Model

R1: each database DB is transformed into a keyspace KS, where KS.N = DB.N.

R2: each table t ∈ DB is transformed into a columns-family f ∈ KS, where f.N = t.N and $PrimaryKey^f$.N = $IdentL^t$.N.

R3: each attribute $a^t$ ∈ t.A is transformed into a column cl, where cl.N = $a^t$.N and cl.Ty = $a^t$.Ty, and is added to the column list of its transformed container f such that cl ∈ f.Cl.

R4: each relationship r ∈ R between two tables $t_1$ and $t_2$ is transformed by using references. Cassandra does not support nesting of columns-families; the only way to express relationships between columns-families is to use references. Depending on the relationship type, we distinguish the following solutions:

• if r = (N, Association, {($t_1$, cr), ($t_2$, cr)}), we transform r according to its cardinalities:

o if r = (N, Association, {($t_1$, *), ($t_2$, 1)}), there are two possible solutions:
Solution 1: r is transformed into a new column cl referencing $f_2$ (the columns-family representing $t_2$), where cl.N = ($f_2$.N)_Ref and cl.Ty = $PrimaryKey^{f_2}$.Ty, and is added to the columns list of $f_1$ (the columns-family representing $t_1$) such that cl ∈ $f_1$.Cl.
Solution 2: r is transformed into a new multivalued column cl referencing $f_1$ (the columns-family representing $t_1$), where cl.N = ($f_1$.N)_Ref and cl.Ty = set<$PrimaryKey^{f_1}$.Ty>, and is added to the columns list of $f_2$ (the columns-family representing $t_2$) such that cl ∈ $f_2$.Cl.

o if r = (N, Association, {($t_1$, 1), ($t_2$, 1)}): r is transformed into a new column cl referencing the columns-family f representing one of the two related tables ($t_1$ or $t_2$), where cl.N = (f.N)_Ref and cl.Ty = $PrimaryKey^f$.Ty, and is added to the columns list of the columns-family f' representing the other related table such that cl ∈ f'.Cl.

o if r = (N, Association, {($t_1$, *), ($t_2$, *)}), two solutions can be considered:
Solution 1: r is transformed into a new multivalued column cl referencing the columns-family f representing one of the two related tables ($t_1$ or $t_2$), where cl.N = (f.N)_Ref and cl.Ty = set<$PrimaryKey^f$.Ty>, and is added to the columns list of the columns-family f' representing the other related table such that cl ∈ f'.Cl.
Solution 2: r is transformed into a new columns-family f, where f.N = r.N, f.Cl = {$cl_1$, $cl_2$}, $cl_1$.N = ($f_1$.N)_Ref, $cl_1$.Ty = $PrimaryKey^{f_1}$.Ty, $cl_2$.N = ($f_2$.N)_Ref and $cl_2$.Ty = $PrimaryKey^{f_2}$.Ty, where $f_1$ and $f_2$ are the columns-families representing $t_1$ and $t_2$.

• if r = (N, Composition, {($t_1$, 1), ($t_2$, *)}): in a composition relationship, the cardinality of the composite is 1, which means that a component can be included in at most one composite at a time, and the cardinality of the component is *, which means that the composite can have multiple components. To transform it, there are two possible solutions:
Solution 1: r is transformed into a new multivalued column cl referencing the columns-family $f_2$ representing the component ($t_2$), where cl.N = ($f_2$.N)_Ref and cl.Ty = set<$PrimaryKey^{f_2}$.Ty>, and is added to the columns list of the columns-family $f_1$ representing the composite ($t_1$) such that cl ∈ $f_1$.Cl.
Solution 2: r is transformed into a new column cl referencing the columns-family $f_1$ representing the composite ($t_1$), where cl.N = ($f_1$.N)_Ref and cl.Ty = $PrimaryKey^{f_1}$.Ty, and is added to the columns list of the columns-family $f_2$ representing the component ($t_2$) such that cl ∈ $f_2$.Cl.

• if r = (N, Generalization, {($t_1$, 1), ($t_2$, null)}): in a generalization relationship between a super-table $t_1$ and a sub-table $t_2$, the cardinality of the super-table is 1, which means that each instance of the sub-table is also an indirect instance of the super-table. Because of this, a generalization relationship is also informally called an "Is A" relationship. We transform it into a new column cl referencing the columns-family $f_1$ representing the super-table ($t_1$), where cl.N = ($f_1$.N)_Ref and cl.Ty = $PrimaryKey^{f_1}$.Ty, and added to the columns list of the columns-family $f_2$ representing the sub-table ($t_2$) such that cl ∈ $f_2$.Cl.
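The Cassandra rules R1 to R3 and the two solutions of R4 for a many-to-one association can be sketched as follows. This Python fragment is an illustration only and is not the QVT script of Figure 5.a. It assumes that (t1, *), (t2, 1) means that each row of t1 is related to one row of t2, and that attribute and row-key types have already been resolved to CQL types such as `int` and `text`; the function names, the dictionary layout and the example tables are hypothetical.

```python
def to_cassandra(db: dict, many_to_one_solution: int = 1) -> dict:
    """Map tables to columns-families (R1-R3) and *-1 associations to references (R4)."""
    families = {
        t["name"]: {
            "name": t["name"],                 # f.N = t.N
            "primary_key": t["ident"],         # (name, type) of the row key
            "columns": dict(t["attributes"]),  # column name -> column type
        }
        for t in db["tables"]
    }
    for r in db["relationships"]:
        (t1, cr1), (t2, cr2) = r["ends"]
        if r["type"] == "Association" and (cr1, cr2) == ("*", "1"):
            f1, f2 = families[t1], families[t2]
            if many_to_one_solution == 1:
                # Solution 1: monovalued reference to f2, stored in f1.
                f1["columns"][f"{f2['name']}_Ref"] = f2["primary_key"][1]
            else:
                # Solution 2: multivalued reference to f1, stored in f2.
                f2["columns"][f"{f1['name']}_Ref"] = f"set<{f1['primary_key'][1]}>"
    return {"name": db["name"], "families": families}

def to_cql(family: dict) -> str:
    """Render one columns-family as a CQL CREATE TABLE statement."""
    key_name, key_type = family["primary_key"]
    cols = [f"{key_name} {key_type}"]
    cols += [f"{name} {ty}" for name, ty in family["columns"].items()]
    return f"CREATE TABLE {family['name']} ({', '.join(cols)}, PRIMARY KEY ({key_name}));"

# Hypothetical generic logical model (illustrative values only).
db = {
    "name": "MedicalProgram",
    "tables": [
        {"name": "Patient", "ident": ("patientId", "int"), "attributes": [("name", "text")]},
        {"name": "Doctor", "ident": ("doctorId", "int"), "attributes": [("speciality", "text")]},
    ],
    "relationships": [
        {"name": "Follows", "type": "Association",
         "ends": [("Patient", "*"), ("Doctor", "1")]},
    ],
}
patient_cql = to_cql(to_cassandra(db, 1)["families"]["Patient"])
doctor_cql = to_cql(to_cassandra(db, 2)["families"]["Doctor"])
```

With Solution 1 the reference column `Doctor_Ref` is added to Patient; with Solution 2 a `set<int>` column `Patient_Ref` is added to Doctor. The choice typically depends on which side of the association the queries start from.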

5.3.2 To MongoDB Physical Model

R1: each database DB is transformed into a MongoDB database $DB^{MD}$, where $DB^{MD}$.N = DB.N.

R2: each table t ∈ DB is transformed into a collection cll ∈ $DB^{MD}$, where cll.N = t.N and $Id^{cll}$.N = $IdentL^t$.N.

R3: each attribute $a^t$ ∈ t.A is transformed into a field fl, where fl.N = $a^t$.N and fl.Ty = $a^t$.Ty, and is added to the field list of its transformed container cll such that fl ∈ cll.$Fl^A$.

R4: a relationship r between two tables $t_1$ and $t_2$ can be transformed in MongoDB by using references or embedding. Depending on the relationship type, we distinguish the following solutions:

• if r = (N, Association, {($t_1$, cr), ($t_2$, cr)}), we transform r according to its cardinalities:

o if r = (N, Association, {($t_1$, *), ($t_2$, 1)}), there are two possible solutions:
Solution 1: r is transformed into a new field fl referencing $cll_2$ (the collection representing $t_2$), where fl.N = ($cll_2$.N)_Ref and fl.Ty = $Id^{cll_2}$.Ty, and is added to the field list of $cll_1$ (the collection representing $t_1$) such that fl ∈ $cll_1$.$Fl^A$.
Solution 2: r is transformed into a new multivalued field fl referencing $cll_1$ (the collection representing $t_1$), where fl.N = ($cll_1$.N)_Ref and fl.Ty = set<$Id^{cll_1}$.Ty>, and is added to the field list of $cll_2$ (the collection representing $t_2$) such that fl ∈ $cll_2$.$Fl^A$.

o if r = (N, Association, {($t_1$, 1), ($t_2$, 1)}): r is transformed into a new field fl referencing the collection cll representing one of the two related tables ($t_1$ or $t_2$), where fl.N = (cll.N)_Ref and fl.Ty = $Id^{cll}$.Ty, and is added to the field list of the collection cll' representing the other related table such that fl ∈ cll'.$Fl^A$.

o if r = (N, Association, {($t_1$, *), ($t_2$, *)}), two solutions can be considered:
Solution 1: r is transformed into a new multivalued field fl referencing the collection cll representing one of the two related tables ($t_1$ or $t_2$), where fl.N = (cll.N)_Ref and fl.Ty = set<$Id^{cll}$.Ty>, and is added to the field list of the collection cll' representing the other related table such that fl ∈ cll'.$Fl^A$.
Solution 2: r is transformed into a new collection cll, where cll.N = r.N, cll.Fl = {$fl_1$, $fl_2$}, $fl_1$.N = ($cll_1$.N)_Ref, $fl_1$.Ty = $Id^{cll_1}$.Ty, $fl_2$.N = ($cll_2$.N)_Ref and $fl_2$.Ty = $Id^{cll_2}$.Ty, where $cll_1$ and $cll_2$ are the collections representing $t_1$ and $t_2$.

• if r = (N, Composition, {($t_1$, 1), ($t_2$, *)}): there are three possible solutions:
Solution 1: r is transformed by embedding the collection $cll_2$ representing the component ($t_2$) in the collection $cll_1$ representing the composite ($t_1$), where $cll_2$ ∈ $cll_1$.$Fl^C$.
Solution 2: r is transformed into a new field fl referencing the collection $cll_1$ representing the composite ($t_1$), where fl.N = ($cll_1$.N)_Ref and fl.Ty = $Id^{cll_1}$.Ty, and is added to the field list of the collection $cll_2$ representing the component ($t_2$) such that fl ∈ $cll_2$.$Fl^A$.
Solution 3: r is transformed into a new multivalued field fl referencing the collection $cll_2$ representing the component ($t_2$), where fl.N = ($cll_2$.N)_Ref and fl.Ty = set<$Id^{cll_2}$.Ty>, and is added to the field list of the collection $cll_1$ representing the composite ($t_1$) such that fl ∈ $cll_1$.$Fl^A$.

• if r = (N, Generalization, {($t_1$, 1), ($t_2$, null)}): r is transformed into a new field fl referencing the collection $cll_1$ representing the super-table ($t_1$), where fl.N = ($cll_1$.N)_Ref and fl.Ty = $Id^{cll_1}$.Ty, and is added to the field list of the collection $cll_2$ representing the sub-table ($t_2$) such that fl ∈ $cll_2$.$Fl^A$.
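The three solutions for a composition relationship in MongoDB differ in where the link is stored. The Python sketch below builds the resulting documents as plain dictionaries, without calling any MongoDB driver. The function name `store_composition`, the collection names Patient and Consultation, and the sample documents are hypothetical; `_id` is the identifier field that MongoDB assigns to every document.

```python
from __future__ import annotations

def store_composition(composite: dict, components: list[dict], solution: int,
                      composite_name: str = "Patient",
                      component_name: str = "Consultation") -> dict:
    """Return the collections (name -> documents) produced by one of the three solutions."""
    comp = dict(composite)
    parts = [dict(p) for p in components]
    if solution == 1:
        # Solution 1: embed the component documents in the composite document.
        comp[component_name] = parts
        return {composite_name: [comp]}
    if solution == 2:
        # Solution 2: each component holds a reference to its composite.
        for p in parts:
            p[f"{composite_name}_Ref"] = comp["_id"]
    elif solution == 3:
        # Solution 3: the composite holds a multivalued reference to its components.
        comp[f"{component_name}_Ref"] = [p["_id"] for p in parts]
    else:
        raise ValueError("solution must be 1, 2 or 3")
    return {composite_name: [comp], component_name: parts}

# Hypothetical documents (illustrative values only).
patient = {"_id": 1, "name": "Ann"}
consultations = [{"_id": 10, "summary": "first visit"}, {"_id": 11, "summary": "follow-up"}]

embedded = store_composition(patient, consultations, 1)
child_refs = store_composition(patient, consultations, 2)
parent_refs = store_composition(patient, consultations, 3)
```

Embedding (Solution 1) yields a single collection, which suits reads that always fetch a composite together with its components; the two reference-based solutions keep two collections and require a second lookup to follow the link.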



5.3.3 To Neo4j Physical Model

R1: each table t ∈ DB is transformed into a vertex v ∈ V, where v.L = t.N and $Id^v$.N = $IdentL^t$.N.

R2: each attribute $a^t$ ∈ t.A is transformed into a property pro, where pro.N = $a^t$.N and pro.Ty = $a^t$.Ty, and is added to the property list of its transformed container v such that pro ∈ v.Pro.

R3: each relationship r between two tables $t_1$ and $t_2$ is transformed into an edge e, where e.L = r.N, e.$H_1$ = $v_1$ and e.$H_2$ = $v_2$, where $v_1$ and $v_2$ are the vertices representing $t_1$ and $t_2$.
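The three Neo4j rules map directly onto a graph structure (V, E). The following Python sketch builds that structure from a generic logical model given as dictionaries; it does not generate Cypher and does not use a Neo4j driver. The function name `to_graph`, the dictionary layout and the example tables are our own assumptions.

```python
def to_graph(db: dict) -> dict:
    """Apply R1-R3: tables become vertices, relationships become edges."""
    vertices = {}
    for t in db["tables"]:
        vertices[t["name"]] = {
            "label": t["name"],                          # R1: v.L = t.N
            "id": {"name": t["ident"], "unique": True},  # R1: Id with "Is Unique"
            "properties": dict(t["attributes"]),         # R2: name -> type
        }
    edges = []
    for r in db["relationships"]:
        (t1, _), (t2, _) = r["ends"]
        edges.append({                                   # R3
            "label": r["name"],
            "h1": vertices[t1]["label"],
            "h2": vertices[t2]["label"],
        })
    return {"vertices": list(vertices.values()), "edges": edges}

# Hypothetical generic logical model (illustrative values only).
db = {
    "name": "MedicalProgram",
    "tables": [
        {"name": "Patient", "ident": "patientId", "attributes": [("name", "String")]},
        {"name": "Doctor", "ident": "doctorId", "attributes": [("speciality", "String")]},
    ],
    "relationships": [
        {"name": "Follows", "type": "Association",
         "ends": [("Doctor", "1"), ("Patient", "*")]},
    ],
}
graph = to_graph(db)
```

Unlike the Cassandra and MongoDB mappings, no choice between solutions is needed here: every relationship, whatever its type or cardinalities, becomes one edge between the two corresponding vertices.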



6 EXPERIMENTS

In this section, we first provide the implementation of the UMLtoGenericModel transformation presented in Section 4, and then we show how to generate NoSQL physical models from the proposed generic logical model.

6.1 Experimental Environment

We carry out the experimental assessment using: (1) Eclipse Modeling Framework (EMF), a modeling framework and code generation facility that supports the development of tools and model-driven applications; (2) Ecore, a metamodeling language that we used to create our metamodels; (3) XML Metadata Interchange (XMI), an XML-based standard for metadata interchange, which we use to create models as instances of metamodels; and (4) Query/View/Transformation (QVT), the OMG standard for model transformation.

6.2 UMLtoGenericModel Transformation

Before implementing the transformation rules, we first created Ecore metamodels corresponding to the source (Figure 2) and the target (Figure 3). The next step was to create an instance of the source metamodel (Figure 4.a). In parallel, we used a QVT plugin to implement the transformation rules (Figure 4.b); the comments in the script indicate the rules used. Finally, we tested the transformation by running the QVT script. The execution of this script produces the generic logical model (Figure 4.c).

Figure 4: UMLtoGenericModel transformation. (a) Source model (excerpts). (b) QVT rules. (c) Target model.

6.3 GenericModeltoPhysicalModel Transformation

The generic model proposed in this paper does not imply a specific system. Consequently, several NoSQL physical models can be generated from it. For lack of space, we show only the Cassandra physical model (Figure 5.b) generated from the generic logical model (Figure 4.c). An excerpt from the QVT transformation script is shown in Figure 5.a.

Figure 5: GenericModeltoCassandraModel transformation. (a) QVT rules. (b) Cassandra model.

7 RELATED WORK

To the best of our knowledge, only a few solutions have dealt with the conceptual modeling of NoSQL databases. Chevalier et al. (Chevalier, 2015) defined a set of rules to map a multidimensional model into column-oriented and document-oriented models. The links between facts and dimensions are converted using nesting. Although the transformation process proposed by the authors starts from the conceptual level (multidimensional model), this specific model differs from the UML standard; it contains only facts, dimensions and one type of link. Other studies investigate the process of transforming relational databases into HBase (Li, 2010) and MongoDB (Vajk, 2013). However, the relational model does not offer the semantic richness of UML (especially the several types of relationships between classes: association, composition, generalization, etc.). Few works have presented approaches to implement UML conceptual models in NoSQL databases. Li et al. (Li, 2014) propose an MDA-based approach to transform a UML class diagram into HBase. After building the source and target metamodels, the authors propose mapping rules to realize the transformation from the conceptual level to the physical level. These rules are applicable to HBase only. Daniel et al. (Daniel, 2016) describe the mapping between UML conceptual models and graph databases via an intermediate graph metamodel. These rules are specific to graph databases, which are used as a framework for managing complex data with many connections. Generally, this kind of NoSQL database is used in social networks, where data are highly connected.

8 CONCLUSION AND PERSPECTIVES

In this paper we have presented an MDA-based approach to implement a UML conceptual model describing Big Data in NoSQL systems. Our approach consists of a chain of transformations that generate a generic logical model compatible with the three types of NoSQL systems (column, document and graph) and independent of any specific NoSQL platform, which makes it easier to transform into several NoSQL physical models. As future work, we plan to complete our transformation process by proposing a mapping for the OCL expressions defined in the conceptual model; the query languages provided by NoSQL databases could be used for this.

REFERENCES

Angadi, A., Gull, K., 2013. Growth of New Databases & Analysis of NOSQL Datastores. In IJARCSSE.
Cattell, R., 2011. Scalable SQL and NoSQL data stores. In ACM SIGMOD Record.
Abelló, A., 2015. Big data design. In DOLAP.
Hutchinson, J., Rouncefield, M., Whittle, J., 2011. Model-driven engineering practices in industry. In ICSE.
Li, C., 2010. Transforming relational database into HBase: A case study. In ICSESS.
Douglas, L., 2001. 3d data management: Controlling data volume, velocity and variety. Gartner.
Bézivin, J., Gerbé, O., 2001. Towards a precise definition of the OMG/MDA framework. In ASE.
Chevalier, M., El Malki, M., Kopliku, A., Teste, O., Tournier, R., 2015. How Can We Implement a Multidimensional Data Warehouse Using NoSQL? In ICEIS.
Abadi, D., Madden, S., Hachem, N., 2008. Column-stores vs. row-stores: How different are they really? In ICMD.
Li, Y., Gu, P., Zhang, C., 2014. Transforming UML class diagrams into HBase based on metamodel. In ISEEE.
Daniel, G., Sunyé, G., Cabot, J., 2016. Mapping conceptual schemas to graph databases. In ER.
Vajk, T., Feher, P., Fekete, K., Charaf, H., 2013. Denormalizing data into schema-free databases. In CogInfoCom.

main(){

Source.rootObjects()[DataBase]‐>

maptoKeySpace();

}



mapping

DataBase::toKeySpace():KeySpace{

name:=self.name;

columnsfamily:=self.tables‐>map

toColumnsFamily();

}

‐‐TransformingCtableto

ColumnsFamily

mappingCOLM

::Table::toColumnsFamily():Cassandr

a::ColumnsFamily{

name:=self.name;

column:=self.columns‐>map

toColumn();

}

‐‐TransformingAttributetoColumn

mappingCOLM

::Column::toColumn():Cassandra::Col

umn{

if(self.cType="Rid"){
