JSON-based Interoperability Applying the Pull-parser Programming Model

Leandro Pulgatti and Marcos Didonet Del Fabro

C3SL Labs, Federal University of Paraná, Curitiba, Brazil

Keywords: NoSQL Models, JSON Interoperability, Pull-parser Programming Model.

Abstract:

The JSON format has been applied in a variety of applications: it is established as the de-facto standard for representing document stores, and it is widely used to achieve interoperability and as the exchange format in RESTful web APIs. For these reasons, it is necessary to provide interoperability between JSON and other NoSQL formats. Several approaches aim to translate between different NoSQL formats; however, most of them attempt to be generic and do not focus on JSON. They aim to provide an abstract and generic representation capturing the constructs of all the data models and to provide wrapper-like structures, or to develop pairs of translators. In this paper, we present an approach that uses the JSON data model as the driving format for interoperability with distinct NoSQL data models. We take advantage of its nested textual structure to apply the pull-parser programming model to process it and to develop translators between JSON and a set of representative NoSQL formats. We focus on JSON extraction and on the development and application of the data transformations. We validate our approach through an implementation handling a large number of data representation strategies.

1 INTRODUCTION

JSON (JavaScript Object Notation) is a data format that has been used in a large variety of applications. Today it is established as the de-facto standard for representing document stores, for instance in the MongoDB database. It is also used as the request/response format of several RESTful web APIs. Many NoSQL stores have connectors to achieve interoperability through JSON, a role that was previously filled by XML documents.

There are several solutions that aim to provide JSON and NoSQL interoperability. However, most of them try to be generic, supporting JSON and several other formats both as input and as output, and covering data migration issues between NoSQL data sources (Bugiotti et al., 2013). This generality comes with the drawback of requiring integrated frameworks or data models that are not always easy to use.

The approaches can be classified into two main groups. First, the approaches that provide an abstract and generic representation capturing all the constructs of different NoSQL formats, such as (Bugiotti et al., 2013; Atzeni et al., 2014; Alomari et al., 2015). These generic representations act like wrapper structures to access the data sources. The access can be done directly on the original sources or through translation into the common format. However, the wrapper components or framework must be maintained throughout the life cycle of the distinct data sources. In addition, all the sources need to follow the API convention, which may not always be a technical option. Second, many solutions provide translations between specific NoSQL databases (Scavuzzo et al., 2014). These translations cover a limited number of systems, often just two distinct NoSQL databases. Such approaches are more efficient, since they are adapted to specific scenarios. However, extending them requires implementing new translations, which may be a costly task. All these approaches need to store the full object in memory or to use some lazy-loading API. Several other works focus on migration between RDBMSs and NoSQL, but they are outside the central scope of this paper.

To overcome these issues, we present an approach that uses JSON as the interoperability data format and defines a set of rules to translate it into a series of NoSQL formats. We have two main contributions. First, we use the pull-parser programming model (Slomiski, 2001) to read the input JSON objects. The pull-parser programming model has already been used in different scenarios to parse XML, and it has started to be used with JSON, but not in an interoperability context. It allows us to take advantage of well-formed nested JSON documents and to read only the parts of the input that are being processed. Second, we provide a set of interoperability rules from JSON to a set of representative NoSQL formats. These rules, which are fully described in the paper, are simple to develop and to extend. They handle 12 NoSQL formats, which cover most of the existing representations (Bugiotti et al., 2013).

We validate our approach with a prototype implementation that applies the transformations between these data formats, using a public data set as input.

Pulgatti, L. and Didonet Del Fabro, M.
JSON-based Interoperability Applying the Pull-parser Programming Model.
DOI: 10.5220/0006646400950102
In Proceedings of the 20th International Conference on Enterprise Information Systems (ICEIS 2018), pages 95-102
ISBN: 978-989-758-298-1

2 RELATED WORK

Several works aim to interoperate with, convert, migrate, or access data across different NoSQL databases. We separate them into two major categories.

The first category concentrates on creating wrappers, or some other homogeneous way to access different data sources, and on translating between the data sources only when necessary. The CDPort framework (Alomari et al., 2015) aims at building a standardized way to access RDBMSs and NoSQL databases through a common data model and an API, both in a cloud-based environment. Each entity can have multiple properties, and the different data structures are always accessed with the same primitives. (Michel et al., 2014) proposes a mapping language called xR2RML to convert heterogeneous data formats to RDF (Resource Description Framework), extending the work of (Consortium et al., 2012) to NoSQL databases. (Chung et al., 2014) developed a GUI that connects to the column store HBase. Although it focuses on the translation of queries, its study of the differences between the models can also serve to conduct a migration. (Atzeni et al., 2012) presents a programming interface common to NoSQL databases, which can be extended to RDBMSs, called Save Our Systems (SOS). The solution has three main components: a standard interface, a meta-layer responsible for storing the form of the data, and specific handlers for each database system. It is the foundation of many other works on uniform data access, including our idea of accessing the databases only through get() and set() methods. (Scavuzzo et al., 2014) creates a system for migrating data between NoSQL columnar databases: a client/server application that uses a metamodel designed solely to handle columnar databases, taking into account details such as indexing.

Note: the pull-parser programming model used for XML is supported by APIs such as Xerces, kXML, or SAX.

The second major category uses a metamodel, or another kind of intermediate representation, to support the NoSQL migration process. The goal is to reduce the number of translations between the data sources, compared to the N×N direct translations otherwise required. (Atzeni et al., 2014) extends the work of (Atzeni et al., 2012), focusing on the use of the interface. A series of articles present NoAM (NoSQL Abstract Model) (Bugiotti et al., 2013; Bugiotti et al., 2014; Atzeni et al., 2016), developing solutions based on the observation that NoSQL databases share similar features, especially the capacity to access their data through what are called "data access units". The classification of representation strategies in this work is the basis for our classification and for the kinds of rules we implement. (Bugiotti et al., 2014) describes a data modeling and data design methodology to ensure that the data can be represented in the major NoSQL database models; this generic model can then be refined or redesigned to better fit the chosen NoSQL database. This work is a direct derivative of (Bugiotti et al., 2013), focusing mainly on the database design problem.

Our approach has two main differences from these previous works. First, it uses JSON as the base format, since JSON is well established and widely supported, without the need to create extra control structures. Second, the input processing and rule execution are done on a stream of objects using the pull-parser programming model, not through an API or a similar data access layer.

3 JSON-BASED INTEROPERABILITY

In this section we present our approach for JSON-based data interoperability. First, we present how we process the nested JSON format using the pull-parser programming model. Second, we describe the migration rules covering different representation strategies.

A JSON document is denoted by the ordered list $JSON = (e_1, e_2, \ldots, e_n)$, where each element $e_i = (k_i, v_i)$ contains a key $k_i$ and a value $v_i$, which is either a String $s_i$, a numeral $n_i$, a complex object $co_i$, or a collection of elements $C_i = (ec_1, ec_2, \ldots, ec_m)$, where each $ec_j$ is itself another element.

Consider the listing below to illustrate the syntax of JSON. The key is the identifier of each element, such as "Person", "firstName" or "type", always on the left side. The element values, on the right side, may be of three kinds: 1) simple objects or scalars, such as the String "Smith" or the number 25; 2) complex objects, composed of other objects, such as the "Person" object; 3) collections, such as the "phoneNumber" collection, formed by two elements. This format allows a wide diversity of complex values to be manipulated and persisted (Hecht and Jablonski, 2011).

{
  "Person": {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
      { "type": "home", "number": "212 555-1234" },
      { "type": "fax", "number": "646 555-4567" }
    ]
  }
}

3.1 Pull-parsing a JSON

The input JSON elements are processed by reading a stream of objects, which means that the complete input is not available in advance to be stored in memory. We apply the pull-parser programming model to read the input objects and to identify their boundaries and structure. The pull-parser programming model has been used to parse XML documents read from streams in different scenarios; we apply a similar methodology to read JSON input streams.

In this model, the processing algorithm receives a stream of objects $SO = (o_1, o_2, \ldots, o_n)$, where each object $o_i$ is a tuple $\langle ek_i, ov_i \rangle$; $ek_i$ is the event kind and $ov_i$ is the object value. The object value is an input JSON element or a NULL value.

The event kinds are separated into four categories: 1) to mark object boundaries (START_OBJECT, END_OBJECT); 2) to mark collection boundaries (START_ARRAY, END_ARRAY); 3) to identify objects (KEY_NAME); and 4) to set the value types (VALUE_STRING, VALUE_NUMBER, VALUE_TRUE, VALUE_FALSE, VALUE_NULL). We adopt the same event kinds supported by the JsonParser API (http://docs.oracle.com/javaee/7/api/javax/json/stream/JsonParser.html), since we consider them sufficient for many interoperability requirements.

We added the event kinds next to each JSON element to illustrate what the virtual input of a stream of objects would look like.

{START_OBJECT
  "Person"KEY_NAME:
  {START_OBJECT
    "firstName"KEY_NAME: "John"VALUE_STRING,
    "lastName"KEY_NAME: "Smith"VALUE_STRING,
    "age"KEY_NAME: 25 VALUE_NUMBER,
    "phoneNumber"KEY_NAME: [START_ARRAY
      {START_OBJECT "type"KEY_NAME: "home"VALUE_STRING,
                    "number"KEY_NAME: "212 555-1234"VALUE_STRING }END_OBJECT,
      {START_OBJECT "type"KEY_NAME: "fax"VALUE_STRING,
                    "number"KEY_NAME: "646 555-4567"VALUE_STRING }END_OBJECT
    ]END_ARRAY
  }END_OBJECT
}END_OBJECT

Every time the application developer calls a next() method or function, a new event is processed: the input is read up to the next event, and that event is categorized. The objects read are stored in memory using an intermediate nested data format.
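The next() loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (which follows the Java JsonParser API, where next() returns an Event and values are read with accessors such as getString()): the class name JsonPullParser and the (event, value) tuple interface are our assumptions. Each call reads only as many characters as needed for one event, and the parser does not validate separators.

```python
import io


class JsonPullParser:
    """Minimal JSON pull parser: each next() call reads only enough input for one event.

    Simplifications: commas are skipped without validation and surrogate pairs in
    unicode escapes are not combined, so this is not a validating parser.
    """

    _ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
                "n": "\n", "r": "\r", "t": "\t"}
    _BRACKETS = {"{": "START_OBJECT", "}": "END_OBJECT",
                 "[": "START_ARRAY", "]": "END_ARRAY"}
    _LITERALS = {"true": "VALUE_TRUE", "false": "VALUE_FALSE", "null": "VALUE_NULL"}

    def __init__(self, stream):
        self.stream = stream   # any text stream offering read(1)
        self.peeked = None     # one character of lookahead

    def _peek(self):
        if self.peeked is None:
            self.peeked = self.stream.read(1)
        return self.peeked

    def _read(self):
        ch = self._peek()
        self.peeked = None
        return ch

    def _skip(self, chars):
        # the truthiness check stops at end of input ("" is contained in every string)
        while self._peek() and self._peek() in chars:
            self._read()

    def _read_string(self):
        chars = []
        while True:
            ch = self._read()
            if ch == "":
                raise ValueError("unterminated string")
            if ch == '"':
                return "".join(chars)
            if ch == "\\":
                esc = self._read()
                if esc == "u":
                    chars.append(chr(int("".join(self._read() for _ in range(4)), 16)))
                else:
                    chars.append(self._ESCAPES[esc])
            else:
                chars.append(ch)

    def next(self):
        """Return the next (event, value) pair, or None when the input is exhausted."""
        self._skip(" \t\r\n,")
        ch = self._read()
        if ch == "":
            return None
        if ch in self._BRACKETS:
            return (self._BRACKETS[ch], None)
        if ch == '"':
            text = self._read_string()
            self._skip(" \t\r\n")
            if self._peek() == ":":          # a string followed by ':' is a key
                self._read()
                return ("KEY_NAME", text)
            return ("VALUE_STRING", text)
        token = ch                            # literal or number: read up to a delimiter
        while self._peek() and self._peek() not in " \t\r\n,]}":
            token += self._read()
        if token in self._LITERALS:
            return (self._LITERALS[token], None)
        number = float(token) if any(c in token for c in ".eE") else int(token)
        return ("VALUE_NUMBER", number)


if __name__ == "__main__":
    parser = JsonPullParser(io.StringIO('{"Person": {"firstName": "John", "age": 25}}'))
    for event in iter(parser.next, None):    # pull events until next() returns None
        print(event)
```

Because the caller drives the loop, it can stop, buffer, or fire a rule at any event (for example at each START_OBJECT) without the whole document ever being materialized.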

Each object of the intermediate data format stored in memory has the following fields:

ObjectId: a unique identifier for each object.
DataValue: the value of the given object, if any.
Label: the associated event.
FatherObj: the ObjectId of the father (parent) object, if any.

The unique identifier is created automatically as a numerical sequence assigned to each new object. The event is set as soon as the objects are read. The hierarchy between the objects depends on the collection boundary events.

The output of the pull parser is illustrated below. It shows the intermediate format after parsing the phoneNumber attribute.

ObjectId: 8
DataValue: phoneNumber
Label: KEY_NAME
FatherObj: 1

ObjectId: 9
DataValue: null
Label: START_ARRAY
FatherObj: 8

(the nested objects within the phoneNumber array)

ObjectId: 22
DataValue: null
Label: END_ARRAY
FatherObj: 9

ObjectId: 23
DataValue: null
Label: END_OBJECT
FatherObj: 1

Listing 1: Data format for the phone attribute.
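Listing 1 can be reproduced with the following sketch, which turns a list of (event, value) pairs into records carrying the four fields described above. The parent-assignment rules (a key points to its enclosing object, a value or nested container points to its key, a closing event points to its opening event) are our reading of Listing 1, not rules stated in the paper, and the event list is hard-coded to keep the example self-contained. When numbering starts at the "Person" object's START_OBJECT, the ObjectIds and FatherObjs of records 8, 9, 22 and 23 match the listing.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Record:
    object_id: int              # ObjectId: sequential identifier
    data_value: Any             # DataValue: key name or scalar; None for boundary events
    label: str                  # Label: the event kind
    father_obj: Optional[int]   # FatherObj: ObjectId of the parent record, if any


def build_records(events: List[Tuple[str, Any]]) -> List[Record]:
    records = []
    open_ids = []        # ObjectIds of START_OBJECT / START_ARRAY records still open
    pending_key = None   # ObjectId of a KEY_NAME still waiting for its value
    for object_id, (label, value) in enumerate(events, start=1):
        if label in ("END_OBJECT", "END_ARRAY"):
            father = open_ids.pop()                      # closing event -> its opening event
        elif pending_key is not None:
            father = pending_key                         # value or nested container -> its key
        else:
            father = open_ids[-1] if open_ids else None  # key or array item -> its container
        pending_key = object_id if label == "KEY_NAME" else None
        records.append(Record(object_id, value, label, father))
        if label in ("START_OBJECT", "START_ARRAY"):
            open_ids.append(object_id)
    return records


# Events for the "Person" object, starting at its own START_OBJECT.
PERSON_EVENTS = [
    ("START_OBJECT", None),
    ("KEY_NAME", "firstName"), ("VALUE_STRING", "John"),
    ("KEY_NAME", "lastName"), ("VALUE_STRING", "Smith"),
    ("KEY_NAME", "age"), ("VALUE_NUMBER", 25),
    ("KEY_NAME", "phoneNumber"), ("START_ARRAY", None),
    ("START_OBJECT", None),
    ("KEY_NAME", "type"), ("VALUE_STRING", "home"),
    ("KEY_NAME", "number"), ("VALUE_STRING", "212 555-1234"),
    ("END_OBJECT", None),
    ("START_OBJECT", None),
    ("KEY_NAME", "type"), ("VALUE_STRING", "fax"),
    ("KEY_NAME", "number"), ("VALUE_STRING", "646 555-4567"),
    ("END_OBJECT", None),
    ("END_ARRAY", None),
    ("END_OBJECT", None),
]

for record in build_records(PERSON_EVENTS):
    if record.object_id in (8, 9, 22, 23):   # the records shown in Listing 1
        print(record)
```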

It is important to note that these objects are not serialized; they are processed as soon as they are read from the input stream. The data migration rules follow the same principle, as shown in the next section.


3.2 Interoperability Rules

The interoperability rules take into account the representation strategies presented in (Bugiotti and Cabibbo, 2013), since they cover a large number of NoSQL representations. We organize the rule descriptions by category of target data model and illustrate the output of each rule execution using the "Person" element presented above.

Each rule is fired once a new object is identified, i.e., when a START_OBJECT event occurs. For each execution, the rules process the following properties:

• Class: the class name defines the identifier of a given composed object. This means that all the nested objects or arrays have the same kind. In the Document Store model, the class name is called Collection; in the Graph model, the class name is the main node.

• Key: each object has a main key (MainKey), defined according to the data model properties.

• Value: the value indexed by a given MainKey.

The difficulty of specifying the rules may vary depending on the output data model. For instance, in some cases it is more difficult to produce the output key than the output data, or vice versa, as the following sections show.

3.3 Key-Value Stores

A key-value store contains collections of key-value

(K,V) pairs, where the key K is used as an index to

perform operations over the value V.

Key-value per Object - kvpo: there is only one object associated with each key. The key is a concatenation of the collection name and an identifier for the object; the collection name can be considered the object type. The value is a serialization of the entire value of the object, which may be an atomic data type or a composition of values or objects.

The MainKey that identifies an object is formed by the object Class plus the first VALUE_STRING found. The Value is generated by concatenating all the nested values of the object. The output is a sequence of key-value pairs, as shown in Table 1.
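A minimal Python sketch of the kvpo rule, assuming the object has already been decoded into a dict rather than consumed event by event as in the pull-parser pipeline. The function names kvpo and first_string are hypothetical, the ":" separator is taken from the "Person:John" example, and json.dumps stands in for "concatenating all the nested values".

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def kvpo(cls, obj):
    """Key-value per object: a single pair whose value serializes the whole object."""
    main_key = f"{cls}:{first_string(obj)}"  # MainKey = Class + first VALUE_STRING
    return [(main_key, json.dumps(obj))]     # json.dumps stands in for the concatenation


print(kvpo("Person", PERSON))
```

Using the first string value as identifier only works when that value is unique within the class; the sketch does not check this.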

Table 1: Key-value per object - kvpo().
Key: MainKey
Value:
  for all Obj.value do
    Value = Value + Obj.value
  end for
Example (Key | Value):
  Person:John | "firstName":"John", "lastName": "Smith", "age": 25, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, ... ]

Note: the second phone number is omitted from some illustrations for brevity.
Note: in this work, "class" is used as a noun to categorize an object with a set of common attributes.

Key-value per Field - kvpf: there are multiple key-value pairs to represent each object. The key is a concatenation of the collection name, the object identifier, and the name of the top-level field. The format of the key may vary depending on the implementation, as long as the value is only the value of the corresponding field.

The key is the MainKey plus the KEY_NAME, and this is repeated for each KEY_NAME found in the input object. The value is the data associated with the KEY_NAME. If the data is an array or another object, all of its values are concatenated until the end of the array or object (see Table 2).

Table 2: Key-value per field - kvpf().
Key: MainKey + "/" + Obj.KEY_NAME
Value:
  for all Obj.KEY_NAME do
    if Value = (Array or Object) then
      for all Obj.value do
        Value = Value + Obj.value
      end for
    else
      Value = Obj.value
    end if
  end for
Example (Key | Value):
  Person:John/firstName | John
  Person:John/lastName | Smith
  Person:John/age | 25
  Person:John/phoneNumber | { "type": "home", "number": "212 555-1234" }, ...
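The kvpf rule can be sketched the same way, again on a decoded dict instead of the event stream. The names are hypothetical; the "/" separator follows Table 2, and serializing non-string values with json.dumps is our stand-in for concatenating the nested values of arrays and objects.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def kvpf(cls, obj):
    """Key-value per field: one pair per top-level field of the object."""
    main_key = f"{cls}:{first_string(obj)}"
    pairs = []
    for name, value in obj.items():
        # strings are stored as-is; numbers, arrays and nested objects as JSON text
        text = value if isinstance(value, str) else json.dumps(value)
        pairs.append((f"{main_key}/{name}", text))
    return pairs


for key, value in kvpf("Person", PERSON):
    print(key, "->", value)
```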

Key-value per Field Object - kvpfo: the key is a concatenation of a major and a minor key. The major key contains information related to the main object, such as its collection name and an identifier, and the minor key contains information related to each field.

The Key is composed of the MainKey, plus "/-/", plus each KEY_NAME found in the object. The values are formed by the value associated with the KEY_NAME. If the value is an array or another object, its contents are sequentially concatenated (see Table 3).
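A sketch of kvpfo on a decoded dict, under two assumptions we had to make: the major key is written "Person/John" as in the example rows of Table 3 (rather than "Person:John"), and the stored value is the field value alone, as the prose and example rows show, whereas the pseudocode of Table 3 also prefixes KEY_NAME + ":" to it. Function names are hypothetical.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def kvpfo(cls, obj):
    """Key-value per field object: major key (object) + "/-/" + minor key (field)."""
    major = f"{cls}/{first_string(obj)}"
    pairs = []
    for name, value in obj.items():
        text = value if isinstance(value, str) else json.dumps(value)
        pairs.append((f"{major}/-/{name}", text))
    return pairs


for key, value in kvpfo("Person", PERSON):
    print(key, "->", value)
```

The "/-/" marker makes the boundary between major and minor key explicit, which lets a store scan all fields of one object by the major-key prefix.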

Table 3: Key-value per field object - kvpfo().
Key: MainKey + "/-/" + Obj.KEY_NAME
Value:
  for all Obj.KEY_NAME do
    if Value = (Array or Object) then
      for all Obj.value do
        Value = Value + Obj.value
      end for
      Value = Obj.KEY_NAME + ":" + Value
    else
      Value = Obj.KEY_NAME + ":" + Obj.value
    end if
  end for
Example (Key | Value):
  Person/John/-/firstName | John
  Person/John/-/lastName | Smith
  Person/John/-/age | 25
  Person/John/-/phoneNumber | "type": "home", "number": "212 555-1234", ...

Key-value per Atomic Value - kvpav: the key is a concatenation of identifiers, and the value is a single atomic value; complex objects are not allowed.

The values are formed by each of the KEY_VALUEs found. The Key is composed of the MainKey, plus "/-/", plus the whole path up to the KEY_NAME before the value. If the value is an array or another object, a sequential number is added to the key to maintain uniqueness (see Table 4).

Table 4: Key-value per atomic value - kvpav().
Key:
  for all Obj.KEY_VALUE do
    Key = MainKey + "/-/" + Obj.KEY_NAME
    if Value = (Array or Object) then
      for all Obj_i do
        Key = Key + "/" + i + "/" + Obj_i.KEY_NAME
      end for
    end if
  end for
Value: Obj.KEY_NAME.Value
Example (Key | Value):
  Person/John/-/firstName | John
  Person/John/-/lastName | Smith
  Person/John/-/age | 25
  Person/John/-/phoneNumber/0/type | home
  Person/John/-/phoneNumber/0/number | 212 555-1234
  Person/John/-/phoneNumber/1/type | fax
  Person/John/-/phoneNumber/1/number | 646 555-4567
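A sketch of kvpav as a recursive walk over a decoded dict: every scalar becomes one pair whose key is the path from the object to that scalar, with array positions inserted as sequential numbers, as in the "phoneNumber/0/type" rows of Table 4. Function names are hypothetical, and the "Person/John" form of the major key follows the table's example rows.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def kvpav(cls, obj):
    """Key-value per atomic value: one pair per scalar, keyed by its full path."""
    major = f"{cls}/{first_string(obj)}"
    pairs = []

    def walk(path, value):
        if isinstance(value, dict):
            for name, child in value.items():
                walk(path + [name], child)
        elif isinstance(value, list):
            for index, child in enumerate(value):   # array position keeps keys unique
                walk(path + [str(index)], child)
        else:
            text = value if isinstance(value, str) else json.dumps(value)
            pairs.append((major + "/-/" + "/".join(path), text))

    walk([], obj)
    return pairs


for key, value in kvpav("Person", PERSON):
    print(key, "->", value)
```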

Key-hash per Object - khpo: there is a key for each complex object and a hash entry for each field, whose content is commonly the field value.

The Key has the same format as in the kvpo representation. The same MainKey has several values, each one composed of the KEY_NAME plus the associated value. If the value is an array or another object, the value is the concatenation of all elements of the array or object (see Table 5).

Table 5: Key-hash per object - khpo().
Key: MainKey
Value:
  for all Obj.KEY_NAME do
    if Value = (Array or Object) then
      for all Obj.value do
        Value = Value + Obj.value
      end for
      Value = Obj.KEY_NAME + ":" + Value
    else
      Value = Obj.KEY_NAME + ":" + Obj.value
    end if
  end for
Example:
  Key: Person:John
  Values: firstName:John
          lastName:Smith
          age:25
          phoneNumber:[ { "type": "home", "number": "212 555-1234" }, ... ]
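A sketch of khpo on a decoded dict: one MainKey whose hash holds one "KEY_NAME:value" entry per top-level field, following the example of Table 5. Representing the hash as a Python list under a dict key is our assumption (a real key-value store would use its own hash or map type), and the function names are hypothetical.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def khpo(cls, obj):
    """Key-hash per object: one MainKey holding a "KEY_NAME:value" entry per field."""
    main_key = f"{cls}:{first_string(obj)}"
    entries = []
    for name, value in obj.items():
        text = value if isinstance(value, str) else json.dumps(value)
        entries.append(f"{name}:{text}")
    return {main_key: entries}


print(khpo("Person", PERSON))
```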

3.4 Column Stores

Column stores are organized around columns (their central entity), tables, and rows. Thus, they are optimized for reading columns or groups of columns.

Column: a Column organizes keyed records as a collection of columns, where a column contains collections of key-value pairs. The key is the column name, and the value can be of an arbitrary data type.

The column name is each individual KEY_NAME, and the values are formed by each of the individual KEY_VALUEs. If the value is an array or another object, the column names are composed of the KEY_NAME of the father plus the final KEY_NAME found. No group is created, and the columns are stored individually (see Table 6 (a)).
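A sketch of the Column rule on a decoded dict: nested scalars get the column name "father/KEY_NAME", where "father" is taken to be the nearest enclosing key (our reading of "the KEY_NAME of the father"), and no grouping is applied. The function name and the handling of scalars directly inside an array (they reuse the array's key) are assumptions; writing the cells to an actual column store is out of scope.

```python
# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def columns(obj):
    """Column rule: flatten an object into (column_name, value) cells, no grouping."""
    cells = []

    def walk(father, value):
        if isinstance(value, list):            # array elements share the array's key
            for item in value:
                walk(father, item)
        elif isinstance(value, dict):
            for name, child in value.items():
                if isinstance(child, (dict, list)):
                    walk(name, child)          # nested: remember the father's KEY_NAME
                else:
                    cells.append((f"{father}/{name}" if father else name, child))
        else:
            cells.append((father, value))      # scalar array element: column = father key

    walk(None, obj)
    return cells


for column, value in columns(PERSON):
    print(column, "=", value)
```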

Super Column: it is a collection containing records of other columns, so each column is a group of other columns. These groups are stored and manipulated based on a "Super Column" name, which can be defined as a Key part, while the column group itself determines the value.

The migration rule is a variation of the previous one. The identification of the key is the same, as is the assignment of the values. The rule changes when the value is an array or another object: the KEY_NAME of the father object is used as the Super Column name, with the other KEY_NAMEs serving as column names (see Table 6 (b)).

Table 6: Column and super column rules.

(a) Column
Column:
  for all Obj.KEY_NAME do
    if Obj.hasFather = true then
      Key = Obj.father.KEY_NAME + "/" + Obj.KEY_NAME
    else
      Key = Obj.KEY_NAME
    end if
  end for
Value: Obj.KEY_VALUE
Example (Column | Value):
  firstName | John
  lastName | Smith
  age | 25
  phoneNumber/type | home
  phoneNumber/number | 212 555-1234
  phoneNumber/type | fax
  phoneNumber/number | 646 555-4567

(b) Super Column
Super Column: Obj.father.KEY_NAME; Column: KEY_NAME; Value: KEY_VALUE
Example (Super Column | Column | Value):
  - | firstName | John
  - | lastName | Smith
  - | age | 25
  phoneNumber | type | home
  phoneNumber | number | 212 555-1234
  phoneNumber | type | fax
  phoneNumber | number | 646 555-4567
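The Super Column variant changes only how nested cells are labeled: instead of concatenating "father/KEY_NAME", the father's KEY_NAME becomes a separate super-column name. The sketch below (hypothetical names, decoded dict as input, None standing for "no super column") returns (super_column, column, value) triples matching Table 6 (b).

```python
# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def super_columns(obj):
    """Return (super_column, column, value) triples; super_column is None at top level."""
    cells = []

    def walk(father, value):
        if isinstance(value, list):
            for item in value:
                walk(father, item)
        elif isinstance(value, dict):
            for name, child in value.items():
                if isinstance(child, (dict, list)):
                    walk(name, child)          # the father's KEY_NAME becomes the super column
                else:
                    cells.append((father, name, child))
        else:
            cells.append((father, None, value))  # scalar inside an array: no column name

    walk(None, obj)
    return cells


for cell in super_columns(PERSON):
    print(cell)
```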

Column Family: it groups the columns based on a Row Key, which is set by the first VALUE_STRING found (see Table 7 (a)). The columns are created following the rules of a Super Column.

Super Column Family: the Row Key groups columns that are correlated. The Row Key is set by the object Class, which plays a role similar to a table name. The columns follow the creation rules of a Super Column. The rule is shown in Table 7 (b).
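The two family variants differ only in the row key under which the super-column cells are grouped: the first VALUE_STRING for a Column Family, the object Class for a Super Column Family. The sketch below shows that difference on a decoded dict; the function names and the dict-of-triples representation are hypothetical, and how several objects of the same class would share one super-column-family row is left open, as in the text.

```python
# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def super_columns(obj):
    """Return (super_column, column, value) triples; super_column is None at top level."""
    cells = []

    def walk(father, value):
        if isinstance(value, list):
            for item in value:
                walk(father, item)
        elif isinstance(value, dict):
            for name, child in value.items():
                if isinstance(child, (dict, list)):
                    walk(name, child)
                else:
                    cells.append((father, name, child))
        else:
            cells.append((father, None, value))

    walk(None, obj)
    return cells


def column_family(obj):
    """Row key = first VALUE_STRING; columns built as super columns."""
    return {first_string(obj): super_columns(obj)}


def super_column_family(cls, obj):
    """Row key = object Class; columns built as super columns."""
    return {cls: super_columns(obj)}


print(column_family(PERSON))
print(super_column_family("Person", PERSON))
```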

Table 7: Column family and super column family.

(a) Column Family, row key 'John'
Example (Super Column (Obj.father.KEY_NAME) | Column (KEY_NAME) | Value (KEY_VALUE)):
  - | firstName | John
  - | lastName | Smith
  - | age | 25
  phoneNumber | type | home
  phoneNumber | number | 212 555-1234
  phoneNumber | type | fax
  phoneNumber | number | 646 555-4567

(b) Super Column Family, column family 'Person'
Example (Super Column (Obj.father.KEY_NAME) | Column (KEY_NAME) | Value (KEY_VALUE)):
  - | firstName | John
  - | lastName | Smith
  - | age | 25
  phoneNumber | type | home
  phoneNumber | number | 212 555-1234
  phoneNumber | type | fax
  phoneNumber | number | 646 555-4567

3.5 Document Stores (DS)

Document stores are designed to manipulate and persist a wide diversity of complex values (Hecht and Jablonski, 2011), which can comprise scalar values, lists, and other documents in a nested format. These documents are organized into collections, i.e., groups of documents.

Similarly to key-value stores, there are variations on how to encode the documents. The three main variations are document per object - dpo, item per object - ipo, and cell per object - cpo.

The migration rules are similar to those for key-value stores, since the objects may be identified by unique keys. We describe the particularities in the following.

Document per Object: the migration rule is similar to the kvpo strategy. The main difference is that the MainKey is split into the class name, acting as the collection name, and the first VALUE_STRING, acting as the "Document id". The nested values are concatenated sequentially. This rule is described in Table 8.

Table 8: Document per object - dpo(), class Person.
Document id: VALUE_STRING
Value:
  for all Obj.value do
    Value = Value + Obj.value
  end for
Example (Document id | Value):
  John | {"firstName":"John", "lastName": "Smith", "age": 25, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, ... ]}
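A sketch of dpo on a decoded dict: the class gives the collection, the first VALUE_STRING gives the document id, and the whole object becomes the document. The function names and the (collection, id, document) tuple are hypothetical; inserting into MongoDB or another document store is not shown.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def dpo(cls, obj):
    """Document per object: (collection, document id, serialized document)."""
    return cls, first_string(obj), json.dumps(obj)


print(dpo("Person", PERSON))
```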

Item per Object: this rule is similar to kvpf. The class name is the Collection name, and the data is composed of the KEY_NAME and the associated value. To distinguish the inner documents generated from the same element, one ID is generated for each of them. If the value is an array or another object, it is the concatenation of all the nested elements (see Table 9).


Table 9: Item per object - ipo(), class Person.
Documents: KEY_NAME
Value:
  for all Obj.KEY_NAME do
    Value = Value + Obj.value
  end for
Example (Documents | Value):
  firstName | John
  lastName | Smith
  age | 25
  phoneNumber | { "type": "home", "number": "212 555-1234" }, ...
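A sketch of ipo on a decoded dict, following our reading of Table 9: one small inner document per top-level field, each with a generated ID. The "_id" field name and the per-object sequential numbering are hypothetical choices; a real store would need identifiers that are unique across objects, and a field literally named "_id" would collide with the generated one.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def ipo(cls, obj):
    """Item per object: one inner document per top-level field, each with its own ID."""
    items = []
    for item_id, (name, value) in enumerate(obj.items(), start=1):
        # nested arrays/objects are kept as one concatenated (serialized) value
        text = value if isinstance(value, str) else json.dumps(value)
        items.append({"_id": item_id, name: text})   # "_id" is a hypothetical ID field
    return cls, items


collection, items = ipo("Person", PERSON)
print(collection, items)
```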

Cell per Object: the table name receives the Class name. The ID is created based on the first VALUE_STRING found. The Value receives all the nested values concatenated sequentially (see Table 10).

Table 10: Cell per object - cpo(), class Person.
ID: VALUE_STRING
Value:
  for all Obj.value do
    Value = Value + Obj.value
  end for
Example (ID | Value):
  John | {"firstName":"John", "lastName": "Smith", "age": 25, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, ... ]}
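A sketch of cpo on a decoded dict: the class names the table, the first VALUE_STRING is the row ID, and a single cell holds the concatenated (here, JSON-serialized) object. The output shape, a dict with "table", "id" and "value" keys, is a hypothetical representation chosen for illustration; structurally it carries the same information as dpo, which is why both yield one element per input object.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def cpo(cls, obj):
    """Cell per object: table = Class, row ID = first VALUE_STRING, one cell per object."""
    return {"table": cls, "id": first_string(obj), "value": json.dumps(obj)}


print(cpo("Person", PERSON))
```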

3.6 Graph Stores

A graph store organizes the data as nodes, edges and properties. It is important to note that the properties are key/value pairs. Nodes can represent entities, edges are the connections between two nodes representing a relationship, and the properties are the data itself (Bondiombouy and Valduriez, 2016). Other representations are possible, for instance ones that do not treat properties as separate entities. Graph stores are best suited to applications involving large sets of connected elements, graph traversals, and sub-graph matching.

The Main Node is composed of the object Class plus the first VALUE_STRING found; this is the same process used to form the MainKey. The leaf nodes are composed of each KEY_NAME plus the associated value. If the value is an array or another object, it is the concatenation of all elements of the array or object (see Table 11). Note that graph databases may use many other encodings, which are not covered by this migration rule.

Table 11: Graph - graph(), node Person.
Leaf Node: Obj.KEY_NAME
Value:
  for all Obj.KEY_NAME do
    if Value = (Array or Object) then
      for all Obj.value do
        Value = Value + Obj.value
      end for
      Value = Obj.KEY_NAME + ":" + Value
    else
      Value = Obj.KEY_NAME + ":" + Obj.value
    end if
  end for
Example (Leaf Node | Value):
  firstName | John
  lastName | Smith
  age | 25
  phoneNumber | { "type": "home", "number": "212 555-1234" }, ...
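A sketch of the graph rule on a decoded dict, using plain Python tuples rather than any graph-database API: a main node built like the MainKey, one "KEY_NAME:value" leaf per top-level field, and one edge from the main node to each leaf. Function names and the node/edge representation are hypothetical; how Neo4j or another store would persist them is not shown.

```python
import json

# The "Person" example from this section, as a decoded Python dict.
PERSON = {
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"},
    ],
}


def first_string(value):
    """Return the first string scalar in document order (depth-first), or None."""
    if isinstance(value, str):
        return value
    children = list(value.values()) if isinstance(value, dict) else value
    if isinstance(children, list):
        for child in children:
            found = first_string(child)
            if found is not None:
                return found
    return None


def graph(cls, obj):
    """Main node (Class + first VALUE_STRING) linked to one leaf node per field."""
    main_node = f"{cls}:{first_string(obj)}"   # same construction as the MainKey
    leaves, edges = [], []
    for name, value in obj.items():
        text = value if isinstance(value, str) else json.dumps(value)
        leaf = f"{name}:{text}"
        leaves.append(leaf)
        edges.append((main_node, leaf))        # one edge from the main node to each leaf
    return main_node, leaves, edges


print(graph("Person", PERSON))
```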

3.7 Implementation

The implemented tool (available at http://www.inf.ufpr.br/didonet/files/Jsonpullparser.zip) uses a different NoSQL database for each category of data store. They were chosen because they all implement get() and put() interfaces to access the data, as well as ways to serialize the results in JSON. As key-value store we use Oracle NoSQL Community Edition; for column stores, Apache HBase; MongoDB as document store; and Neo4j as graph database.

We used the "Food Inspections" data set, freely available from the City of Chicago Data Portal (https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5). The data set describes inspections of restaurants and other food establishments in Chicago from January 1, 2010 to December 1, 2016. There is no particular reason for choosing this data, other than being in the public domain and easily accessible through its API. The input data contains 139,535 objects. Each object is composed of 23 fields and 1 array of objects, which itself contains 5 distinct fields. Table 12 shows the number of output pairs for each representation strategy for key-value stores.

Table 12: Generated elements for Key Value stores.
Strategy | MainKeys (per object) | Values (per object) | Output pairs
kvpo | 1 | 1 | 139,535
kvpf | 24 | 24 | 3,348,840
khpo | 1 | 24 | 3,348,840
kvpfo | 24 | 24 | 3,348,840
kvpav | 28 | 28 | 3,906,980

For column stores, the tool generates the same number of columns as output, 3,906,980, for Column, Super Column, Column Family and Super Column Family; the output differs only in the way the columns are grouped. For document stores, the choice of the key that composes the document directly determines the number of generated values: dpo produced 139,535 elements, ipo generated 3,348,840, and cpo generated 139,535 elements. Finally, the output for graph databases was one main node (the input class) and one leaf node for each field or array in the original file. The values are then inserted into each leaf node, totalling 3,348,840 elements.
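The totals reported above follow from a per-object count multiplied by the 139,535 input objects. The short check below reproduces them; the assumption that each object's array holds a single element is ours, inferred from the figure of 28 atomic values per object (23 scalar fields plus the 5 nested fields).

```python
# Figures reported in Section 3.7.
OBJECTS = 139_535      # input objects
SCALAR_FIELDS = 23     # fields per object
ARRAY_FIELDS = 1       # one array of objects per object
NESTED_FIELDS = 5      # distinct fields inside that array
# Assumption (ours): one array element per object, as 28 values per object implies.

per_object = {
    "kvpo": 1,                               # one pair per object (same count as dpo, cpo)
    "kvpf": SCALAR_FIELDS + ARRAY_FIELDS,    # one per top-level field (kvpfo, khpo, ipo, graph)
    "kvpav": SCALAR_FIELDS + NESTED_FIELDS,  # one per atomic value (also the column stores)
}
totals = {name: OBJECTS * count for name, count in per_object.items()}

for name, total in totals.items():
    print(f"{name}: {per_object[name]} per object -> {total:,} in total")
```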

4 CONCLUSIONS

We presented an approach for NoSQL interoperability based on the JSON format, applying the pull-parser programming model to execute a set of rules over a stream of objects. We have two main contributions. First, we use the JSON nested data model as a basis for interoperability between different NoSQL data formats. The use of JSON has proved to be an effective choice, since it is supported by many APIs, making it easy to connect to different output data stores.

The second main contribution is the use of the pull-parser programming model, which has already been used in the XML context, for reading the input from a stream of objects. This allows large files to be used as input, since the input objects do not need to be kept in memory. The translation itself is context-free, provided that the JSON objects are well-formed nested documents.

We detailed a set of rules from JSON to a set of NoSQL data representation strategies. The data migration rules are simple to implement, relying only on get() and set() primitives, which are available in several implementations of NoSQL databases. Although we cover a large number of representations, other representations exist, especially with respect to the composition of the input keys, which are often path-based expressions to reach a given object.

As future work, we could extend the model to support complex query compositions, and to compare the results of the same query in different NoSQL stores.

REFERENCES

Alomari, E., Barnawi, A., and Sakr, S. (2015). CDPort: A portability framework for NoSQL datastores. Arabian Journal for Science and Engineering, pages 1–23.

Atzeni, P., Bugiotti, F., Cabibbo, L., and Torlone, R. (2016). Data modeling in the NoSQL world. Computer Standards & Interfaces.

Atzeni, P., Bugiotti, F., and Rossi, L. (2012). Uniform access to non-relational database systems: The SOS platform. In Advanced Information Systems Engineering, pages 160–174. Springer.

Atzeni, P., Bugiotti, F., and Rossi, L. (2014). Uniform access to NoSQL systems. Information Systems, 43:117–133.

Bondiombouy, C. and Valduriez, P. (2016). Query Processing in Multistore Systems: an overview. PhD thesis, INRIA Sophia Antipolis-Méditerranée.

Bugiotti, F. and Cabibbo, L. (2013). A comparison of data models and APIs of NoSQL datastores. Dipartimento di Ingegneria della Università di Roma.

Bugiotti, F., Cabibbo, L., Atzeni, P., and Torlone, R. (2013). A logical approach to NoSQL databases.

Bugiotti, F., Cabibbo, L., Atzeni, P., and Torlone, R. (2014). Database design for NoSQL systems. In Proc. of ER, pages 223–231. Springer.

Chung, W.-C., Lin, H.-P., Chen, S.-C., Jiang, M.-F., and Chung, Y.-C. (2014). JackHare: a framework for SQL to NoSQL translation using MapReduce. Automated Software Engineering, 21(4):489–508.

Consortium, W. W. W. et al. (2012). R2RML: RDB to RDF mapping language.

Hecht, R. and Jablonski, S. (2011). NoSQL evaluation: A use case oriented survey.

Michel, F., Djimenou, L., Faron-Zucker, C., and Montagnat, J. (2014). xR2RML: Relational and non-relational databases to RDF mapping language. Technical report, ISRN I3S/RR 2014-04-FR v3.

Scavuzzo, M., Di Nitto, E., and Ceri, S. (2014). Interoperable data migration between NoSQL columnar databases. In 2014 IEEE 18th EDOCW, pages 154–162. IEEE.

Slomiski, A. (2001). TR550: Design of a Pull and Push Parser System for Streaming XML. Technical report, University of Indiana, US.
