Sensor-based Database with SensLog: A Case Study of SQL to NoSQL Migration

Prasoon Dadhich 1, Andrey Sadovykh 1, Alessandra Bagnato 1, Michal Kepka 2, Ondřej Kaas 2 and Karel Charvát 3

1 Softeam Group, Paris, France
2 University of West Bohemia, Plzeň, Czech Republic
3 Lesprojekt-služby Ltd., Martinov, Czech Republic
Keywords:
NoSQL, Performance Testing, Modeling, Data Migration, Benchmarking, Sensor-based Database, Cloud Computing, Virtualization, Scaling, Data Architecture and Analysis.
Abstract:
Sensors have gained a significant role in Internet of Things (IoT) applications across various industry sectors. The information retrieved from sensors is generally stored in a database for post-processing and analysis. This sensor database can grow rapidly when data is frequently collected by several sensors at once. It is thus often required to scale databases as the volume of data increases dramatically. Cloud computing and new database technologies have become key technologies for solving these problems. Traditionally, relational SQL databases are widely used and have proved reliable over time. However, the scalability of SQL databases at large scale has always been an issue. With ever-growing data volumes, various new database technologies have appeared that promise performance and scalability gains under severe conditions. They are often named NoSQL databases, as opposed to SQL databases. One of the challenges that has arisen is knowing how and when to migrate existing relational databases to NoSQL databases for performance and scalability. In the current paper, we present work in progress within the DataBio project for the SensLog application case study, with some initial success. We report on the ideas, the migration approach for the SensLog platform, and the performance benchmarking.
1 INTRODUCTION
The growth of the IoT domain has increased the importance of processing large amounts of sensor data. Sensor-based databases are a particular type of database with a constant, predictable data flow (Kepka et al., 2017) at fixed intervals of time. The data flow is generally fixed by the sensors, which are attached to units. These units are physical entities, for instance a tractor in the field, a field node with several connected sensors, or a smart-phone (Řezník et al., 2016), (Kepka M., 2012). Sensor networks are widely used in several surveillance applications. These applications help us obtain useful data for post-processing and analysis.
1.1 SensLog Case Study
SensLog is a web-based sensor data management system suitable for processing data from both static and mobile sensors. The SensLog application consists of a server-side part and a database part. Raw measurements from sensors are sent to gateways (Kepka et al., 2017). These gateways send the flow of measurements to the database through the SensLog API. The SensLog database schema was designed and developed over recent years with features respecting the ISO 19156 standard (Probst, 2008). With the help of the SensLog team, we have examined the SensLog sensor-based data model and storage (Kepka et al., 2013), which, as in traditional data-intensive applications, relies on a relational database. The current implementation of the SensLog data model in the PostgreSQL database system (Pos, 2018) still seems effective. However, SensLog will need a much larger deployment in the future and must therefore prepare an infrastructure capable of dealing with big data. This study will therefore investigate the applicability of document-oriented databases for the future growth of SensLog.
Dadhich, P., Sadovykh, A., Bagnato, A., Kepka, M., Kaas, O. and Charvát, K.
Sensor-based Database with SensLog: A Case Study of SQL to NoSQL Migration.
DOI: 10.5220/0006909202390244
In Proceedings of the 7th International Conference on Data Science, Technology and Applications (DATA 2018), pages 239-244
ISBN: 978-989-758-318-6
Copyright © 2018 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
1.2 Objectives
Over the past years, the Softeam research team has conducted several studies on methods for the model-driven design of data-intensive applications (Mar, 2018; da Silva et al., 2014; Aurélio Almeida da Silva and Sadovykh, 2014). Part of this work is the migration problem: helping architects and software engineers move more easily from one data storage technology to another with automated tools. In the context of the DataBio project (Dat, 2018), which focuses on data-driven bioeconomy, we develop model-driven NoSQL migration tools in Modelio (Mod, 2018). Our major hypothesis is that the migration process should start with the analysis and denormalization of relational structures (Mar, 2018). The next steps are the transformation to a NoSQL (e.g. MongoDB) meta-model and deployment. Finally, extensive testing should prove the benefits of the migration process compared to the legacy application.
In this paper, we manually carry out the above-mentioned process for our SensLog case study. We concentrate on the performance testing part in order to prove the promised benefits, which justifies further development of our methods in the Modelio modeling tool.
2 BACKGROUND
In the world of software engineering, we have seen continuous changes in languages, platforms, and architectures. Throughout these changes, however, the relational database has been the default choice for serious data storage in an industry context. With growing data volumes, the need to integrate data from multiple sources, and the continuous drive for scale, new challenges arose in data storage. Thus, new database technologies such as MongoDB (Mon, 2018b), Cassandra (Cas, 2018), Neo4j (Neo, 2018) and Azure table storage (Azu, 2018) came into existence. NoSQL databases provided the advantage of building systems (Pokorny, 2013) that performed better, scaled much better and were easier to program for.
Various proofs of concept and surveys have shown the benefits of applying NoSQL databases (Pokorny, 2013). They have proven particularly advantageous for databases that do not rely heavily on relations and/or can be denormalized with some measures (Pokorny, 2013; Dhanachandran, 2012). One of the promising cases in this respect is the sensor industry.
Figure 1: SensLog relational schema observed with Modelio SQL Designer.

Working with the SensLog team, who specialize in their tool for processing sensor queries, we furthermore explore the idea of processing sensor queries with a warehousing approach that relies on a document-oriented NoSQL database (Balazinska et al., 2007). In this approach, we principally have a data lake and an analytics database. The data lake is populated by the sensor gateway, which stores the sensor data in a predefined way for all insertions. This data can then be transformed into a form that is easier for the analytics application. The analytics database is usually smaller and centrally stored for faster read access with SELECT queries on specific indicators. In the meantime, the data lake containing the historical data can still be accessed with extraction mechanisms that may take a longer time per query. This approach enables acceptable response times for analytics data, while still allowing complex query processing through the data lake when needed.
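To make the split concrete, the following sketch is our own illustration: the field names sensor_id, observed_value and time_stamp mirror the Observation table discussed later in this paper, while the per-sensor indicators are hypothetical and not part of SensLog. It shows how raw data-lake documents could be condensed into a smaller analytics collection.

```python
from collections import defaultdict

def build_analytics(data_lake_docs):
    """Aggregate raw sensor documents from the data lake into a
    smaller analytics collection with per-sensor indicators,
    suitable for fast reads on specific indicators."""
    groups = defaultdict(list)
    for doc in data_lake_docs:
        groups[doc["sensor_id"]].append(doc["observed_value"])
    # One compact document per sensor for the analytics database.
    return [
        {"sensor_id": sid,
         "count": len(vals),
         "min": min(vals),
         "max": max(vals),
         "avg": sum(vals) / len(vals)}
        for sid, vals in sorted(groups.items())
    ]

# Raw documents as the sensor gateway would insert them (synthetic data).
lake = [
    {"sensor_id": 1, "observed_value": 10.0, "time_stamp": "2016-10-01"},
    {"sensor_id": 1, "observed_value": 20.0, "time_stamp": "2016-10-02"},
    {"sensor_id": 2, "observed_value": 5.0, "time_stamp": "2016-10-01"},
]
analytics = build_analytics(lake)
```

The analytics documents are precomputed, so indicator queries avoid scanning the full historical data lake.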
3 MIGRATION
The current SensLog application is implemented with a PostgreSQL database. The application encounters certain performance limitations, and our goal was to understand the bottlenecks and propose a more efficient architecture utilizing NoSQL concepts. Therefore, to start developing a proof of concept, we had to explore the existing relational schema of the current SensLog application. We chose to translate the SQL schema into a data model using Modelio SQL Designer (SQL, 2018).
Our working hypothesis is that the data model should be denormalized in order to map it to NoSQL (Mar, 2018). Therefore, after observing the complex relations among the tables in the SensLog data model, we investigated certain methods for denormalizing those tables toward a MongoDB (Mon, 2018b) database. The main idea is to reach a state where tables can be transformed into related documents and mapped to MongoDB. The denormalization methods are discussed in (Mar, 2018).
We applied the methods, starting by changing the connections to IDs to substitute for relations among the MongoDB collections. In the second iteration, we substituted joins by embedding certain tables (Kanade et al., 2014). This method should be applied strategically, depending on the amount of data to be duplicated. A good indicator is the complexity of the join relations, which can be observed with Modelio SQL Designer.
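A minimal sketch of the embedding method follows, assuming simplified, hypothetical units and sensors tables loosely inspired by the SensLog schema (not its actual definition): a one-to-many join is removed by embedding each unit's sensors into the unit document, while remaining references stay as plain IDs.

```python
def embed_sensors(units, sensors):
    """Denormalize a one-to-many relation: embed each unit's sensors
    into the unit document, so no join is needed at read time.
    The foreign key (unit_id) is dropped from embedded sensors."""
    docs = []
    for unit in units:
        doc = dict(unit)
        doc["sensors"] = [
            {k: v for k, v in s.items() if k != "unit_id"}
            for s in sensors
            if s["unit_id"] == unit["unit_id"]
        ]
        docs.append(doc)
    return docs

# Hypothetical normalized rows, as they would come from SQL tables.
units = [{"unit_id": 1, "description": "field node"}]
sensors = [
    {"sensor_id": 100, "unit_id": 1, "phenomenon": "temperature"},
    {"sensor_id": 101, "unit_id": 1, "phenomenon": "humidity"},
]
docs = embed_sensors(units, sensors)
```

Embedding duplicates data when a sensor belongs to several parents, which is why the amount of duplicated data should drive the decision, as noted above.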
Figure 2: MongoDB migrated database snapshot from the Robo3T interface.
Once we decided to migrate, we used the Mongify tool (Mon, 2018a), in which we could script these denormalization rules and translate the relational schema into a MongoDB database structure. That way, we could instrument the methods for ID replacement and table embedding (Jadhav, 2017). Figure 2 gives a snapshot of the collections obtained after migration, as seen in Robo3T.
4 EVALUATING RESULTS
Once both systems, the legacy one and the migrated one, were ready, we could proceed with performance testing and compare the two database systems. To achieve this, we set up a testbed consisting of two identical cloud machines hosting PostgreSQL and MongoDB. On both machines, we set up the SensLog data models and deployed the historical data.
With these two machines, we organized test plans that included the following scenarios:
- Throughput on insertion queries.
- Response Time on insertion queries.
- Throughput and Response Time with the number of users varying from 10 to 30, as discussed with the SensLog team.
- Throughput and Response Time with specific queries.
- Throughput and Response Time with a specific schema set.
Considering the limits of our infrastructure, we concentrated our experiments on micro cloud machines. Choosing micro machines with a small storage capacity, instead of bigger ones, helped us demonstrate and learn the physical limits under stress. That way, we could observe the behavior of data insertion under stress conditions.
4.1 Testbed
In order to compare results between the PostgreSQL and the MongoDB NoSQL SensLog data models, we deployed two identical Amazon instance machines with the MongoDB and PostgreSQL schemas respectively. We used two identical t2.micro Ubuntu machines with the following specifications:
Table 1: Amazon instance characteristics.
Amazon instance type t2.micro
#Cores 1
#Threads 1
Instance storage EBS Only
Operating System Ubuntu 14.0
The first machine is deployed with PostgreSQL version 9.6 and the initial schema received from the SensLog team. The second machine is deployed with MongoDB version 3.2 and the data model that we obtained in the migration process using Modelio SQL Designer and the Mongify tool.
4.2 Performance Testing
The graphs below illustrate the results achieved with the JMeter (Jme, 2018) performance testing tool. They compare the behavior of the PostgreSQL-based and MongoDB-based SensLog applications.
In this first set of experiments, we concentrated on insertion testing (DataStax, 2018). We stressed both machines with a test case involving a large number of users (i.e. sensors) inserting data every second, in order to determine the maximum number of users that can be served with a stable Response Time. We chose the most critical part of the data model in order to demonstrate the impact of the insertions on the system. In our context, the SensLog application heavily relies on the Observation table, where all measurements are stored. Hence, we concentrated the insertion queries on this table.
The SensLog team targets an application capable of handling 30 sensor data insertions per second in a typical scenario. Thus, in the first part of our experiments, we measured the Response Time for this typical scenario on our testbed. We ran the test suite on both machines simultaneously and observed the difference. The results are shown in the snapshots below.
Figure 3: Observations Comparison Response Time Graph, PostgreSQL.

Figure 4: Observations Comparison Response Time Graph, MongoDB.
We observe in Figure 3 that the PostgreSQL Response Time tends toward an exponential curve, whereas in Figure 4 we see linear performance stability for the NoSQL MongoDB insertions. In terms of throughput during insertion, we observed MongoDB performing three times better than the PostgreSQL machine.
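Throughput and mean Response Time figures of this kind can be derived from JMeter's CSV result log (a .jtl file whose rows include a timeStamp column in epoch milliseconds and an elapsed column in milliseconds). The helper below is our own sketch, not part of JMeter, and the sample rows are synthetic.

```python
import csv
import io

def summarize(jtl_csv):
    """Compute mean response time (ms) and throughput (requests/s)
    from a JMeter CSV result log with 'timeStamp' (epoch ms) and
    'elapsed' (ms) columns."""
    rows = list(csv.DictReader(io.StringIO(jtl_csv)))
    elapsed = [int(r["elapsed"]) for r in rows]
    stamps = [int(r["timeStamp"]) for r in rows]
    # Test duration from first to last sample; fall back to 1 s
    # if all samples share one timestamp.
    duration_s = (max(stamps) - min(stamps)) / 1000.0 or 1.0
    return {
        "mean_ms": sum(elapsed) / len(elapsed),
        "throughput_rps": len(rows) / duration_s,
    }

# Synthetic sample in the JMeter CSV log format.
sample = """timeStamp,elapsed,label,responseCode
1000,20,insert,200
1500,30,insert,200
3000,40,insert,200
"""
stats = summarize(sample)
```

Running the same summary over the logs of both machines gives directly comparable numbers for the two databases.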
In the second iteration, we decided to investigate the MongoDB behavior in more detail by fine-tuning different parameters such as the number of users and the ramp-up time.
In Figure 5, we observed a consistent Response Time when MongoDB was stressed with 15 users and a ramp-up time of 1 minute. There was a big jump after several insertions, most likely caused by a memory shortage on the small Amazon machine we utilized in our proof of concept. From this run, we can also see the behavior of the MongoDB machine when it is close to full. A small jump while remaining otherwise consistent is much preferable to the complete breakdown observed in the PostgreSQL case.

In the next experiment, we did the same for 30 users and observed similar behavior. Thus, with our experiments, we confirmed our hypothesis of linear consistency when using MongoDB.
The data insertions are performed with scripts in the JMeter tool (Jme, 2018), which generate relevant synthetic data as shown below.
PostgreSQL:
INSERT INTO public.observations(observation_id, gid,
    time_stamp, observed_value, sensor_id, unit_id,
    time_received)
VALUES (
    ${count_value},
    ${count_value},
    '${__Random(2000,2017)}-${__Random(10,12)}-01 00:00:00',
    ${__Random(0,100000)},
    ${__Random(1000,10000)},
    ${count_value},
    '${__Random(2000,2017)}-${__Random(10,12)}-01 00:00:00');
MongoDB:
db.observations.insert(
  {
    "observation_id" : ${count_value},
    "gid" : ${count_value},
    "time_stamp" : ISODate("${__Random(2000,2017)}-${__Random(10,12)}-01T00:00:00Z"),
    "observed_value" : ${__Random(0,100000)},
    "sensor_id" : ${__Random(1000,10000)},
    "unit_id" : ${count_value},
    "time_received" : ISODate("2014-10-01T00:00:00Z")
  }
)
5 CONCLUSION AND FUTURE
WORK
The major point with a sensor database is the constant flow of data from the sensor networks. If this flow is decoupled from the relational database using the data warehousing technique (Balazinska et al., 2007), the organization of insertions into the database can benefit with respect to Response Time.

Figure 5: Observations Response Time Graph while stressing MongoDB with 15 users.
SensLog database, we also observe the existing Post-
greSQL heavily utilizing SELECT queries. MongoDB
lacks support in SELECT queries and hence we weigh
on preserving a part of the original architecture which
will still have a centralized database operating with
PostgreSQL.
From our tests, we observed the SensLog Response Time graph in MongoDB to be growing linearly compared to the equivalent PostgreSQL relational database. Therefore, we consider that a data warehouse on MongoDB would be a good solution for the SensLog database, while still keeping the centralized database for the reporting system.
If an architecture decision is to be taken for the SensLog application, we would point to deploying MongoDB servers in the front receiving part to take data directly from the sensors. This database can later be strategically utilized as a data pool for post-processing. Thus, based on our proof-of-concept experiments with the MongoDB sensor data model, we optimistically expect a threefold performance improvement in this architecture.
This study is an important step forward for justi-
fying our efforts for further research in the direction
of automated tools for model-driven SQL to NoSQL
migration with Modelio.
FUNDING ACKNOWLEDGEMENT
This work is part of the European research project DataBio (Dat, 2018), funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 732064.
REFERENCES
(2018). Azure. https://azure.microsoft.com/en-us/services/storage/tables/. A NoSQL key-value store for rapid development using massive semi-structured datasets.
(2018). Cassandra. http://cassandra.apache.org/. Cassandra is an open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers.
(2018). Databio. https://www.databio.eu/en/summary/. DataBio: Project to show the benefits of Big Data technologies in the raw material production from agriculture, forestry and fishery/aquaculture for the bioeconomy industry to produce food, energy and biomaterials responsibly and sustainably.
(2018). Jmeter. https://jmeter.apache.org/usermanual/index.html. The Apache JMeter application is open source software, a pure Java application designed to load test functional behavior and measure performance.
(2018). Modelio. https://www.modelio.org/. The open source modeling tool environment.
(2018a). Mongify. http://mongify.com/. Mongify is a data translator system for moving your SQL data to MongoDB.
(2018b). Mongodb. https://www.mongodb.com/what-is-mongodb. Information about MongoDB.
(2018). Neo4j. https://neo4j.com/. Neo4j is a graph database management system developed by Neo4j.
(2018). Postgres. https://www.postgresql.org/. The implementation of PostgreSQL.
(2018). Sql to nosql migration. https://www.modelio.org/. An assessment of the migration from a relational database to a NoSQL store.
(2018). Sqldesigner. http://store.modelio.org/resource/modules/sql-designer.html. A module in Modelio for database modeling.
Aurélio Almeida da Silva, M. and Sadovykh, A. (2014). Multi-cloud and multi-data stores. In Proceedings of the 4th International Conference on Cloud Computing and Services Science, CLOSER 2014, pages 703–713, Portugal. SCITEPRESS - Science and Technology Publications, Lda.
Balazinska, M., Deshpande, A., Franklin, M. J., Gibbons, P. B., Gray, J., Hansen, M., Liebhold, M., Nath, S., Szalay, A., and Tao, V. (2007). Data management in the worldwide sensor web. IEEE Pervasive Computing, 6(2):30–40.
da Silva, M. A. A., Sadovykh, A., Bagnato, A., Cheptsov, A., and Adam, L. (2014). Juniper: Towards modeling approach enabling efficient platform for heterogeneous big data analysis. In Proceedings of the 10th Central and Eastern European Software Engineering Conference in Russia, CEE-SECR '14, pages 12:1–12:7, New York, NY, USA. ACM.
DataStax (2018). Benchmarking top NoSQL databases: Apache Cassandra, Couchbase, HBase, and MongoDB.
Dhanachandran, S. (2012). Dynamic real time distributed sensor network based database management system using XML, Java and PHP technologies. 4:9–20.
Jadhav, B. (2017). GUI for data migration and query conversion. International Journal of Advanced Research in Computer Science and Software Engineering.
Kanade, A., Gopal, A., and Kanade, S. (2014). A study of normalization and embedding in MongoDB.
Kepka, M., Charvát, K., Šplíchal, M., Křivánek, Z., Musil, M., Leitgeb, Š., Kožuch, D., and Bērziņš, R. (2017). The SensLog platform - a solution for sensors and citizen observatories. In Hřebíček, J., Denzer, R., Schimak, G., and Pitner, T., editors, Environmental Software Systems. Computer Science for Environmental Protection, pages 372–382, Cham. Springer International Publishing.
Kepka, M., Ježek, J., Charvat, K., and Musil, M. (2013). Complex solution for sensor network in precision farming.
Kepka M., J. J. (2012). Server-side solution for sensor data.
Pokorny, J. (2013). NoSQL databases: a step to database scalability in web environment. International Journal of Web Information Systems, 9(1):69–82.
Probst, F. (2008). Observations, measurements and semantic reference spaces. Appl. Ontol., 3(1-2):63–89.
Řezník, T., Kepka, M., Charvat, K., Horakova, S., and Lukas, V. (2016). Challenges of agricultural monitoring: integration of the open farm management information system into GEOSS and Digital Earth. 34:012031.