On-Premise Internet of Things (IoT) Data Storage:

Comparison of Database Management Systems

Anna Wolters

, Mevludin Blazevic

and Dennis M. Riehle

Institute for IS Research, University of Koblenz, Universitätsstraße 1, Koblenz, Germany

Keywords:

Internet of Things, Big Data, Database Management System, Time-series Database.

Abstract:

The Internet of Things (IoT) connects millions of devices, leading to the production of vast amounts of data.

For such data to be of value, efﬁcient and effective data storage is of utmost importance. In this paper, we

present a comparison of on-premise database management systems in the context of the IoT. We perform a

market analysis on relational, Not Only SQL (NoSQL), and time-series database systems as well as a require-

ment analysis in order to comprehensively compare database systems based on functional and non-functional

criteria. After an initial selection, we compare MySQL, PostgreSQL, Cassandra, MongoDB, InﬂuxDB, and

QuestDB. As a result, we provide a best practice guide to support the decision-making on which database to

select for an IoT use case.

1 INTRODUCTION

Connected objects in the Internet of Things (IoT) pro-

duce vast amounts of data, making the data itself one

of the essential elements in the IoT (Nasar & Kausar,

2019). The increasing amounts of data generated in

the IoT create the challenge of storing the data efﬁ-

ciently, which is essential to ensure better decision-

making based on the data. IoT data is generated

mainly by IoT sensors which collect environmental

data (Rayes & Salam, 2017). These data sets are too

large to be analyzed with classic data-processing soft-

ware and “have entered the common language with

the term Big Data” (Sestino et al., 2020, p. 1). More-

over, IoT data and Big Data are equal in their em-

phasis on information that is rich in volume, velocity,

and variety and therefore requires innovative forms of

processing (Lycett, 2013; Sestino et al., 2020).

The characteristics that Big Data and IoT have in

common, impose requirements for the database sys-

tem. Those include technical requirements to, for in-

stance, support heterogeneous sensor infrastructures

(Amghar et al., 2018), as well as other factors such as

data security, query speed, reliability, pricing, main-

tainer, business model (on-premise, cloud), licensing

model, support, API, and interfaces (Beynon-Davies,

2017; Rautmare and Bhalerao, 2016). This makes the

selection of the right database system solution chal-

https://orcid.org/0000-0002-4075-5737

https://orcid.org/0000-0003-0347-7392

https://orcid.org/0000-0002-5071-2589

lenging, as different requirements and contextual in-

formation are needed to ﬁnd the right balance be-

tween costs and beneﬁts.

In this paper, we present a best practice guide for

selecting a suitable database system for storing IoT

data on-premise. We perform a comprehensive com-

parison of available database systems by identifying

functional and non-functional criteria to guide our se-

lection process. In total, we compare six database sys-

tems, two of which are relational, two Not Only SQL

(NoSQL), and two time-series databases.

The remainder of the paper is structured as fol-

lows. In Section 2 we provide background informa-

tion on the IoT and Big Data, as well as database sys-

tems. Section 3 presents the research method applied.

Next, we perform a market and requirement analysis

in Section 4. The ﬁnal set of selected databases is

compared in more detail in Section 5. We discuss and

conclude our ﬁndings in Section 6.

2 BACKGROUND

2.1 Internet of Things and Big Data

Since its ﬁrst appearance, many deﬁnitions for IoT

were established which can be found in scientiﬁc and

non-scientiﬁc literature (e.g., Baiyere et al., 2020;

Chui et al., 2020; ITU-T, 2011; Miorandi et al., 2012;

Rayes & Salam, 2017). Mostly, these deﬁnitions have

the interrelationship between sensors and physical ob-

jects in common. IoT can be deﬁned as a “system

140

Wolters, A., Blazevic, M. and Riehle, D.

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems.

DOI: 10.5220/0011851200003482

In Proceedings of the 8th International Conference on Internet of Things, Big Data and Security (IoTBDS 2023), pages 140-149

ISBN: 978-989-758-643-9; ISSN: 2184-4976

 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

of interconnections between digital technologies and

physical objects that enable such (traditionally mun-

dane) objects to exhibit computing properties and in-

teract with one another with or without human inter-

vention” (Baiyere et al., 2020, p. 557). Another def-

inition focuses on IoT applications and information

technology, whereupon IoT is the “interconnection of

sensing and actuating devices providing the ability to

share information across platforms through a uniﬁed

framework, developing a common operating picture

for enabling innovative applications” (Gubbi et al.,

2013). This can be achieved by pervasive sensing and

data processing using cloud infrastructures as a uni-

fying architecture. IoT sensors generate a variety of

different data that differs from data from traditional

Information Systems (IS). One approach in describing

IoT data is the distinction between the origins of data

that is processed by computers. IoT data is generated

mainly by IoT sensors which collect environmental

data (Rayes & Salam, 2017). Moreover, IoT enables

different physical devices to connect to the Internet

and engage in data exchange. These datasets are too

large to be analyzed with classic data-processing soft-

ware and can be seen as a subcategory of “Big Data”

(Sestino et al., 2020).

The term Big Data refers to the generation of

large amounts of structured and unstructured data.

Big Data is usually associated with real-time analysis,

which distinguishes it from traditional datasets (Chen

et al., 2014). Originally, Big Data was deﬁned by

the 3 V’s: volume, velocity, and variety (Strohbach et

al., 2015), while further characteristics such as value,

veracity, and visualization can be considered as well

(Chen et al., 2014; Strohbach et al., 2015). IoT is con-

sidered to be one of the most promising drivers of Big

Data expansion, due to the increasing availability of

connected sensor devices in companies that generate

a vast amount of data (De Mauro et al., 2015).

2.2 Database Systems

When discussing and comparing database systems in

more detail, the CAP theorem and ACID properties

must be considered. Both ideas describe character-

istics of distributed systems, and the ACID proper-

ties particularly focus on transactional databases. The

CAP theorem contains the characteristics of consis-

tency, availability, and partition tolerance. According

to the theorem, a distributed system is only able to ful-

ﬁll two of these properties . The ACID properties are

atomicity, consistency, isolation, and durability (El-

masri & Navathe, 2016, pp. 754-755).

In recent years, database or storage model tech-

nologies have evolved and improved signiﬁcantly. In

our research, we focus on relational, NoSQL, and

time-series database systems.

The idea of a relational data model was introduced

by Codd (1970) in 1970. In a relational database,

data is structured in the form of tables that are eventu-

ally connected to each other using unique identiﬁers

as reference keys (Elmasri & Navathe, 2016, p. 73),

and the data is accessed using Structured Query Lan-

guage (SQL) (Elmasri & Navathe, 2016, p. 59). Re-

lational databases provide a very simple data struc-

ture as well as data access. With the concept of data

normalization, data is structured clearly and data con-

sistency is improved (Elmasri & Navathe, 2016, pp.

501 ff.). However, relational databases are not able to

store more complex data formats, such as documents.

Compared to relational database systems, NoSQL

databases provide ﬂexible and schema-less data mod-

els (Fatima & Wasnik, 2016), but do not provide

the ACID characteristics characteristics (Chauhan &

Bansal, 2017). However, it adheres to BASE, which

means basically available, soft state, and eventually

consistent (Chauhan & Bansal, 2017). There exist

four categories of NoSQL databases: key-value store,

column-oriented database, document database, and

graph database. Besides a more ﬂexible data model,

NoSQL databases also support horizontal scaling,

while relational databases can only be scaled verti-

cally (Chauhan & Bansal, 2017).

In IoT, time-series databases are very prominent,

since IoT data usually contain a timestamp for every

data point, which might be used to index the data.

Time-series databases are insert-heavy, while there

is no update operation as it is the case for relational

and NoSQL databases (Mostafa et al., 2022). The

time-based storage enables easier analysis of trends

and changes in data over time. Additionally, query-

ing and fetching data from a time-series database is

usually very efﬁcient (Musa et al., 2019). Time-series

databases do not necessarily adhere to the ACID prop-

erties. Because of the continuous collection of new

data, durability, and strong consistency do not have

to be ensured. InﬂuxDB, a prominent time-series

database, for instance, supports eventual consistency

(Musa et al., 2019).

3 RESEARCH METHOD

In this research, we apply Design Science Re-

search (DSR) based on the Design Science Research

Methodology (DSRM) proposed by Peffers et al.

(2007). As given by Peffers et al. (2007), artifacts can

be “constructs, models, methods, or instantiations”

(Peffers et al., 2007), making the knowledge contribu-

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems

141

tion of DSR versatile (Gregor & Hevner, 2013). In our

research, the ﬁnal artifact is a best practice guide to

assist decision-makers in selecting the most appropri-

ate database system for IoT use cases. Aligned with

the understanding of an artifact in DSR, we present

a more abstract artifact similar to design principles or

implicit technological rules (Gregor & Hevner, 2013).

Our research follows the six phases proposed in

the DSRM: Based on these phases, our research is

structured as follows:

Problem Identiﬁcation. Storing IoT data is a chal-

lenging task since IoT devices produce vast

amounts of data – Big Data – which must be

stored and processed efﬁciently. Choosing the

right database is just as challenging as there ex-

ists a large variety of database systems. Deciding

on which system to use requires a comprehensive

overview of available systems and their strengths

and weaknesses.

Objectives. The aim of this research is a compari-

son of available database systems regarding their

functionality in the context of IoT. Requirements

are collected based on related research, market re-

search, and given restrictions on the scope of the

study.

Design and Development. Initially, a market and

requirement analysis is performed to identify

database systems for a detailed comparison. The

selected database systems are installed and tested

on-premise to collect data on their performance.

Demonstration. Results of the tests are consoli-

dated, compared, and displayed in an understand-

able format.

Evaluation. Resulting data is evaluated regarding the

predeﬁned criteria representing the systems’ suit-

ability for IoT use cases.

Communication. The research ﬁndings on the suit-

ability of speciﬁc database systems for storing and

processing IoT data are provided.

4 MARKET AND REQUIREMENT

ANALYSIS

4.1 Market Analysis

For the initial selection of databases, we performed

a market analysis. Since there exists a plethora of

database systems, a collection of the information on

all database systems seems impossible. Still, we

aimed for a comprehensive overview of available sys-

tems and collected information on 50 systems in to-

tal. At this stage of our research, we ﬁrst tried to

gather all popular database systems without applying

exclusion criteria. For each selected database sys-

tem we collected information on its name, type (re-

lational, NoSQL, or time-series), the maintainer, li-

cense, as well as stable release, latest version, and

ﬁrst release. Additionally, we state the operating sys-

tems that the system can be installed on (i.e., Win-

dows, Linux, and MacOS). Since our research focuses

on on-premise database systems, we also check the

business model, i.e., if the systems can be installed

on-premise or if they are only hosted as a cloud ser-

vice. To be able to properly compare non-functional

requirements of the database systems, we decided to

use Docker

to ensure a comparable setup. Although

a native installation might be slightly faster compared

to using Docker, trough the lack of virtualization, we

used Docker to ensure that the technical conditions

are uniform across all experiments. This allows us to

assess the performance of the analyzed database sys-

tems. We selected Docker as it is one of the state-of-

the-art container-based technologies, providing an ef-

ﬁcient and scalable foundation for distributed applica-

tions. As such it is widely applied for IoT applications

in practice (Alam et al., 2018). Hence, the last crite-

rion included in the market analysis is the availability

of a Docker image. We only considered Docker im-

ages that are marked as ofﬁcial on Docker hub or are

provided by a veriﬁed publisher. To limit the scope of

our research, we solely focus on relational, NoSQL,

and time-series databases and created a comprehen-

sive long list.

We collected information on 15 relational, 15

NoSQL, and 20 time-series databases and ordered

them by their popularity. We applied a popularity

ranking published and maintained by solid IT gmbh

(2022), who deﬁne a popularity score for roughly 400

different database systems. The underlying method

is based on six parameters, which are the number

of mentions on websites based on the query results

from Google and Bing, general interest as measured

by search frequency based on Google Trends, fre-

quency of online discussions about the system based

on primarily IT-related forums such as Stack Over-

ﬂow, number of job offers that mention the database

systems based on the job search engines Indeed and

Simply Hired, number of LinkedIn proﬁles that state

the database system as a skill, and the systems’ rel-

evance on social media as represented by Twitter

tweets mentioning the system.

The resulting long list of database systems is de-

https://www.docker.com/

IoTBDS 2023 - 8th International Conference on Internet of Things, Big Data and Security

142

picted in appendix A.1. Next, we apply the col-

lected information to create a short list of six database

systems that are then used for our detailed compar-

ison. For our research, we make the initial restric-

tions, that we only consider two systems per category

and additionally, only consider open-source and on-

premise databases that can be installed via Docker on

an Ubuntu

system. We considered the criteria in the

following order: business model (on-premise or cloud

solution), license, availability of Docker image, fur-

ther criteria, and popularity.

From the set of relational databases, ﬁve were

excluded because they are only hosted as a cloud

service (Microsoft Azure, Snowﬂake, Databricks,

Google Big Query, and Amazon Redshift). Six fur-

ther databases could not be considered because they

are not open-source systems (Oracle, Microsoft SQL

Server, IBMDB2, Microsoft Access, Teradata, and

FileMaker). From the remaining four, one system

(SQLite) does not provide an ofﬁcial Docker image,

thus the set of databases was reduced to the systems

MySQL, PostrgreSQL, and Maria DB. Since we only

wanted to select two systems from each category, we

excluded Maria DB in the ﬁnal step because it is less

popular.

For the set of NoSQL database systems, we were

able to exclude seven systems because they are cloud-

only, and thus do not meet the scope of our re-

search (Amazon Dynamo DB, Microsoft Azure Cos-

mos DB, Firebase Realtime Database, CouchBase,

Google Cloud FireStore, Microsoft Azure Table Stor-

age, and Google Cloud Bigtable). One of the remain-

ing eight systems could also not be considered due to

a commercial license (Datastax Enterprise). Based on

the popularity, we selected MongoDB and Cassandra

as ﬁnal candidates for our requirement analysis. Not

chosen because of their lower popularity were Re-

dis, Hbase, Memcached, CouchDB, and Neo4j. Be-

sides, Redis and Memcached aim at different targets,

i.e., they are primarily designed for key-value storage,

which does not ﬁt our research goal.

From the set of 20 time-series databases, 3 were

excluded because they cannot be installed on-premise

(Fauna, TD Engine, and Amazon Timestream). An

additional set of 5 database systems only have a com-

mercial license, and thus could not be considered here

(Kdb+, Dolphin DB, eXtremeDB, IBM Db2 Event

Store, and Raima Database Manager). In the next

step, we identiﬁed that 9 systems did not provide

an ofﬁcial Docker image (Graphite, Timescale DB,

Apache Druid, RRD Tool, Open TSDB, Grid DB,

KairosDB, Victoria Metrics, and Apache IOTDB).

The ﬁnal set of tables consisted of InﬂuxDB,

https://ubuntu.com/

Prometheus, and QuestDB, but since Prometheus is

rather used for monitoring purposes, we excluded it

here, so we selected InﬂuxDB and QuestDB as time-

series databases for our requirements analysis. In-

ﬂuxDB is already a very well-established database

in the ﬁeld of IoT, while QuestDB is still fairly new

since it was ﬁrst released in 2016.

4.2 Requirement Analysis

For the comparison of our ﬁnal set of databases, we

identiﬁed functional and non-functional requirements

that are important criteria to consider when select-

ing a suitable database for a given context. As func-

tional requirements, we consider the following crite-

ria proposed by Bader et al. (2017): (i) distribution

or clusterability, (ii) availability of (a) basic and ag-

gregate and (b) advanced functions, (iii) granularity

and downsampling, (iv) interfaces and extensibility

and (v) support and license.

For the distribution and clusterability criteria, we

consider the CAP theorem, scalability, and load bal-

ancing. In the IoT, consistency is the least important

characteristic from the CAP theorem since eventual

consistency is sufﬁcient as data is collected continu-

ously. The focus is therefore on high availability and

partition tolerance. Due to the large amounts of data,

storage and performance should be scalable to meet

the criteria of an IoT application. For further efﬁ-

ciency, load balancing should be included to evenly

distribute the workload in a distributed system (Bader

et al., 2017).

As basic functions we consider inserting, reading,

scanning, updating, and deleting the data, while aver-

aging, summing, counting, and identifying the maxi-

mal or minimal value are the aggregate functions we

tested. Further, to compare advanced functions we

compare continuous calculation, tags, long-term stor-

age, and matrix time series. Matrix time series de-

scribes that multiple timestamps can be entered per

item (Bader et al., 2017).

In order to compare granularity and down-

sampling, four criteria are considered: support of

down-sampling, the smallest sample interval, the

smallest granularity for storage, and the smallest

guaranteed granularity for storage. Most databases

support down-sampling, i.e., when executing queries

the results can be ﬁtted for outputs with longer time

frames. Two timestamps must be deﬁned, as well as

data for short time intervals within the wider time in-

tervals. The shorter time interval is also referred to

as the sampling interval. Granularity describes the

smallest gap feasible between two timestamps, i.e.,

the degree of detail in data that can be stored. If

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems

143

a system stores data in ﬁner granularity than is se-

curely guaranteed, data could be aggregated or might

get lost. For our comparison, we consider the small-

est sample interval, the smallest granularity for stor-

age, and the smallest guaranteed granularity for stor-

age (Bader et al., 2017).

In the next category, interfaces and extensibil-

ity, we consider collecting consisting interfaces and

checking for client libraries and plugins. As inter-

faces, we consider both, graphical and non-graphical

interfaces that are provided for a speciﬁc database

system. Client libraries support handling network

management and accessing the data. The function-

ality of the database system may be increased with

plugins (Bader et al., 2017).

To study support and license, we check for long-

term support (LTS) of stable versions, commercial

support, and free licenses (Bader et al., 2017).

Next to functional requirements, we also con-

sider non-functional requirements for our compari-

son. First, we study the databases’ efﬁciencies when

loading data. We load the data in batches and increase

the batch size in powers of 10. The resulting met-

ric represents the average number of data instances

loaded per unit of time (Hao et al., 2021).

Second, the query latency, i.e., the amount of time

needed to execute a query, is tested. We rely on the

research by Liu and Yuan (2019) as we apply their

deﬁned query types. The resulting metric per query is

the average response time. We slightly adapted the ten

queries presented by Liu and Yuan (2019) to better ﬁt

our research scope and data structure, since Liu and

Yuan (2019) solely focus on time-series databases.

5 COMPARISON OF DATABASES

5.1 Data Collection and Installation

To adequately compare the functionalities of the

database systems, we rely on large datasets. For

our research, we utilize three different data sources.

First, the majority of the data is fetched from The

Things Network (TTN)

, collecting data that is gen-

erated by end devices installed and conﬁgured by the

research group. Using a workﬂow created in Node-

RED

we track every new data point from the col-

lection of devices from TTN and store the data in a

MySQL database, which is only used to easily store

the data during the data collection. Second, we use

further data that is generated by weather sensors in-

https://www.thethingsnetwork.org/

https://nodered.org

stalled at the University of Koblenz. This data can

be accessed by an Application Programming Inter-

face (API) endpoint and is periodically saved in the

MySQL database as well. Last, we artiﬁcially create

data from the two previous data sources to further ex-

tend the size of the dataset. In total, we were able to

collect and generate around 10 million data points.

The data consists of ﬁve attributes: a unique iden-

tiﬁer for each data point, a timestamp, an identiﬁer

referencing the sensor that generated the data, the type

of measurement, and the measured value.

In order to test the database systems, we install

the corresponding Docker image of each system on

an Ubuntu machine. Using a uniﬁed setup for each

database system ensures a fair comparison of the non-

functional requirements of the database systems. For

each database system, we used the latest version of the

Docker image that was available at the time the paper

was written. Our evaluation is written in Python, us-

ing Python client libraries for each database system.

5.2 Comparison

5.2.1 Functional Requirements

The results of the comparison of functional require-

ments can be found in appendix A.2. First, we stud-

ied three criteria related to distributed systems. We

were able to conclude that both relational databases

favored consistency and availability over partition tol-

erance. The NoSQL database Cassandra, however, fa-

vors partition tolerance over consistency, while Mon-

goDB guarantees consistency over availability. Sim-

ilar to Cassandra, InﬂuxDB focuses on availability

over consistency, while QuestDB favors consistency

over availability. Both relational databases do not

support scalability or load balancing. Additionally,

compared to other time-series databases, the open-

source version of InﬂuxDB is not scalable.

All databases support the basic and aggregate

functions that were checked in our research. Conclud-

ing, since all the databases provide aggregate func-

tions, they also fulﬁll the continuous calculation cri-

teria. Only InﬂuxDB supports tags, and only the re-

lational databases do not support long-term storage.

As time-series databases use the timestamp as the pri-

mary key, they do not allow for matrix time series,

while the other database systems do.

All databases support down-sampling without re-

strictions. Additionally, all support a minimum time

range of one millisecond.

Each database provides an API to access the data.

In addition, they all provide client libraries and plug-

ins to connect to the database via multiple program-

IoTBDS 2023 - 8th International Conference on Internet of Things, Big Data and Security

144

Table 1: Comparison of Data Loading (in sec.).

Database 10

MySQL 0.311 2.359 24.570 253.74 2476.4 - - -

PostgreSQL 0.102 0.083 0.158 0.859 15.436 37.727 276.273 3086.24

Cassandra 0.034 0.626 3.131 29.807 247.68 2421.68 - -

MongoDB 1.110 1.123 4.479 40.033 450.10 3844.97 24127.58 -

InﬂuxDB 1.714 1.479 12.222 119.83 1263.70 12273.91 - -

QuestDB 0.251 0.190 0.285 0.315 0.316 2.0347 19.161 272.30

ming languages. Most of them also support a free

open-source interface for querying and visualizing

data. Detailed information on the provided APIs can

be found in appendix A.2.

For the last subgroup of functional requirements,

we collected information on the current stable ver-

sion, if they have commercial support, and the license.

We provide that information to give an understand-

ing of how active the development of all the selected

databases is.

5.2.2 Non-functional Requirements

The test of the non-functional requirements is divided

into two parts. First, we measured the data inges-

tion time using the data that was collected from the

different sensor sources. This measurement should

represent the behavior of the databases when faced

with inserting large amounts of IoT data. From the

results in table 1, we can conclude that QuestDB is by

far the fastest database when inserting data. During

the test, we noticed that some databases crash when

the amount of data points inserted is too large. Only

QuestDB and PostgreSQL were able to cope with

the entire set of ten million data instances. MySQL

was the slowest database system and additionally also

crashed ﬁrst after inserting ten thousand data points.

To our surprise, InﬂuxDB showed a low performance.

Second, we studied the query latency using ten

thousand data points from the dataset. We needed

to reduce the entire dataset to ten thousand because

MySQL was not able to cope with more data. Thus,

to ensure a fair comparison, we reduced the dataset.

The results are depicted in table 2. In total, we ex-

ecuted 10 queries (Q1 - Q10) according to Liu and

Yuan (2019):

Q1: Exact Point Query. A particular data instance

was fetched from the databases based on speci-

ﬁed matching criteria. The relational databases

performed well, which can be explained by their

indexed primary key. Overall, Cassandra showed

the best performance followed by PostgreSQL

and QuestDB.

Q2: Time Range Query. For this query, a time

range for the data that needs to be fetched was

speciﬁed. Similarly, as for the ﬁrst query, Cas-

sandra showed the best performance followed by

QuestDB. Here MongoDB showed the worst per-

formance, i.e., being the slowest database the

fetch the data.

Q3: Time Range Query with Limit. The third

query is a modiﬁcation of the second query since

a limit was added, reducing the fetched data to

a pre-deﬁned amount. Here Cassandra and Post-

greSQL showed the best results.

Q4: Time Range Query with Multiple Filters.

For this query, we included multiple ﬁlter crite-

ria to the query. Cassandra was again the fastest

database to execute the query, while MongoDB

showed the slowest performance.

Q5: Time Range Query with Multiple Filters

with Limit. As an adaptation to the fourth query,

we added a limit to restrict the number of results.

Cassandra again showed the best performance,

followed by the two relational database systems.

Q6: Query with Aggregation Function (AVG).

From this query onward we test several aggrega-

tion functions. Starting with the average function,

our results show that QuestDB executed the query

ﬁrst. The two relational databases also show de-

cent results.

Q7: Query with Aggregation Function

(COUNT). When counting data points, Post-

greSQL showed the best performance. Due to an

error, the results of the Cassandra database could

not be considered. The Python client was not

able to execute the query, since the aggregation

of data was grouped by the devices, which is

not the primary key. Cassandra, however, only

permits an aggregation grouped by the primary

key, which in our case is the timestamp.

Q8: Query with Aggregation Function (SUM/

MEAN). For the SUM or MEAN function, Mon-

goDB showed the best performance. Similarly

to the previous query, the result of the Cassandra

database could not be considered due to an error

when executing the query.

Q9: Query with Aggregation Function (MAX/

MIN). QuestDB, MySQL, and InﬂuxDB showed

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems

145

Table 2: Comparison of Query Latency (in sec.).

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10

MySQL 0.095 0.102 0.836 0.091 0.028 0.083 0.107 0.186 0.067 0.281

PostgreSQL 0.031 0.162 0.016 0.12 0.018 0.136 0.036 0.117 0.147 0.218

Cassandra 0.0004 0.0009 0.001 0.0005 0.001 0.485 - - 0.337 -

MongoDB 0.851 0.3 0.041 0.334 0.313 0.166 0.074 0.116 0.103 0.236

InﬂuxDB 0.203 0.197 0.131 0.071 0.087 0.213 0.213 1.584 0.087 0.213

QuestDB 0.037 0.037 0.045 0.058 0.044 0.0419 0.121 0.135 0.055 0.141

the best results when applying the MAX or MIN

aggregation function. The database Cassandra

showed rather weak performance.

Q10: Query with Aggregation Function (MAX)

and Group-By Clause. Due to the same exe-

cution error as before, the result of the Cassan-

dra database could not be considered. QuestDB

showed the best performance among the remain-

ing database systems.

Despite the reoccurring error, the NoSQL database

Cassandra shows the best performance for ﬁve queries

as compared to the other database systems. When cal-

culating the average, however, QuestDB shows the

best results, while InﬂuxDB comes in last due to a

signiﬁcantly higher value for Q8 as compared to re-

sults from other queries. On average, PostgreSQL and

MySQL also show decent results overall.

6 DISCUSSION AND

CONCLUSION

In this research, we demonstrated a comparison of

database systems regarding their functionalities re-

quired to meet the demands for IoT applications.

We initially performed a market analysis to iden-

tify a large set of databases. For our market study,

we focused on relational, NoSQL, and time-series

databases. After having collected information on 50

database systems, we reﬁned the set based on pre-

deﬁned criteria. We solely focused on open-source,

on-premise database systems that can be set up using

an ofﬁcial Docker image. After excluding systems

that did not meet our criteria, we applied a popularity

measure to identify the two most popular systems per

category from the systems that were still remaining.

Next, we performed a requirements analysis based

on related literature to identify functional and non-

functional requirements for a database system in an

IoT context. Based on these requirements we per-

formed a variety of tests. As a result, we were able

to demonstrate the weaknesses and strengths of the

database systems, which helps practitioners and re-

searchers in their decision on which database to use.

From our comparison of the functional require-

ments, we conclude the following aspects. The se-

lected relational database systems focus on consis-

tency and availability, while both, the NoSQL and

time-series databases deﬁnitely focus on partition tol-

erance, but then either aim for high availability or

high consistency. Only the relational database sys-

tems are not scalable and do not provide load bal-

ancing. Additionally, the open-source version of In-

ﬂuxDB is also not scalable. Since scalability is an

important aspect to handle the vast amounts of data

created by IoT devices, our results regarding this cri-

terion show that relational databases are less suitable.

Additionally, the relational databases also do not sup-

port long-term storage, which again makes them less

appropriate for IoT use cases. Tags are only supported

by InﬂuxDB, while matrix time series is only possi-

ble in the relational and NoSQL databases. For the

remaining functional criteria, we were not able to ob-

serve signiﬁcant differences between the systems.

Our test of the non-functional requirements

demonstrated inclusive results. First, the results of

the data ingestion test showed that QuestDB, one of

the time-series databases tested, shows the best re-

sults overall, while InﬂuxDB performed very poorly.

Based on this observation, we cannot make the gen-

eralized conclusion that time series databases are best

suited for storing large amounts of data quickly. Inter-

estingly, one of the relational databases, PostgreSQL,

showed very decent results for each test as well. Both

NoSQL databases performed moderately.

Second, our comparison of the query latencies

demonstrated that Cassandra shows by far the best

results. Additionally, the relational databases show

decent performance for basic queries since they used

a primary key on the timestamp improving the per-

formance of queries on the timestamp. On average,

QuestDB performs the best. Similar to our test on

loading the data into the databases, InﬂuxDB shows

inefﬁcient performance. As expected, the relational

databases performed only decently.

While based on the results of our test on func-

tional and non-functional requirements, we are not

able to identify a clear winner, we still conclude that

QuestDB appears to be most suitable to handle the

IoTBDS 2023 - 8th International Conference on Internet of Things, Big Data and Security

146

large amounts of data generated from IoT devices. We

additionally observed that there exist large differences

between databases of the same type since InﬂuxDB

indicated a weaker performance than QuestDB, for

instance.

Our study has limitations since we restricted the

set of tested systems to open-source and on-premise

databases, while commercial and/or systems hosted

on clouds make up a large majority of the database

systems available as demonstrated in our market re-

search. We did not consider further requirements for

using database systems in an IoT application, such

as security. Still, we demonstrated the strengths and

weaknesses of six popular databases regarding their

suitability for IoT use cases, providing a comprehen-

sive overview to support decision-making.

ACKNOWLEDGEMENTS

This research has been supported by the Deutsche

Forschungsgemeinschaft (DFG) under Research

Grant No. 432399058.

REFERENCES

Alam, M., Ruﬁno, J., Ferreira, J., Ahmed, S. H., Shah, N.,

& Chen, Y. (2018). Orchestration of microservices for

iot using docker and edge computing. IEEE Commun.

Mag., 56(9), 118–123.

Amghar, S., Cherdal, S., & Mouline, S. (2018). Which

NoSQL database for IoT Applications? 2018 Proc.

of MoWNeT, 131–137.

Bader, A., Kopp, O., & Falkenthal, M. (2017). Survey

and comparison of open source time series databases.

Datenbanksysteme für Business, Technologie und Web

(BTW 2017) - Workshopband, 249–268.

Baiyere, A., Topi, H., Wyatt, J., Venkatesh, V., & Donnel-

lan, B. (2020). Internet of things (IoT) – a research

agenda for information systems. Commun. Assoc. Inf.

Syst., 47(1), 21.

Beynon-Davies, P. (2017). Database systems. Bloomsbury

Publishing.

Chauhan, D., & Bansal, K. L. (2017). Using the advantages

of nosql: A case study on mongodb. Int. J. Recent In-

nov. Trends Comput. Commun., 5, 90–93.

Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey.

Mob. Netw. Appl., 19, 171–209.

Chui, M., Löfﬂer, M., & Roberts, R. (2020). The Internet of

Things. McKinsey Quarterly.

Codd, E. F. (1970). A relational model of data for large

shared data banks. Commun. ACM, 13, 377–387.

De Mauro, A., Greco, M., & Grimaldi, M. (2015). What is

big data? A consensual deﬁnition and a review of key

research topics. AIP Conference Proceedings, 1644,

97–104.

Elmasri, R., & Navathe, S. B. (2016). Fundamentals of

database systems (7th ed.). Pearson.

Fatima, H., & Wasnik, K. (2016). Comparison of sql, nosql

and newsql databases for internet of things. 2016

IEEE Bombay Section Symposium (IBSS), 1–6.

Gregor, S., & Hevner, A. R. (2013). Positioning and pre-

senting design science research for maximum impact.

MIS Quarterly, 37, 337–355.

Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M.

(2013). Internet of Things (IoT): A vision, architec-

tural elements, and future directions. Future Gener.

Comput. Syst., 29(7), 1645–1660.

Hao, Y., Qin, X., Chen, Y., Li, Y., Sun, X., Tao, Y., Zhang,

X., & Du, X. (2021). Ts-benchmark: A benchmark

for time series databases. Proc. of 37th IEEE ICDE,

588–599.

ITU-T, T. S. S. o. I. (2011). Overview of the Internet of

Things (tech. rep.) [Publication Title: International

Telecommunication Union].

Liu, R., & Yuan, J. (2019). Benchmarking time series

databases with iotdb-benchmark for iot scenarios.

arXiv preprint. http://arxiv.org/abs/1901.08304

Lycett, M. (2013). ’Dataﬁcation’: Making sense of (big)

data in a complex world. Eur. J. Inf. Syst., 22(4),

381–386.

Miorandi, D., Sicari, S., De Pellegrini, F., & Chlamtac,

I. (2012). Internet of things: Vision, applications

and research challenges. Ad Hoc Networks, 10(7),

1497–1516.

Mostafa, J., Wehbi, S., Chilingaryan, S., & Kopmann, A.

(2022). Scits: A benchmark for time-series databases

in scientiﬁc experiments and industrial internet of

things. Proc. of the 34th Int. Conf. Sci. Stat. Database

Manag., 1–11.

Musa, E., Dela

c, G., Šili

c, M., & Vladimir, K. (2019).

Comparison of relational and time-series databases

for real-time massive datasets. 2019 Proc. of the 42th

MIPRO, 1065–1070.

Nasar, M., & Kausar, M. A. (2019). Suitability of inﬂuxdb

database for iot applications. Int. J. Eng. Innov. Tech-

nol., 8, 1850–1857.

Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatter-

jee, S. (2007). A design science research methodology

for information systems research. J. Manag. Inf. Syst.,

24(3), 45–77.

Rautmare, S., & Bhalerao, D. M. (2016). MySQL and

NoSQL database comparison for IoT application.

2016 Proc. of. IEEE ICACA, 235–238.

Rayes, A., & Salam, S. (2017). Internet of things from hype

to reality. Springer Basel.

Sestino, A., Prete, M. I., Piper, L., & Guido, G. (2020). In-

ternet of Things and Big Data as enablers for business

digitalization strategies. Technovation, 98(July).

solid IT gmbh. (2022). Db-engines ranking. Retrieved

December 7, 2022, from https://db-engines.com/en/

ranking/relational+dbms

Strohbach, M., Ziekow, H., Gazis, V., & Akiva, N. (2015).

Towards a big data analytics framework for iot and

smart city applications. MOST, 4, 257–282.

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems

147

A APPENDIX

A.1 Market Analysis

Database Type

Operating

Systems

Maintainer License

Latest Stable

Release

First

Release

Cloud/

on-prem.

Ofﬁcial

Docker

Image

Oracle

relational,

multi-model

all

Oracle

Corp

commercial

in 2019:

19C

1980 both no

MySQL

relational,

multi-model

all MySQL open-source

in 2022:

8.0.29

1995 both yes

Microsoft

SQL Server

relational,

multi-model

Linux,

Windows

Microsoft commercial

in 2019:

SQL Server

2019

1989 both no

PostgreSQL

relational,

multi-model

all

PostgreSQL

Global Devel-

opment Group

open-source

in 2022:

15.1

1996 both yes

IBMDB2

relational,

multi-model

all IBM commercial

in 2022:

13.1

1983 both yes

Microsoft Access relational Windows Microsoft

commercial

with trailware

in 2019:

Microsoft

Access 2019

1992 both no

SQLite relational all SQLite public domain

in 2022:

3.39.0

2000 both no

Maria DB

relational,

multi-model

all

MariaDB

Corporation Ab,

MariaDB

Foundation

open-source

in 2022:

10.9.3

2009 both yes

Microsoft Azure

relational,

multi-model

Windows Microsoft commercial

in 2022:

V12

2010 cloud no

Snowﬂake relational

hosting

on all

Snowﬂake

Computing Inc.

commercial

in 2022:

6.15

2014 cloud no

Databricks multi-model

hosting

on all

Databricks open-source

in 2022:

11.2

2013 cloud no

Google Big Query relational

hosting

on all

Google commercial

in 2022:

3.3.3

2010 cloud no

Teradata relational Linux Teradata commercial

in 2019:

1.0 MU2

1984 both no

Amazon Redshift

relational,

multi-model

hosting

on all

Amazon commercial

in 2022:

1.0.41465

2012 cloud no

FileMaker relational all FileMaker commercial

in 2021:

19.4.1

1983 both no

Mongo DB Document Store all MongoDB, Inc

open-source

Server-side

public license

in 2022:

6.0.1

2009 both yes

Cassandra

Wide-column

store

all

Apache Software

Foundation

open-source

in 2022:

4.0.6

2008 both yes

Redis Key-value store all Redis

open-source

in 2022:

7.0.4

2009 both yes

Hbase

Wide-column

store

all

Apache Software

Foundation

open-source

in 2022:

2.5.0

2008 both yes

Amazon

Dynamo DB

Document store,

Key-value store

all Amazon commercial

in 2022

(no version)

2012 cloud no

Microsoft Azure

Cosmos DB

Graph DBMS,

Document

Key-value,

Wide-column

all Microsoft commercial

in 2022

(no version)

2014 cloud no

Memcached Key-value store all Danga Interactive open-source

in 2022:

1.6.15

2003 both yes

Firebase

Realtime Database

Document store all Google commercial

in 2022

(no version)

2012 cloud no

Datastax Enterprise

Wide-column

store

all DataStax commercial

in 2020:

6.8

2011 both no

CouchDB Document Store all

Apache Software

Foundation

open-source

in 2022:

3.2.2

2005 both yes

CouchBase Document Store all CouchBase, Inc open-source

in 2021:

7.1.0

2010 cloud yes

Google Cloud

FireStore

Document store all Google commercial

in 2022

(no version)

2017 cloud no

Microsoft Azure

Table Storage

Wide-column

store

all Microsoft commercial

in 2022

(no version)

2012 cloud no

Google Cloud

Bigtable

Key-value,

Wide-column

store

all Google commercial

in 2022

(no version)

2015 cloud no

IoTBDS 2023 - 8th International Conference on Internet of Things, Big Data and Security

148

Neo4j Grapf DBMS all Neo4j, Inc open-source

in 2022:

4.4.11

2007 both yes

InﬂuxDB Time-series all InﬂuxData open-source

in 2022:

2.4

2013 both yes

Kdb+ Time-series all Kx Systems commercial

in 2018:

3.6

2000 both no

Prometheus Time-series all Prometheus open-source

in 2022:

V2.3.0

2015 both yes

Graphite Time-series Linux

Orbitz

Worldwide, Inc

open-source

in 2022:

1.1.10

2006 both no

Timescale DB Time-series all Timescale Inc open-source

in 2022:

2.6.0

2018 both no

Apache Druid Time-series all

Apache Software

Foundation

open-source

in 2022:

24.0.0

2012 both no

RRD Tool Time-series Linux Tobias Oetiker open-source

in 2022:

1.8.0

1999 both no

Open TSDB Time-series all Yahoo open-source

in 2021:

2.4.1

2011 both no

Dolphin DB Time-series

Linux,

Windows

Dolphin Db, Inc commercial

in 2022:

V2.00.4

2018 both no

Fauna Time-series all Fauna, Inc commercial

in 2021

(no version)

2014 cloud no

Quest DB Time-series all QuestDB Limited open-source

in 2021:

6.0.3

2016 both yes

Grid DB Time-series Linux

Toshiba Digital

Solutions Corp.

open-source

in 2022:

5.0.0

2013 both no

TD Engine Time-series

Linux,

Windows

TDEngine open-source

in 2022:

3.0.1.3

2019 cloud no

Amazon Timestream Time-series all Amazon commercial

in 2022

(no version)

2022 cloud no

KairosDB Time-series all KairosDB open-source

in 2018:

1.2.2

2013 both no

eXtremeDB Time-series all McObject commercial

in 2021:

8.2

2001 both no

Victoria Metrics Time-series all VictoriaMetrics open-source

in 2022:

V1.77

2018 both no

IBM Db2 Event Store Time-series all IBM commercial

in 2021:

2017 both no

Raima Database

Manager

Time-series all Raima Inc commercial

in 2021:

15.2

1984 both no

Apache IOTDB Time-series all

Apache Software

Foundation

open-source

in 2022:

0.13.2

2018 both no

A.2 Comparison

Requirement MySQL PostgreSQL Cassandra MongoDB InﬂuxDB QuestDB

CAP Theorem CA CA AP CP AP CP

Scalability ✗ ✗ ✓ ✓ ✗ ✓

Load Balancing ✗ ✗ ✓ ✓ ✓ ✓

ii)

Basic Functions ✓ ✓ ✓ ✓ ✓ ✓

Aggregate Functions

✓ ✓ ✓ ✓ ✓ ✓

Continuous Calculation

✓ ✓ ✓ ✓ ✓ ✓

Tags ✗ ✗ ✗ ✗ ✓ ✗

Long-term Storage ✗ ✗ ✓ ✓ ✓ ✓

Matrix Time Series ✓ ✓ ✓ ✓ ✗ ✗

iii)

Down-Sampling ✓ ✓ ✓ ✓ ✓ ✓

Smallest Sample

Interval

1ms 1ms 1ms 1ms 1ms 1ms

Smallest Granularity

for Storage

1ms 1ms 1ms 1ms 1ms 1ms

Smallest Guaranteed

Granularity for Storage

1ms 1ms 1ms 1ms 1ms 1ms

iv)

API and Interfaces

MySQL Rest

API, MySQL

Workbench

and CLI

PostgREST

for API,

PgAdmin and

CLI

Cassandra

Restful API,

Cassandra Query

Language Shell

(cqlsh)

MongoDB

REST API,

MongoDB

Compass

Collect, CLI,

Graphite,

InﬂuxQL,

OpenTSDB

QuestDB

REST API,

Web Console

Client Libraries ✓ ✓ ✓ ✓ ✓ ✓

Plugins ✓ ✓ ✓ ✓ ✓ ✓

Version Used in Experiments

and Release Date

8.0.30

Jul 2022

15.0

Oct 2022

4.0

Oct 2022

6.0

Aug 2022

2.4.0

Aug 2022

6.5.4

Oct 2022

Commercial Support ✓ ✓ ✓ ✓ ✓ ✓

License GNU GPL

PostgreSQL

License

Apache Software

Foundation

GNU AGPL MIT Apache-2.0

On-Premise Internet of Things (IoT) Data Storage: Comparison of Database Management Systems

149