strategies, and analyzed its results.
In many computational problems, there is a trade-off
between storage space and time. This trade-off held
in both test cases performed in this work.
In the first test case, Cassandra’s TWCS ran 20%
faster than DTCS but used 18.4% more disk space at
peak. In addition, TWCS is simpler to configure than
DTCS, which supports the community decision to
deprecate DTCS and continue developing TWCS. Users
should choose DTCS only if disk space is limited.
The second test case evaluated different values of
the TWCS compaction window size parameter in a
scenario where data expires after 90 minutes. The
1-minute value was near-optimal in terms of elapsed
time, latency and throughput. Nevertheless, the space
trade-off remained an issue. The fastest configuration
was also the one that used the most space: 40.8% more
at peak than the 60-minute window size, whose run in
turn took 72.6% longer.
One minute is the lowest value accepted in the
TWCS configuration. Since it proved to be near-optimal,
we recommend that Cassandra accept values lower than
1 minute.
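As an illustration of the configuration discussed above, the following Python sketch builds a CQL statement that sets TWCS with a 1-minute compaction window. The function name `twcs_alter_statement`, the table name `iot.sensor_data`, and the use of `default_time_to_live` to express the 90-minute expiration are assumptions made for this example; they are not taken from the test setup.

```python
def twcs_alter_statement(table, window_size, window_unit="MINUTES", ttl_seconds=None):
    """Build an ALTER TABLE statement that enables TWCS (illustrative helper)."""
    # Units accepted by TWCS, to the best of our knowledge.
    allowed_units = {"MINUTES", "HOURS", "DAYS"}
    if window_unit not in allowed_units:
        raise ValueError(f"unsupported window unit: {window_unit}")
    # The window size is an integer, so 1 is the lowest accepted value.
    if int(window_size) != window_size or window_size < 1:
        raise ValueError("window_size must be an integer >= 1")

    compaction = (
        "{'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': '{window_unit}', "
        f"'compaction_window_size': {int(window_size)}}}"
    )
    statement = f"ALTER TABLE {table} WITH compaction = {compaction}"
    if ttl_seconds is not None:
        # Assumption: expiration is expressed as a table-level default TTL.
        statement += f" AND default_time_to_live = {int(ttl_seconds)}"
    return statement + ";"


# Hypothetical table; 1-minute window and 90-minute expiration.
print(twcs_alter_statement("iot.sensor_data", 1, "MINUTES", ttl_seconds=90 * 60))
```

The helper rejects a window size below 1, which mirrors the lower bound described above: a finer-grained window cannot currently be expressed with these options.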
The ratio between the expiration time (TTL plus
grace period) and the compaction window size should
be tested with different values. If a near-optimal ratio
can be confirmed, users may have a "golden rule" to
configure their column families.
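A minimal sketch of this ratio, assuming all quantities are expressed in seconds, is given below. The function name and the zero grace period used in the example are assumptions for illustration only; the grace period used in the tests is not restated here.

```python
def expiration_to_window_ratio(ttl_seconds, gc_grace_seconds, window_seconds):
    """Return (TTL + grace period) / compaction window size."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (ttl_seconds + gc_grace_seconds) / window_seconds


TTL_SECONDS = 90 * 60   # 90-minute expiration from the second test case
GC_GRACE_SECONDS = 0    # assumption for illustration only

# Compare the two window sizes discussed in the second test case.
for window_minutes in (1, 60):
    ratio = expiration_to_window_ratio(
        TTL_SECONDS, GC_GRACE_SECONDS, window_minutes * 60
    )
    print(f"{window_minutes}-minute window: ratio = {ratio:g}")
```

Under these assumptions the 1-minute window gives a ratio of 90 and the 60-minute window a ratio of 1.5; a future study could sweep this ratio to look for a consistent optimum.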
Further work is needed, mainly on the second test
case. Tests must be performed on a larger test bed to
evaluate whether the results are consistent. Likewise,
other read/write ratios must be tested. Although
600 million rows is already a considerable size, tests
should be made with larger data sets, preferably
with real data sets instead of generated data as in the
present paper.
This study will be useful within a broader research
goal the authors aim to achieve: the creation of an
auto-tuning component for compaction parameters,
initially within TWCS. These results will help to create
rules that lead to an autonomous performance tuning
agent, intended to eliminate the time and effort users
spend tuning the database compaction strategy
parameters.
ACKNOWLEDGEMENTS
This research work has the support of the Brazil-
ian research and innovation Agencies CAPES
(Grant 23038.007604/2014-69 FORTE–Tempestive
Forensics Project), CNPq (Grant 465741/2014-
2 – Cybersecurity INCT) and FAPDF (Grants
0193.001366/2016 UIoT–Universal Internet of
Things and 0193.001365/2016–Secure Software De-
fined Data Center (SSDDC)), as well as the Ministry
of Planning, Development and Management (Grants
005/2016 DIPLA–Planning and Management Direc-
torate and 11/2016 SEST–Secretariat of State-owned
Federal Companies) and the DPGU–Brazilian Union
Public Defender (Grant 066/2016).
REFERENCES
Abu-Elkheir, M., Hayajneh, M., and Ali, N. (2013). Data
Management for the Internet of Things: Design Prim-
itives and Solution. Sensors, 13(11):15582–15612.
Apache, S. F. (2016). The Cassandra-stress tool.
http://cassandra.apache.org/doc/latest/tools/cassandra_stress.html.
Last accessed 22 January 2018.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach,
D. A., Burrows, M., Chandra, T., Fikes, A., and Gru-
ber, R. E. (2008). Bigtable: A distributed storage sys-
tem for structured data. ACM Transactions on Com-
puter Systems (TOCS), 26(2):4.
Chebotko, A., Kashlev, A., and Lu, S. (2015). A Big Data
Modeling Methodology for Apache Cassandra. pages
238–245. IEEE.
Cruz Huacarpuma, R., de Sousa Junior, R. T., de Holanda,
M. T., de Oliveira Albuquerque, R., García Villalba,
L. J., and Kim, T.-H. (2017). Distributed Data Service
for Data Management in Internet of Things Middleware.
Sensors, 17(5):977.
DataStax (2017). Datastax docs: The write path to
compaction.
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html.
Last accessed 22 January 2018.
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati,
G., Lakshman, A., Pilchin, A., Sivasubramanian, S.,
Vosshall, P., and Vogels, W. (2007). Dynamo: Ama-
zon’s Highly Available Key-value Store. In Proceed-
ings of Twenty-first ACM SIGOPS Symposium on Op-
erating Systems Principles, SOSP ’07, pages 205–
220, New York, NY, USA. ACM.
Ghosh, M., Gupta, I., Gupta, S., and Kumar, N. (2015).
Fast Compaction Algorithms for NoSQL Databases.
In 2015 IEEE 35th International Conference on Dis-
tributed Computing Systems, pages 452–461.
Gubbi, J., Buyya, R., Marusic, S., and Palaniswami, M.
(2013). Internet of Things (IoT): A vision, architec-
tural elements, and future directions. Future Genera-
tion Computer Systems, 29(7).
Jirsa, J. (2016). TWCS experiments and improvement
proposals - ASF JIRA [CASSANDRA-10195].
https://issues.apache.org/jira/browse/CASSANDRA-10195.
Kona, S. (2016). Compactions in Apache Cassandra :
Performance Analysis of Compaction Strategies in
Apache Cassandra. Master’s thesis, Blekinge Insti-
tute of Technology, Karlskrona, Sweden.