Cassandra’s Performance and Scalability Evaluation

Melyssa Barata, Jorge Bernardino

Abstract

In the past, relational databases were the most commonly used technology for storing and retrieving data, making it easy to manage and retrieve any stored information organized as a set of tables. Today, however, databases are much larger and query execution times can become very long, requiring servers with greater capacity. The purpose of this paper is to describe and analyze the Cassandra NoSQL database using the Yahoo! Cloud Serving Benchmark (YCSB) in order to better understand its execution capabilities for various types of applications in environments with different amounts of stored data. The experiments show that Cassandra achieves good performance and scalability, and they show how the database size and the number of nodes affect these results.
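One common way to summarize how throughput responds to adding nodes is to compute relative speedup and parallel efficiency against the smallest cluster measured. The following is a minimal sketch under that assumption and is not necessarily the metric used in the paper. The function name `scaling_summary` and its input layout (a mapping from node count to measured throughput in operations per second, for example as reported at the end of a YCSB run) are hypothetical.

```python
from typing import Dict, Tuple


def scaling_summary(throughput: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {nodes: (speedup, efficiency)} relative to the smallest cluster.

    Hypothetical helper: `throughput` maps node count -> measured throughput
    (operations per second) for the same workload and data size.
    """
    if not throughput:
        raise ValueError("need at least one measurement")
    base_nodes = min(throughput)          # smallest cluster is the baseline
    base_tput = throughput[base_nodes]
    summary: Dict[int, Tuple[float, float]] = {}
    for nodes in sorted(throughput):
        # Speedup: how many times more operations per second than the baseline.
        speedup = throughput[nodes] / base_tput
        # Ideal linear scaling would give speedup == nodes / base_nodes,
        # so efficiency 1.0 means perfectly linear scaling.
        efficiency = speedup / (nodes / base_nodes)
        summary[nodes] = (speedup, efficiency)
    return summary
```

For example, if going from one node to two raised throughput by a factor of 1.8, the sketch would report a speedup of 1.8 and an efficiency of 0.9. Comparing runs across node counts is only meaningful when the workload and the amount of stored data are held fixed.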



Paper Citation


in Harvard Style

Barata M. and Bernardino J. (2016). Cassandra’s Performance and Scalability Evaluation. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 127-134. DOI: 10.5220/0005980101270134


in Bibtex Style

@conference{data16,
author={Melyssa Barata and Jorge Bernardino},
title={Cassandra’s Performance and Scalability Evaluation},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={127-134},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005980101270134},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Cassandra’s Performance and Scalability Evaluation
SN - 978-989-758-193-9
AU - Barata M.
AU - Bernardino J.
PY - 2016
SP - 127
EP - 134
DO - 10.5220/0005980101270134