Sharding by Hash Partitioning - A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters

Caio H. Costa, João Vianney B. M. Filho, Paulo Henrique M. Maia, Francisco Carlos M. B. Oliveira

2015

Abstract

With the beginning of the 21st century, the scale requirements of web applications increased dramatically. Applications such as social networks, e-commerce, and media sharing started to generate large volumes of data traffic, and companies started to track this valuable data. The database systems responsible for storing all this information had to scale in order to handle the huge load. With the emergence of cloud computing, scaling out a database system has become an affordable solution, making data sharding a viable scalability option. But to benefit from data sharding, database designers have to identify the best way to distribute data among the nodes of a sharded cluster. This paper discusses database sharding distribution models, specifically a technique known as hash partitioning. Our objective is to catalog, in the format of a Database Scalability Pattern, the best practice of sharding data among the nodes of a database cluster using hash partitioning so that the load is evenly balanced across the database servers. In this way, we intend to make the mapping between the scenario and its solution publicly available, helping developers identify when to adopt this pattern instead of other sharding techniques.
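The core idea of hash partitioning, routing each row to a shard by hashing its shard key and taking the result modulo the number of shards, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names `shard_for` and `distribution`, the fixed shard count, the hypothetical `customer-N` keys, and the choice of MD5 as the hash function are all illustrative.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index in [0, num_shards).

    Uses MD5 as a stable hash: Python's built-in hash() is salted per
    process for strings, so it would route the same key differently
    on each run.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the 16-byte digest as an unsigned big-endian integer,
    # then reduce it modulo the shard count.
    return int.from_bytes(digest, "big") % num_shards


def distribution(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 sequential customer IDs on 4 shards.
    keys = [f"customer-{i}" for i in range(10_000)]
    counts = distribution(keys, 4)
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} rows")
```

Running it prints one line per shard with roughly 2,500 rows each. Under range partitioning on a monotonically increasing ID, newly inserted rows would all land in the highest range; hashing scatters them instead. The trade-off is that a plain modulo mapping relocates most keys when the shard count changes (going from 4 to 5 shards, only about one key in five stays on the same shard), which is why schemes such as consistent hashing are often preferred when the cluster must grow.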



Paper Citation


in Harvard Style

H. Costa C., Vianney B. M. Filho J., Henrique M. Maia P. and Carlos M. B. Oliveira F. (2015). Sharding by Hash Partitioning - A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 313-320. DOI: 10.5220/0005376203130320


in Bibtex Style

@conference{iceis15,
author={Caio H. Costa and João Vianney B. M. Filho and Paulo Henrique M. Maia and Francisco Carlos M. B. Oliveira},
title={Sharding by Hash Partitioning - A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={313-320},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005376203130320},
isbn={978-989-758-096-3},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - Sharding by Hash Partitioning - A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters
SN  - 978-989-758-096-3
AU  - H. Costa C.
AU  - Vianney B. M. Filho J.
AU  - Henrique M. Maia P.
AU  - Carlos M. B. Oliveira F.
PY  - 2015
SP  - 313
EP  - 320
DO  - 10.5220/0005376203130320
ER  -