cloud, if the Sharding by Hash Partitioning pattern has
to be implemented in the data access layer, this must
be done on the cloud side to avoid network latency
when more than one node needs to be queried (Strauch
et al., 2012).
4.8 Example
To illustrate the Sharding by Hash Partitioning pattern,
consider a system that logs every transaction performed
by the customers of a bank. A single table stores all
transaction log records. Each record has a field that
describes the type of transaction performed by the
customer, such as withdrawal, deposit, or transfer, and
a field that holds the transaction date.
Over time, the table becomes very large. The IT staff
decides to shard the table data across the nodes of a
cluster to improve performance and obtain scalability,
and creates a cluster composed of three database
servers. In the first attempt, the table is partitioned
chronologically, that is, range partitioning based on
the transaction date is configured. Server A stores the
oldest transactions and server C stores the most recent
ones (Figure 2). This partitioning scheme generates a
hot spot: all new transaction log records are written to
server C, and most bank customers consult their latest
transactions, which are also stored on server C.
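
The hot spot can be reproduced with a minimal sketch. The date
boundaries and the name range_shard below are hypothetical and chosen
only for illustration; the example does not state the actual ranges
configured by the IT staff.

```python
from collections import Counter
from datetime import date

# Hypothetical upper bounds (exclusive) for each server's date range.
RANGE_BOUNDS = [
    ("A", date(2010, 1, 1)),   # oldest transactions
    ("B", date(2013, 1, 1)),
    ("C", date.max),           # most recent transactions
]

def range_shard(transaction_date):
    """Return the server whose date range contains transaction_date."""
    for server, upper_bound in RANGE_BOUNDS:
        if transaction_date < upper_bound:
            return server
    return RANGE_BOUNDS[-1][0]  # date.max itself belongs to the last range

# Newly logged transactions always carry recent dates.
new_transactions = [date(2014, 3, day) for day in range(1, 29)]
load = Counter(range_shard(d) for d in new_transactions)
print(load)  # every one of the 28 writes lands on server C
```

Reads of the latest transactions are routed by the same function, so
they concentrate on server C as well.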
In this case, hash key partitioning is recommended. The
bank's IT staff decides to shard the transaction log
table using the customer ID as the partitioning key and,
in addition, creates an index on the transaction date
field. Now the result of a hash function applied to the
customer ID determines the server on which each
transaction record is stored (Figure 3). Because the
number of customers is large, the data is,
probabilistically, distributed more evenly across the
servers. When customers consult their latest
transactions, the requests are spread across the nodes
of the cluster. The index on the transaction date keeps
the transaction log records ordered within the result
of a customer's query.
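
The routing described above can be sketched as follows. This is an
illustration under stated assumptions, not the implementation used by
the bank: the choice of SHA-256, the in-memory dictionaries standing in
for the three servers, and the names hash_shard, log_transaction and
latest_transactions are all hypothetical. Sorting at query time stands
in for the index on the transaction date.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import date

SERVERS = ["A", "B", "C"]

def hash_shard(customer_id):
    """Map a customer ID to a server with a stable hash (assumed: SHA-256)."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

# In-memory stand-in for the cluster: server -> customer ID -> records.
shards = {server: defaultdict(list) for server in SERVERS}

def log_transaction(customer_id, tx_date, tx_type):
    """Store the record on the server selected by the customer ID hash."""
    shards[hash_shard(customer_id)][customer_id].append((tx_date, tx_type))

def latest_transactions(customer_id, limit=3):
    """Query a single server; sorting stands in for the date index."""
    rows = shards[hash_shard(customer_id)].get(customer_id, [])
    return sorted(rows, reverse=True)[:limit]

log_transaction(42, date(2014, 1, 5), "deposit")
log_transaction(42, date(2014, 2, 1), "withdrawal")
log_transaction(42, date(2014, 3, 1), "transfer")
print(latest_transactions(42, limit=2))

# With many customers, the records spread roughly evenly over the servers.
distribution = Counter(hash_shard(cid) for cid in range(3000))
print(distribution)
```

Because the server is derived from the customer ID alone, all records
of one customer live on the same node, so a query for that customer's
latest transactions touches a single server regardless of the dates
involved.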
5 CONCLUSIONS
Data sharding based on hash key partitioning, identified
and formalized as a database scalability pattern in this
work, efficiently provides read and write scalability,
improving the performance of a database cluster. It does
not, however, solve all database scalability problems
and is therefore not recommended for all scenarios. The
formal description of the solution as a pattern helps in
mapping data sharding by hash key partitioning to the
scenarios for which it is recommended.
As future work, we intend to continue formalizing
horizontal database scalability solutions as patterns,
in order to produce a catalog of database scalability
patterns aimed at solving scalability problems.
REFERENCES
Abramson, I., Abbey, M., Corey, M. J., and Malcher, M.
(2009). Oracle Database 11g. A Beginner’s Guide.
Oracle Press.
Adler, B. (2011). Building scalable applications in the
cloud: reference architecture and best practices.
Boicea, A., Radulescu, F., and Agapin, L. I. (2012).
MongoDB vs Oracle - database comparison. In Proceedings
of the 2012 Third International Conference on Emerging
Intelligent Data and Web Technologies, pages 330–335.
IEEE.
Connolly, T. M. and Begg, C. E. (2005). Database
Systems. A Practical Approach to Design, Implementation,
and Management. Addison-Wesley, 4th edition.
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati,
G., Lakshman, A., Pilchin, A., Sivasubramanian, S.,
Vosshall, P., and Vogels, W. (2007). Dynamo: Amazon’s
highly available key-value store. In Proceedings of
Twenty-first ACM SIGOPS Symposium on Operating Systems
Principles, SOSP ’07, pages 205–220, New York, NY, USA.
ACM.
DeWitt, D. and Gray, J. (1992). Parallel database
systems: The future of high performance database
systems. Commun. ACM, 35(6):85–98.
Eessaar, E. (2008). On pattern-based database design and
implementation. In Proceedings of the 2008 International
Conference on Software Engineering Research, Management
and Applications, pages 235–242. IEEE.
Elmasri, R. and Navathe, S. B. (2011). Fundamentals of
Database Systems. Addison-Wesley, 6th edition.
Fehling, C., Leymann, F., Retter, R., Schumm, D., and
Schupeck, W. (2011). An architectural pattern language
of cloud-based applications. In Proceedings of the 18th
Conference on Pattern Languages of Programs, number 2 in
PLoP ’11, pages 1–11, New York, NY, USA. ACM.
Fowler, M., Rice, D., Foemmel, M., Hieatt, E., Mee, R.,
and Stafford, R. (2002). Patterns of Enterprise
Application Architecture. Addison-Wesley.
Gamma, E., Helm, R., Johnson, R., and Vlissides, J.
(1994). Design Patterns. Elements of Reusable
Object-Oriented Software. Addison-Wesley, 1st edition.
Go, J. (2014). Designing a scalable partitioning strategy
for Azure Table storage.
http://msdn.microsoft.com/en-us/library/azure/hh508997.aspx.