Sharding by Hash Partitioning
A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters
Caio H. Costa, João Vianney B. M. Filho, Paulo Henrique M. Maia
and Francisco Carlos M. B. Oliveira
Universidade Estadual do Ceará, Fortaleza, Brazil
Keywords:
Database Sharding, Hash Partitioning, Pattern, Scalability.
Abstract:
With the beginning of the 21st century, web application requirements dramatically increased in scale. Applications like social networks, e-commerce, and media sharing started to generate lots of data traffic, and companies started to track this valuable data. The database systems responsible for storing all this information had to scale in order to handle the huge load. With the emergence of cloud computing, scaling out a database system has become an affordable solution, making data sharding a viable scalability option. But to benefit from data sharding, database designers have to identify the best manner to distribute data among the nodes of a sharded cluster. This paper discusses database sharding distribution models, specifically a technique known as hash partitioning. Our objective is to catalog, in the format of a Database Scalability Pattern, the best practice that consists in sharding the data among the nodes of a database cluster using the hash partitioning technique to nicely balance the load between the database servers. This way, we intend to make the mapping between the scenario and its solution publicly available, helping developers to identify when to adopt the pattern instead of other sharding techniques.
1 INTRODUCTION
With the beginning of the 21st century, web application requirements dramatically increased in scale.
Web 2.0 technologies made web applications much
more attractive due to improvements on their interac-
tivity. A whole new set of applications has emerged:
social networks, media sharing applications and on-
line office suites, for example. Those new applications attracted a huge number of users, many of whom migrated from local to online applications. Concurrently, many companies developed SOA-based applications, making the integration between different systems easier and increasing the reuse of functionalities through services. In this scenario, new integration functionalities have been developed and applications began to exchange lots of data.
The amount of data to be persisted and managed
grew in the same proportion as the data traffic of this
new environment grew. Nowadays, large e-commerce
sites have to manage data from thousands of concur-
rent active sessions. Social networks record the ac-
tivities of their members for later analysis by recommendation systems. Online applications store the pref-
erences of millions of users. Coping with the increase
in data and traffic required more computing resources.
Consequently, the databases responsible for storing
all that information had to scale in order to handle the
huge load without impairing services and applications
performance.
Database systems can scale up or scale out. Scal-
ing up implies bigger machines, more processors,
disk storage, and memory. Scaling up a database server is not always possible due to the high cost of acquiring all those resources and due to physical and practical limitations. The alternative is scaling
out the database system, which consists of grouping
several smaller machines in a cluster (Sadalage and
Fowler, 2013). To increase the cluster capacity, com-
modity hardware is added on demand. The relatively new but quickly widespread cloud computing technology has made the infrastructure necessary for scaling out systems more affordable.
Once the database has the necessary infrastructure
for running on a large cluster, a distribution model
has to be adopted based on the application require-
ments. An application may require read scalability,
write scalability, or both. Basically, there are two dis-
tribution models: replication and sharding. The for-
mer takes the same data and copies it into multiple
nodes, while the latter puts different data on different nodes (Sadalage and Fowler, 2013). Both techniques are orthogonal and can be used together.
Figure 1: The four partitioning strategies: (a) round-robin, (b) range, (c) list, and (d) hash (DeWitt and Gray, 1992).
This paper discusses database sharding distribu-
tion models, specifically a technique known as hash
partitioning. The goal of this work is to catalog in
the format of a Database Scalability Pattern the best
practice that consists in sharding the data among the
nodes of a database cluster using the hash partition-
ing technique to nicely balance the load among the
database servers. The pattern avoids the creation of
hot spots, which means data ranges that concentrate
the read and write operations in one node or in a
group of nodes. This sharding technique is embedded
in some NoSQL databases like Amazon DynamoDB (DeCandia et al., 2007) and MongoDB (Boicea et al., 2012),
and in ORM frameworks for relational databases, e.g.,
EclipseLink (Kalotra and Kaur, 2014) and Hibernate
Shards (Wu and Yin, 2010). Our main contribution
is making publicly available the mapping between the
scenario and its solution, helping developers to iden-
tify when to adopt hash partitioning in sharded clus-
ters instead of other sharding techniques.
The remainder of the paper is structured as follows.
Section 2 presents the background and Section 3 dis-
cusses the main related work. Section 4 describes the
pattern using the format presented by Hohpe and Woolf (Hohpe and B.Woolf, 2003), which is used to name the subsections of the pattern. Finally, Section
5 brings the conclusion and future work.
2 BACKGROUND
As data traffic and data volume increase, it becomes
more difficult and expensive to scale up a database
server. A more appealing option is to scale out, that
is, run the database on a cluster of servers where
several nodes can handle the requests. With a dis-
tributed database architecture, the load is distributed
among the servers that constitute the cluster, thus in-
creasing the performance and availability of the sys-
tem. There are two architectures for scaling out a
database system: distributed database systems and
parallel database systems. Both approaches are used
in high performance computing, where there is a need
for multiprocessor architecture to cope with a high
volume of data and traffic.
The work done by Elmasri and Navathe (El-
masri and Navathe, 2011) defines distributed database
systems as a collection of multiple logically inter-
related databases distributed over a computer net-
work, and a distributed database management sys-
tem as a software system that manages a distributed
database while making the distribution transparent to
the user. In this architecture, there are no shared
hardware resources and the nodes can have differ-
ent hardware configurations. Parallel database man-
agement systems link multiple smaller machines to
achieve the same throughput as a single, larger ma-
chine, often with greater scalability and reliability
than a single-processor database system (Connolly
and Begg, 2005). However, multiple processors share
either memory (disk storage and primary memory) or
only disk storage (in this case, each processor has its
own primary memory).
Many database vendors offer scalability solutions
based on the shared disk approach, like Oracle RAC
(Abramson et al., 2009) does. These kinds of solutions link several servers through a high-speed net-
ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems
314
work, but they still have a limiting component: they share a storage device, which acts as a system bottleneck. In the shared-nothing parallel architecture, processors communicate through a high-speed network and each of them has its own primary and secondary memory. That type of architecture requires node symmetry and homogeneity, but in a pure distributed database system, homogeneity is not required. To achieve real horizontal scalability, a pure distributed database system architecture must be adopted.
2.1 Distribution Models
Once the hardware resources (server nodes) for deploying a distributed database are available, a distri-
bution model should be chosen to leverage the cluster
capacity. Roughly, there are two paths to data distri-
bution: replication and sharding.
2.1.1 Replication
Replication can be performed in two ways: master-
slave and peer-to-peer. In a master-slave scheme, one
node is responsible for processing any updates to the
data and a background process is responsible for syn-
chronizing the data across the other nodes, the slaves.
That kind of replication is recommended for read-intensive applications. To increase the cluster capac-
ity of processing read requests, more slaves can be
added. However, the write throughput is limited by
the capacity of processing write requests of the mas-
ter node. In peer-to-peer replication, there is no mas-
ter node. All the replicas can accept write requests, thus improving the system's capacity to handle writes. On the other hand, peer-to-peer replication
clusters have to deal with inconsistency problems that
may arise.
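To make the trade-off concrete, the following minimal Python sketch (with hypothetical class and node names, not tied to any specific database) routes writes to the single master while spreading reads over the slaves; adding entries to the slave list raises read capacity, but write capacity stays bound to the master.

import itertools

class MasterSlaveRouter:
    """Toy read/write splitter for a master-slave replication scheme."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over read replicas

    def node_for_write(self):
        # Every update goes to the master; its capacity caps write throughput.
        return self.master

    def node_for_read(self):
        # Reads scale out: add more slaves to handle more read requests.
        return next(self._slaves)

router = MasterSlaveRouter("master", ["slave-1", "slave-2", "slave-3"])
print(router.node_for_write())  # always "master"
print(router.node_for_read())   # cycles through the slaves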
2.1.2 Sharding
Sharding is a way of scaling out a database via hori-
zontal fragmentation. A horizontal fragment of a re-
lation (table) is a subset of the tuples in that relation.
The tuples that belong to the horizontal fragment are
specified by a condition on one or more attributes of
the relation (Elmasri and Navathe, 2011). Often, only
a single attribute is involved. That is, the horizontal
scalability is supported by putting different parts of
the data onto different servers of a cluster. The objective is to make different clients talk to different server
nodes. Consequently, the load is balanced out nicely
among the servers. Sharding is particularly valuable
for performance since it can improve both read and
write performance. Replication can greatly improve
read performance but does little for applications that
have several write operations. Sharding provides a
way to horizontally scale those operations.
2.1.3 Partitioning Strategies
There are four different strategies for partitioning data
across a cluster: round-robin, range, list, and hash partitioning (DeWitt and Gray, 1992). The simplest
partitioning strategy is the round-robin, which dis-
tributes the rows of a table among the nodes in a
round-robin fashion (Figure 1a).
For the range, list, and hash strategies, an attribute,
known as partitioning key, must be chosen among the
table attributes. The partition of the table rows will
be based on the value of the partitioning key. In the
range strategy, a given range of values is assigned to
a partition. The data is distributed among the nodes
in such a way that each partition contains rows for
which the partitioning key value lies within its range
(Figure 1b).
The list strategy is similar to the range strategy, but each partition is assigned an explicit list of values instead of a range. A partition is selected to keep a row if the partitioning key value is equal to one of the values defined in its list (Figure 1c). In the hash strategy, the mapping between partitioning key values and nodes is based on the result of a hash function. The partitioning key value is used as the parameter of the hash function and the result determines where the data will be placed (Figure 1d).
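For illustration only, the sketch below (in Python, with hypothetical node names, value ranges, and country codes) shows how each strategy could map a row to a node.

import hashlib

NODES = ["node_a", "node_b", "node_c"]

def round_robin_node(row_index):
    # (a) Round-robin: rows are dealt to nodes in turn, ignoring their content.
    return NODES[row_index % len(NODES)]

def range_node(key):
    # (b) Range: each node owns a contiguous range of partitioning-key values.
    if key < 1000:
        return "node_a"
    if key < 2000:
        return "node_b"
    return "node_c"

def list_node(key):
    # (c) List: each node owns an explicit list of partitioning-key values.
    assignments = {"BR": "node_a", "US": "node_b", "DE": "node_c"}
    return assignments[key]

def hash_node(key):
    # (d) Hash: the node is derived from a hash of the partitioning-key value.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]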
3 RELATED WORK
A catalog of software design patterns was first de-
scribed by Gamma et al. (Gamma et al., 1994), a
pioneer work in the computer science field. Subse-
quently, Fowler et al. (Fowler et al., 2002) and Hohpe
and B. Woolf (Hohpe and B.Woolf, 2003) identified
and published software engineering architectural pat-
terns.
Shumacher et al. (Shumacher et al., 2006) present,
in the format of patterns, a set of best practices to
turn applications more secure at different levels. Hafiz
(Hafiz, 2006) describes four design patterns that are applicable to the design of anonymity systems and can be used to protect the privacy of sensitive data. Shu-
macher (Shumacher, 2003) introduces an approach
for mining security patterns from security standards
and presents two patterns for anonymity and privacy.
Strauch et al. (Strauch et al., 2012) describe four pat-
terns that address data confidentiality in the cloud.
The authors in (Eessaar, 2008) propose a pattern-
based database design and implementation approach
ShardingbyHashPartitioning-ADatabaseScalabilityPatterntoAchieveEvenlyShardedDatabaseClusters
315
that promotes the use of patterns as the basis of code generation. In addition, they present a software system that performs code generation based on database design patterns. However, their work does not list the database design patterns that the proposed tool is based on and does not mention design patterns related to distributed database systems.
Table 1: Common partitioning hash keys and their efficiency.
Hash Key | Efficiency
User id in applications with many users. | Good
Status code in applications with few possible status codes. | Bad
Tracked device id that stores data at relatively similar intervals. | Good
Tracked device id, where one is by far more popular than all the others. | Bad
The paper (Fehling et al., 2011) proposes a
pattern-based approach to reduce the complexity of
cloud application architectures. The pattern language
developed aims to guide the developers during the
identification of cloud environments and architecture
patterns applicable to their problems. Fehling et al. (Fehling et al., 2011) also give an overview of previ-
ously discovered patterns. In that list, there are six
patterns related to cloud data storage, but none of
them is related to data sharding techniques.
Pallmann (Pallmann, 2011) presents a collection of patterns implemented by the Microsoft Azure Cloud Platform and describes five data storage patterns which can be used to store record-oriented data.
In (Go, 2014), the Partition Key pattern describes
Azure’s data distribution at a high level, without spec-
ifying the strategy used to actually partition the data.
The pattern described in our paper is focused on the
strategy used to distribute data and is not oriented to
any vendor’s platform.
The white paper (Adler, 2011) suggests a refer-
ence architecture and best practices to launch scal-
able applications in the cloud. The suggested best
practices, which are not organized as patterns, were
derived from the wealth of knowledge collected from
many different industry use cases. Although Adler
(Adler, 2011) discusses database scalability, it only
addresses master-slave replication.
Stonebraker and Cattell (Stonebraker and Cattell, 2011) present ten rules to achieve scalability in datastores that handle simple operations. During the discussion of the rules, the authors present important drawbacks of sharding data in specific situations. The observations made by Stonebraker and Cattell (Stonebraker and Cattell, 2011) were impor-
tant for the construction of the pattern presented in
this work.
In addition to the four confidentiality patterns,
Strauch et al. (Strauch et al., 2012) also present
two patterns related to horizontal scalability of the
data access layer of an application: Local Database
Proxy and Local Sharding-Based Router. The Lo-
cal Sharding-Based Router pattern suggests the ex-
tension of the data access layer with the addition of a
router responsible for distributing the read and write
requests among the database servers. Each database
server in the cluster holds a portion of the data that
was partitioned using some data sharding technique.
Nonetheless, that pattern does not suggest any partic-
ular data sharding strategy. It can be used as a com-
plement to our proposed pattern in order to implement
a read and write strategy after the deployment of the
pattern described here.
4 SHARDING BY HASH
PARTITIONING
The goal of the database scalability pattern entitled
Sharding by Hash Partitioning is to distribute, as
evenly as possible, a functional group among the
nodes of a cluster using the sharding distribution
method. The desired result is a homogeneous distribution of the load generated by client requests among the database servers.
4.1 Context
Scaling out techniques are used to increase the per-
formance of database systems. The master/slave ar-
chitecture provides read scalability but does not help
much for write intensive applications. To obtain gen-
eral read and write scalability, the sharding technique
should be used. But shared-nothing systems scale only if data objects are partitioned across the system's
nodes in a manner that balances the load (Stonebraker
and Cattell, 2011).
Clusters that use the sharding method to partition the data distribute the data portions among several database servers to balance queries and update requests nicely. On the other hand, a bad distribution can give rise to two issues: concentration spots that hold
the most requested registries, and servers that concen-
trate the majority of update requests. These points of
concentration, known as hot spots (Stonebraker and
Cattell, 2011), can be a data set kept in only one server
ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems
316
or data sets distributed between a few servers.
Figure 2: The transaction log table partitioned by range.
For instance, suppose a table containing a date
field is chronologically distributed between three
database servers: A, B and C. Server A is responsible
for storing the oldest data, server B is responsible for
storing the intermediate data, and server C is respon-
sible for storing the most recent data. If the client ap-
plication keeps generating new data, server C will receive the majority of the update requests. In addi-
tion, there is a great chance of server C receiving the
majority of the queries because, generally, application
users are more interested in recent information. In this
scenario, the attempt to distribute the load across the
three database servers by sharding the data using the
range sharding strategy will fail.
4.2 Challenge
How can data be distributed so as to avoid the creation of hot spots and obtain a sharded cluster that nicely balances the request load across its nodes?
4.3 Forces
When using sharding, partitioning the data among the
nodes to avoid hot spots is not a trivial task. The
appropriate partitioning strategy must be chosen de-
pending on the application data access pattern. Most of the time, choosing the wrong sharding strategy will decrease the system's performance.
Excess communication can happen in those
database systems where data placement was not care-
fully chosen by the database designer (Zilio et al.,
1994). When the round-robin sharding strategy is
used, data is distributed without considering its at-
tributes. Registries that are commonly aggregated to compose a query result may be scattered across the nodes, generating data traffic overhead.
It is natural to think of distributing the data based
on natural partitioning keys like date fields. There-
fore, the obvious choice would be the range partition-
ing strategy. However, in applications in which users are interested only in recent information, range partitioning can generate hot spots like the one in the example shown at the end of Section 4.1.
In addition to balancing the load among the database servers in the cluster nicely, a partitioning strategy should generate no significant overhead when determining the node on which a registry must reside.
4.4 Solution
The hash partitioning can be used to achieve an even
distribution of the requests load among the nodes of
a distributed database system. A field whose values
are particular to a group of registries must be chosen
as the partitioning key. The choice of the partitioning key should take into account the fact that registries that are commonly accessed together will share the same value for the partitioning key.
Queries interested in data that shares the same par-
titioning key value will hit the same node. Conse-
quently, when two or more requests ask for data with
different values for the partitioning key, they will be
directed to different nodes. The same happens when
saving new data in the sharded cluster. A hash func-
tion is applied to the value of the partition key in the
new registry. The result will determine in which node
the data will be stored. Hence, if two new registries
present different values for their partitioning key, they
will be stored in different nodes of the cluster.
Once the partitioning key field is chosen, an index
can be created based on another field to order the reg-
istries with the same partitioning key value. With the
existence of that secondary index, results from queries
that search for registries with the same value for the
partitioning key can be ordered.
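A minimal in-memory sketch of this solution, assuming hypothetical node names and a plain dictionary in place of real database servers: all registries sharing a partitioning key land on the same node, and a query for one key touches a single node and returns its registries ordered by a secondary field.

import hashlib
from collections import defaultdict

NODES = ["node_a", "node_b", "node_c"]
storage = {node: defaultdict(list) for node in NODES}  # node -> key -> registries

def node_for(partition_key):
    # The hash of the partitioning key value determines the target node.
    digest = hashlib.md5(str(partition_key).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def insert(partition_key, registry):
    # Registries sharing a partitioning key are always stored together.
    storage[node_for(partition_key)][partition_key].append(registry)

def query(partition_key, order_field):
    # Single-node lookup; the secondary "index" orders the result.
    registries = storage[node_for(partition_key)][partition_key]
    return sorted(registries, key=lambda r: r[order_field])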
4.5 Results
Data placement is very important in shared-nothing parallel database systems. It has to ease the par-
allelism and minimize the communication overhead
(Zilio et al., 1994). Systems that aggregate the nec-
essary information in only one registry are best suited
ShardingbyHashPartitioning-ADatabaseScalabilityPatterntoAchieveEvenlyShardedDatabaseClusters
317
for that approach since it will minimize the network traffic.
Figure 3: The transaction log table partitioned by hash key.
Thus, it is important to avoid multi-shard
operations to the greatest extent possible, including
queries that go to multiple shards, as well as multi-
shard updates requiring ACID properties (Stone-
braker and Cattell, 2011). Therefore, sharding data among cluster nodes is best suited for applications that do not require joins between database tables.
If the hash partitioning needs to be completely im-
plemented in the data access layer of an application,
an algorithm and a hash function need to be chosen
to realize the hash partitioning method. For instance,
the MD5 (Rivest, 1992) can be the hash function im-
plementation and the Consistent Hashing algorithm
(Karger et al., 1997) can be used to distribute and lo-
cate data across the cluster. Tools that implement the pattern, like some NoSQL datastores that natively support sharding and some persistence frameworks, already ship with such an algorithm.
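As a rough, non-authoritative sketch of that combination, the class below builds a consistent-hashing ring in Python using MD5 as the hash function; the replica count and node names are illustrative choices, not prescribed by the pattern.

import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring in the spirit of Karger et al. (1997)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._ring = []           # sorted list of (position, node) tuples
        for node in nodes:
            self.add_node(node)

    def _position(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Several virtual nodes per server smooth the distribution of keys.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, partition_key):
        # A key is owned by the first virtual node clockwise from its position.
        pos = self._position(str(partition_key))
        index = bisect.bisect(self._ring, (pos, "")) % len(self._ring)
        return self._ring[index][1]

ring = ConsistentHashRing(["node_a", "node_b", "node_c"])
print(ring.node_for("customer-42"))

Because only the keys adjacent to a node's ring positions move when a node joins or leaves, this scheme also eases rebalancing, which is why Dynamo adopts a variation of it (DeCandia et al., 2007).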
Even if the datastore or persistence framework being used already implements the pattern, choosing the
right partitioning hash key is a task of the database
designer. Choosing the right partitioning key is very
important when using the hash partitioning method.
The database designer must choose the right partition-
ing key to keep the workload evenly balanced across
partitions. For instance, if a table has a very small
number of heavily accessed registries, even a single
one, request traffic is concentrated on a small number
of partitions. To obtain a nicely balanced workload,
the partitioning hash key must have a large number of
distinct values, which are requested fairly uniformly,
as randomly as possible. Table 1 shows common par-
titioning hash keys and their efficiency.
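The contrast summarized in Table 1 can be checked with a quick simulation; the sketch below (with hypothetical key populations and 30,000 simulated requests) counts how many requests each node receives for a high-cardinality key versus a low-cardinality one.

import hashlib
import random
from collections import Counter

NODES = ["node_a", "node_b", "node_c"]

def node_for(key):
    return NODES[int(hashlib.md5(str(key).encode()).hexdigest(), 16) % len(NODES)]

# Good key: many distinct user ids, requested fairly uniformly.
user_load = Counter(node_for(random.randint(1, 1_000_000)) for _ in range(30_000))

# Bad key: only two status codes, so at most two partitions ever receive traffic.
status_load = Counter(node_for(random.choice(["OPEN", "CLOSED"])) for _ in range(30_000))

print(user_load)    # roughly one third of the requests on each node
print(status_load)  # all requests concentrated on at most two nodes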
4.6 Next
Cloud datastores provide the necessary infrastructure to scale at lower costs and with simplified management. However, the majority of cloud relational databases do not implement the Sharding by Hash Partitioning pattern. Indeed, they do not support
sharding natively. In such cases, the datastore func-
tionalities can be extended by implementing the pat-
tern in the application data access layer. Strauch et
al. (Strauch et al., 2012) describe the pattern Lo-
cal Sharding-Based Router which complements the
pattern described in this paper. The pattern Local
Sharding-Based Router proposes an extra layer, de-
ployed in the cloud, responsible for implementing
sharding in cloud datastores that do not support shard-
ing natively. The Local Sharding-Based Router pat-
tern does not suggest any sharding strategy.
4.7 Sidebars
There are NoSQL datastores, like MongoDB (Liu
et al., 2012) and DynamoDB (DeCandia et al., 2007),
that implement the Sharding by Hash Partitioning pat-
tern. Those datastores can be used if an application
access pattern does not require joins between differ-
ent tables, and requires the scalability offered by data
sharding and the workload balance offered by hash
partitioning. That is, if the application will benefit
from the Sharding by Hash Partitioning pattern and
does not need the relation constraints offered by re-
lational databases, the mentioned datastores can be
used. The paper (DeCandia et al., 2007) describes
the hash partitioning algorithm used by DynamoDB.
It is a variation of the Consistent Hashing algorithm
(Karger et al., 1997) and can be used as a reference
for implementing the Sharding by Hash Partitioning
pattern during a framework development or in an ap-
plication data access layer.
There are few relational databases which support sharding in a real shared-nothing architecture. The ones that do support it are generally more complex to maintain. If an application needs to shard data
by hash partitioning, but the relational database used
does not provide this feature, a framework that imple-
ments the Sharding by Hash Partitioning pattern can
be used. EclipseLink implements a data sharding by
hash key policy and can provide this feature to rela-
tional databases that do not support sharding. Alter-
natively, it can simply be used if developers do not
know how to configure databases that support shard-
ing.
In on-premise applications that store data in the
ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems
318
cloud, if the Sharding by Hash Partitioning pattern has
to be implemented in the data access layer, this must
be done at the cloud side to avoid network latency
if more than one node needs to be queried (Strauch
et al., 2012).
4.8 Example
To improve the understanding of the Sharding by
Hash Partitioning pattern, a system that logs each transaction performed by customers of a bank will be used as an example. A single table stores all the transaction log registries. Each registry has a field that describes the type of bank transaction performed by the client, such as withdrawal, deposit, or transfer. As expected, the table has a field
that holds the transaction date.
Over time the table becomes very large. The IT
staff decides to shard the table data across nodes of a
cluster to improve performance and obtain scalability.
The staff creates a cluster composed of three database
servers. In the first attempt, the table is chronologi-
cally partitioned, that is, a range partitioning based on
the transaction date is configured. Server A stores the oldest transactions, and server C stores the most recent transactions (Figure 2). This partitioning scheme
generates hot spots. All new transaction log registries
are stored in server C and most of the bank customers
consult their latest transactions, which are also stored
in server C.
In this case the use of hash key partitioning is rec-
ommended. The bank IT staff decides to shard the
transaction logs table using the customer ID as the
partitioning key. Furthermore, they create an index
based on the transaction date field. Now, the result
of a hash function applied to the customer ID deter-
mines where the transaction registry will be stored
(Figure 3). Due to the large number of customers, the data is, probabilistically, more evenly partitioned.
When customers consult their latest transactions, the
requests will be distributed across the nodes of the
cluster. The index based on the transaction date will
keep the transaction log registries ordered within a
customer query result.
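A toy end-to-end version of the bank scenario, assuming hypothetical server names and in-memory dictionaries instead of real database servers: transactions are routed by a hash of the customer ID, and the date ordering plays the role of the secondary index.

import hashlib
from collections import defaultdict

SERVERS = ["server_a", "server_b", "server_c"]
tables = {s: defaultdict(list) for s in SERVERS}  # server -> customer id -> logs

def server_for(customer_id):
    # The customer ID hash decides which server stores the registry.
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

def log_transaction(customer_id, date, kind, amount):
    # New registries spread over the cluster according to the customer ID hash.
    row = {"customer_id": customer_id, "date": date, "type": kind, "amount": amount}
    tables[server_for(customer_id)][customer_id].append(row)

def latest_transactions(customer_id, limit=10):
    # A customer's query hits a single server; the date field orders the result.
    rows = tables[server_for(customer_id)][customer_id]
    return sorted(rows, key=lambda r: r["date"], reverse=True)[:limit]

log_transaction(42, "2015-03-01", "deposit", 150.0)
log_transaction(42, "2015-03-05", "withdrawal", 40.0)
print(latest_transactions(42))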
5 CONCLUSIONS
Data sharding based on hash key partitioning, identified and formalized as a database scalability pattern in this work, efficiently provides read and write scalability, improving the performance of a database
cluster. Data sharding by hash key partitioning, how-
ever, does not solve all database scalability problems.
Therefore, it is not recommended for all scenarios. The
formal description of the solution as a pattern helps
in the task of mapping the data sharding by hash key
partitioning to its recommended scenario.
As future work we intend to continue formaliz-
ing database horizontal scalability solutions as pat-
terns, so we can produce a catalog containing a list of database scalability patterns that aim to solve scalability problems.
REFERENCES
Abramson, I., Abbey, M., Corey, M. J., and Malcher, M.
(2009). Oracle Database 11g. A Beginner’s Guide.
Oracle Press.
Adler, B. (2011). Building scalable applications in the cloud: reference architecture and best practices.
Boicea, A., Radulescu, F., and Agapin, L. I. (2012). MongoDB vs Oracle - database comparison. In Proceedings
of the 2012 Third International Conference on Emerg-
ing Intelligent Data and Web Technologies, pages
330–335. IEEE.
Connolly, T. M. and Begg, C. E. (2005). DATABASE SYS-
TEMS. A Practical Approach to Design, Implementa-
tion, and Management. Addison-Wesley, 4th edition.
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati,
G., Lakshman, A., Pilchin, A., Sivasubramanian, S.,
Vosshall, P., and Vogels, W. (2007). Dynamo: ama-
zon’s highly available key-value store. In Proceedings
of Twenty-first ACM SIGOPS Symposium on Operat-
ing Systems Principles, SOSP ’07, pages 205–220,
New York, NY, USA. ACM.
DeWitt, D. and Gray, J. (1992). Parallel database sys-
tems: The future of high performance database sys-
tems. Commun. ACM, 35(6):85–98.
Eessaar, E. (2008). On pattern-based database design and
implementation. In Proceedings of the 2008 Interna-
tional Conference on Software Engineering Research,
Management and Applications, pages 235–242. IEEE.
Elmasri, R. and Navathe, S. B. (2011). Fundamentals of
Database Systems. Addison-Wesley, 6th edition.
Fehling, C., Leymann, F., Retter, R., Schumm, D., and
Schupeck, W. (2011). An architectural pattern lan-
guage of cloud-based applications. In Proceedings
of the 18th Conference on Pattern Languages of Pro-
grams, number 2 in PLoP ’11, pages 1–11, New York,
NY, USA. ACM.
Fowler, M., Rice, D., Foemmel, M., Hieatt, E., Mee, R.,
and Stafford, R. (2002). Patterns of Enterprise Appli-
cation Architecture. Addison-Wesley.
Gamma, E., Helm, R., Johnson, R., and Vlissides, J.
(1994). Design Patterns. Elements of Reusable
Object-Oriented Software. Addison-Wesley, 1st edi-
tion.
Go, J. (2014). Designing a scalable partitioning strategy
for azure table storage. http://msdn.microsoft.com/en-us/library/azure/hh508997.aspx.
ShardingbyHashPartitioning-ADatabaseScalabilityPatterntoAchieveEvenlyShardedDatabaseClusters
319
Hafiz, M. (2006). A collection of privacy design patterns. In
Proceedings of the 2006 Conference on Pattern Lan-
guages of Programs, number 7 in PLoP ’06, pages 1–
7, New York, NY, USA. ACM.
Hohpe, G. and B.Woolf (2003). Enterprise Integration Pat-
terns: Designing, Building, and Deploying Messaging
Solutions. Addison-Wesley, 1st edition.
Kalotra, M. and Kaur, K. (2014). Performance analysis of
reusable software systems. In 2014 5th International
Conference on Confluence The Next Generation Information Technology Summit, pages 773–778. IEEE.
Karger, D., Lehman, E., Leighton, T., Panigrahy, R.,
Levine, M., and Lewin, D. (1997). Consistent hash-
ing and random trees: Distributed caching protocols
for relieving hot spots on the world wide web. In Pro-
ceedings of the Twenty-ninth Annual ACM Symposium
on Theory of Computing, STOC ’97, pages 654–663,
New York, NY, USA. ACM.
Liu, Y., Wang, Y., and Jin, Y. (2012). Research on the improvement of MongoDB auto-sharding in cloud envi-
ronment. In Proceedings of the 7th International Con-
ference on Computer Science and Education, pages
851–854. IEEE.
Pallmann, D. (2011). Windows azure design patterns. http://neudesic.blob.core.windows.net/webpatterns/index.html.
Rivest, R. (1992). The md5 message-digest algorithm.
IETF RFC 1321.
Sadalage, P. J. and Fowler, M. (2013). NoSQL Distilled. A
Brief Guide to the Emerging World of Polyglot Persis-
tence. Addison-Wesley, 1st edition.
Shumacher, M. (2003). Security patterns and security stan-
dards - with selected security patterns for anonymity
and privacy. In European Conference on Pattern Lan-
guages of Programs.
Shumacher, M., Fernandez-Buglioni, E., Hybertson, D.,
Buschmann, F., and Sommerlad, P. (2006). Security
Patterns: Integrating Security and Systems Engineer-
ing. Wiley.
Stonebraker, M. and Cattell, R. (2011). 10 rules for scalable performance in 'simple operation' datastores. Com-
munications of the ACM, 54(6):72–80.
Strauch, S., Andrikopoulos, V., Breitenbuecher, U., Kopp,
O., and Leymann, F. (2012). Non-functional data layer
patterns for cloud applications. In 2012 IEEE 4th In-
ternational Conference on Cloud Computing Technol-
ogy and Science, pages 601–605. IEEE.
Wu, P. and Yin, K. (2010). Application research on a per-
sistent technique based on hibernate. In International
Conference on Computer Design and Applications,
volume 1, pages 629–631. IEEE.
Zilio, D. C., Jhingran, A., and Padmanabhan, S. (1994).
Partitioning key selection for a shared-nothing paral-
lel database system. Technical report, IBM Research
Division, Yorktown Heights, NY.
ICEIS2015-17thInternationalConferenceonEnterpriseInformationSystems
320