On Datastore Support for Agile Software Development

Jie Liu

Computer Science Division, Western Oregon University, Monmouth, Oregon, 97361, U.S.A.

Keywords: Software Development, Agile Software Development Process, Databases, Database Management Systems,

MongoDB, and NoSQL.

Abstract: Most database-base applications support two main sets of features: customer facing transactional

capabilities such as purchasing books at an online bookstore and functionality required by managers and

business analysts such as identifying trends in sales data by combing through aggregated sales data. The

conventional approach of having just one main database to support both features greatly restrict developers

freedom in applying the best approach to quickly implement new features, to enhance existing features, or

to mend defects because any attempt in changing database schema means code and test cases modifications

in many places and may even require a large amount of effort in testing. Such an inherited resistance in

changing introduced by data store does not fit the evolutionary development nature of Agile software

development methodology. We argue that we should consider having at least two databases: one support

transactional capabilities and the other support reporting and possible data warehouse needs, and will show

how such an approach supports the Agile software development methodology.

1 INTRODUCTION

With increasingly sophisticated IDEs, programming

languages, knowledge, enough bad experiences in

trying other software development methodologies by

both the developers and management teams, and

strong support of success stories favour Agile

software development process, more and more

software teams are adopting Agile software

development process. In a nutshell, in our view, the

benefits of Agile methodologies are to promote short

development cycles, encourage communications and

cooperation, and accommodate changes in

requirements and scope. Each sprint is short and is

focused on the most important issues at the time so

the team can find out what work and what are the

challenges quickly. Due to short sprint, changes on

requirements are reflected promptly. Scrums or other

frequent short meetings encourage communications

and cooperation among the developers and also

between developers and key stakeholders. The use

of product backlog brings all shareholders'

requirements together at one place so everyone can

see. Finally, the end of sprint demo is an effective

way to show the progress of the development team

to the management team and other shareholders, not

to mention it is also an effective vehicle for the

product owners to interact directly with the

developers and testers and to validate whether the

implementation matches with what is expected. For

projects that require a dozen or so co-located

developers, Agile software development process has

proven to be effective and lead to successful projects

(Turk, 2002).

In just a few years, we have witnessed drastic

change in the acceptances of Agile software

development methodology. Right now, all software

teams in our organization have adopted Agile

software development methodology. The members

of our management team are accustomed to attend

our demos at the end of each sprint. Even the

support and operation teams have daily scrums so

both the team members and managers can have a

good read on the vital signs about the team’s current

state and team members' activities.

We understand and agree that Agile software

development process has its limitations such as unfit

to support large and distributed teams, unfit to

support large and complex software application, and

is claimed to be unfit for developing safety-critical

software (Turk, 2002).. However, we are not here to

discuss the limitations. Instead, we would like to

discuss our experience where the practice of using

one database, especially the use of relational

database management systems, in supporting of

Agile software development often suffers the

183

Liu J..

On Datastore Support for Agile Software Development.

DOI: 10.5220/0005106701830188

In Proceedings of the 9th International Conference on Software Engineering and Applications (ICSOFT-EA-2014), pages 183-188

ISBN: 978-989-758-036-9

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

experience of very much a mismatch. Looking

deeper, we believe that we have been doing this all

wrong in our past projects where we would have just

one database to support vastly different sets of data

processing needs. The result is the requests of

changing due to necessary software enhancement

and resistance of changing due to stability

considerations are constantly engage in this tug of

war where neither side is winning.

In this paper, we argue that (1) many reasonably

sized transactional based applications need at least

two databases: one to support the basic customer

facing business related transactional tasks, and one

to support reporting needs; and (2) to effectively

support Agile software development, the database

handling customer facing tasks does not need to be a

relational database. It is possible that the reporting

database late can be modified to support data mining

and data warehousing needs.

2 THE PROBLEMS

It is inevitable that almost all transactional based

applications need some database support to make

sure persistence data can be stored for late uses by

the same or different components, or even a different

application. In addition, when the information is

needed, it can be quickly located among a large

number of similar data. This persistence data store is

generally accomplished through the use of files

systems in the beginning of computer history and

then switched to database management systems

(Kroenke, 2013). For the past several decades, these

databases have been relational databases (Kroenke,

2013). Because all important information for an

application to operate properly has to be eventually

stored in the database and then retrieved from the

database, the database becomes a vital component of

a software system. Because much of the code, both

in UI and business logic are tightly coupled with the

database schema, changing in schema often breaks

the functioning features.

Looking back to several dozen applications we

have participated in the past 20 some years, we

realized that, although most of these applications

have only one main supporting database, these

databases generally service two very distinct sets of

needs: store transactions and support reporting. We

have attempted to use views and temporary tables to

alleviate the negative effects of one part to the other

and have very limited success.

In the past, being attaching with a large

enterprise, a major milestone in our software

projects has always been to complete the database

design and declare database schema froze. The

project then must activate Change Management

process to contain negative effects on the necessary

modifications to the database schema because

modifications in other components may be absolute

necessary to reflect changes in the database schema

due to the tightly coupling nature among the

business logic and UI database. Also the self

describing characteristic of a database makes some

simple changes, such as stop allowing a column to

be null, be quite dramatic. Worse yet, once a

software system is in production and becomes more

and more popular, its database may need to provide

data for other systems and databases. This is called

the "worst-case scenario" in (Ambler, 2006).

Changing the database schema in these situations

can be a reasonable sized project by itself.

With more and more software teams adopting

the Agile software development methodology, we

are experiencing an even more frequent and urgent

needs for schema changes. For several projects we

have experienced, the schema may need to be

changed several times within a sprint to

accommodate different stories. Many tools,

frameworks, and approaches have been developed to

manage the inevitable changes, and some introduces

new problems.

Ruby on Rails handles these changes by

introducing the ActiveRecord package to

automatically support the database needs. It also

uses database migrations to move data from one

database design to another. However, these auto-

generated databases are good enough to allow an

application to be demoed quickly; however, they are

not well designed enterprise level databases and lack

many of basic database objects such as indexes for

fast data access and to support table join operations,

stored procedures for processing data on the server

vs. moving the data cross the network to be

processed in the code, and views to hide the physical

table structures. When the amount of data becomes

large and number of concurrent users increases,

these problems start to become more obvious.

Another issue with the auto generated databases is

security related. In a database centric design, which

types of users have access to what database objects

need to be well designed and carefully maintained.

In an auto generate database, authentication and

authorization related issues are, in almost all the

cases, afterthought.

The approach suggested in (Martin, 2003) is to

delay the decision of finalizing database designs.

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

184

When discussing the implementation of a use case,

the author stated

"Again, our predisposition to database may

tempt us into thinking about record layouts

or the field structure in a relational

database table, but we should resist these

urges."

Our field experience tells us that once we start

coding, we likely need to store the data and pass the

data to other components under development by

other team members. Not storing the data, therefore

not defining the data format or database schema, is

not really an option.

A widely accepted approach is called "Database

refactoring" (Ambler, 2003 and Ambler, 2006). It

was first introduced in 2002 by S. W. Ambler based

on the concept of refactoring for code. "Database

refactoring" is defined as a simple change to a

database schema that improves its design while

retaining both its behavioural and informational

semantics (Ambler, 2003). Code refactoring is very

effective in enabling developers to evolve their code

slowly over time and to improve their design,

making it easier to understand and to modify.

However, code refactoring and database refactoring

are very difference in the effect of refactoring. The

effect of code refactoring is mostly local. That is in

the most situations, a refactoring activity retains the

same semantics of modified code segment, at least

from a black box point of view. We all know that an

Object Oriented programming language like Java

and C# can easily contain the implementation details

of a method to the just that method. Database

refactoring, on the other hand, is not simple because

database schemas may be tightly coupled with code

in applications it is supporting, other databases

interacting with it, and code segments to support

data migration and testing, just to name a few. The

effect of some minor database refactoring can be

huge and cause many unexpected situations. So,

database refactoring should never be performed on a

production database without thorough testing. In

addition, working with experienced DBAs may be

necessary to plan and perform database refactoring

successfully.

There are other tools such as Liquibase that

helps a project to manage database changes,

especially for projects using Agile software

development methodology. Still, it re-enforce the

idea that changes are the necessary evil and need to

be managed.

Generally an enterprise level customer facing

application, such as Amazon.Com, needs to support

three group of users: administrators and supporting

staff, customers and CRM staff, and business

analysts and management team. The operations

performed by these three groups of users are very

different and have very little overlap.

The administrators and supporting staff mainly

need to monitor the status of an application and its

environment such as CPU utilization, memory

usage, storage performance, and network traffic. A

subgroup of supporting staff may need to update the

contents of a database such as adding new products

or changing the quantities or prices of a number of

products. Customers and CRM staff, on each access,

need to be able to access and possible update a

relatively small number of records quickly.

Customers' actions may trigger changing of the state

of the database. For example, a customer may

purchase an item, which triggers the shipment

process to start the steps of sending the item.

Business analysts and management team perform

very different sets of tasks that request very different

support from the DBMS and data store. For

example, business analysts may need to generate

quarterly reports on sales figure on certain products.

Such a report may need to access a large number of

records from a number of tables, but business

analysts generally do not need to change the state of

a database.

Despite the fact where users of different group

need very different support of data store needs, our

past projects and many project across the world tend

to use a single database, therefore a single database

management system (DBMS) to fulfil all the

database needs. This one-size-fit-all approach

greatly restricts the software development progress

of the project by disallowing or making it every

expensive to change database schema because any

change over database schema may results in rework

in many parts of the application. Such changes are

among main reasons software projects are late and

over budget (Liu, 2009). Even with Agile software

development, changing on database schema may

have the domino effect of triggering re-work on

many features and test cases that are considered as

already completed.

Here is our dilemma, on one hand, to support

rapid changes in requirements of software

development, using or not using the Agile

methodology, we have to allow frequent changes on

the database schema; on the other hand, changes on

database schema, such as a simple one as

disallowing of nulls on a column may introduce

many new defects when using a RDBMS such as

MySQL or SQL Server 2012. We believe one

OnDatastoreSupportforAgileSoftwareDevelopment

185

possible solution is to have two or more databases

and with the one that supports transactional

operations being not necessary relational.

3 THE SOLUTION

The truth is that for many enterprise level

applications, the database supports are needed for

two types of users: customers and business analysis

and managers. Customers mostly utilize the

transactional features of a DBMS at single ore a very

selected few records level and hardly ever need to

view aggregated results, while analysis and

managers almost do the opposite -- they almost

always view the database contents in aggregated

views and hardly ever need to change the state of the

database. After recognizing this, we cannot help but

to ask why do we need to have one database rather

than two or more? One explanation of this one-size-

fit-all approach was historical. In the past, resources

(hardware, software, and staffing) for a database

were expensive. The situation is very different now.

The workstations developers use to write code are

sufficient to support most DBMS for development.

Figure 1: Applications have two supporting databases.

With open source, functioning DBMSs are

readily available and affordable. With abundant

tools, many with intuitive GUI, we may not need a

DBA to create and maintain a database for us. We

do not see any reason not to use two or more

databases to support distinct sets of functionalities as

shown in Figure 1. We call the database supporting

customer facing activities the Transactional

Database, while the other database the Reporting

Database.

Considering the case of Amazon.com, a

customer generally will view information in her

account, search for items, complete an order, and to

view comments from other customers regarding an

item. Comparing to the amount of date Amazon.com

stores, none of the activities require more than tiny

amount of data. On the other, a business analysis

many want to generate a report, say, to find out the

top 10 most popular items in the last 10 days. That

query may need to pull a lot of records and can be

resource intensive. In the case of having one

database, generating such a report may even slow

down customer facing activities. On the other hand,

a business analyst does not have much of needs to

inspect individual records.

In the case the reporting database is used to

perform Business Intelligence (BI) related activities,

such as finding items that are associated with each

other, the results of BI may need to be sent back to

the transactional database. Otherwise, most of the

time, the direction of data is to flow from the

transactional database to the reporting database.

Since the transactional database hardly ever

needs to support pulling of a large number of

records, we believe the DBMS supporting such a

database should be a light weight one that supports

basic database operations (Select, Update, and

Insert) well and does not even have to be a relational

database.

4 WHY THE SOLUTION WORKS

Using multiple databases is not a new idea. Most

data mining and data warehouse applications

routinely load data from transactional databases

through ETL, which is a mature technology and

enjoy many effective tools. What we are proposing

here is to start thinking of separating database needs

between transactional activities and reporting

activities at the very beginning of the system design

and architectural stage so the changes in one

database do not drastically affect the existing

features supported by the other database. Such a

separation can provide a very supportive platform

for Agile software development because the

developers can be much more freely in changing the

database schema, especially the customer facing

related database that generally require the support of

transactional activities. Due to this separation, the

Authentication and

Authorization +

Switchboard

Reporting and

Analysis

Customer

facing features

Relational

DB for

reporting

Light Weight

Data store

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

186

developers can change the schema of the

transactional database with very little worry about

affecting the reporting database. This freedom is not

possible at all without the introduction of the second

database.

Another benefit of developing systems with two

databases is that since the transactions require fast

date retrieval, data insertion, and record update for

small amount of data, it may not be necessary to

employ a fully featured database from a commercial

management system (DBMS). We have participated

in successful projects that use XML documents,

MongoDB or other similar products.

MongoDB is an excellent example of an open

source document-oriented NoSQL database

management system. It provides a very different

platform than the traditional RDBMS in terms of

storing and retrieving information. For developers

familiar with the RDBMS, MongoDB provides for

indexes, dynamic queries, replication, and auto

sharding as well as the retrieving, inserting, deleting,

and updating of records. MongoDB is easy to learn

and use because it supports, comparing with a fully

functioning RDBMS such as Oregon 11g or SQL

Server 2012, a very limit set of features--not a

weakness considering its purposes.

Beside differences in terminology, one of the

main differences between a database of MongoDB

(called an instance) and a database supported by a

RDBMS is that a MongoDB collection, similar to a

table in a relational database, can hold any type of

object, basically every record can be different in

terms of format, data types of fields, and even the

number of fields (we call columns in relational

database terms). In addition, a record can contain

arrays as a field's data type. This is MongoDB's

approach of handling the traditional one-to-many

relationship between two entities; as a result, it does

not support the relational algebra's join operation

and is certainly not relational.

It is out of the scope of this paper to discuss

whether MongoDB's approach of handling one-to-

many relationship is viable or not. Still, we'd like to

mention that we do not appreciate the suggested

handling of one-to-many relationship because

MongoDB's approach, in our opinion, introduce

complexity in searching on the "many" side. We also

believe that MongoDB should consider adding

support in joining two collections because join

operations are used in retrieving data from databases

all the time. After all, we cannot put the entire

relational database into just one collection.

The benefits come from MongoDB's small set of

features is obvious. Comparing for inserting 50,000

rows, MongoDB is 100 times faster than SQL Server

2008, a popular relational database management

system produced by Microsoft (Kennedy, 2008).

Even after considering that, during the test, SQL

Server 2008 was accessed through LINQ to SQL as

reported in (Kennedy, 2008), a 100 times difference

is significant.

Retrieving data from a MongoDB is also faster

than that of SQL Server 2008. The same article

reported a SQL Server 2008 took 28 seconds to read

out 50,000 records while MongoDB used only 10.4

seconds retrieving the same 50,000 records. For

complex queries, MongoDB can complete 100,000

not so simple queries in 398 seconds. Doing the

same takes SQL Server 2008 960 seconds. All tests

described in (Kennedy, 2008) were conducted on a

Lenovo T61 64-bit with a dual-core 2.8 GHz

processor. The OS is on Windows 7, and all DBMS

are 64-bit ones. We are in the process of conducting

our own performance test and expect to provide our

findings in our final version of the paper. After

reading the detailed experiments given in (Kennedy,

2008), we believe SQL Server 2008's performance

can be improved if proper indexes were added.

Note that, our experience shows that, for projects

following the Agile software development

methodology, schema changes on the transactional

database can be frequent during the development

phase, especially during the early stage of the

development phase. Once a large number of features

have been implemented, the scheme becomes stable.

Once it is determined that the reporting related

features is supported by a second database, the

actual design of the reporting database can be

pushed until the transactional database is relatively

stable. As the result of separated databases, the

reporting database not only is designed with a

mutual understanding of its source database, but also

is built on a more stable schema.

With proper design and architecting and

adopting of the Layers of Data Abstraction concept,

the reporting systems see the databases through

external views, which enjoy immunity of changes in

conceptual schema such as adding columns, tables,

indexes, and views. It is this conceptual model

generally needs to reflect changes in the schema

changes in transactional database.

With increase in popularity on deploying

customer facing application on the Cloud, separating

the transactional database with the reporting

database may become necessary for security reasons.

The transactional databases are generally deployed

with the application on the Cloud, which generally

means it is not on the Intranet. For most enterprise

OnDatastoreSupportforAgileSoftwareDevelopment

187

level security measures, authenticating users and

authorizing accessing of resources on the Cloud

using enterprise maintained user credential is

unlikely due to security concerns. If we have only

one database, we have to implement additional

security measures to make sure the reporting

database can only be accessed by business analysts

and managers. Another sticky issue is that certain

information, such as product costs and suppliers

details, is considered as business secret and often is

against company policy to be stored outside of the

intranet. In that situation, we will be forced to bring

data in the transactional database into the intranet to

have a separate database. If we design our systems

with two databases, meeting the security

requirements becomes relative easy. Once the

database is inside the firewall, we can use the

existing authentication and authorization mechanism

to control the access of the reports.

5 CONCLUSIONS

This one-size-fit-all approach of having just one

database to support users of very different needs

greatly restricts the software development progress

by disallowing or making it every expensive to

change database schema. With Agile software

development methodology, we have to let

developers change the database schema with certain

level of freedom. To solve this mismatch between

the database support and software development

methodology, we propose to consider having at least

two databases: one to support transactional

capabilities and the other to support reporting needs.

The database supporting the transactional operations

does not even have to be relational, such as

MongoDB.

We showed that due to the simplicity nature, the

non-relational database can be faster than their

relational counterpart. The separation of the

transactional database and reporting database

provide easy support of data mining and data

warehousing extensions on the reporting side,

provide much great freedom for the developers to

use the best database support to implement customer

facing features without worry about affecting the

reporting side functionalities. Last but not least,

because the reporting database is independent, it can

be hosted inside the firewall while the transactional

database is deployed on the Cloud.

ACKNOWLEDGEMENTS

We would to thank Western Oregon University for

their support in our research and writing of this

paper with their research grant JL2014.

REFERENCES

Turk, D., France R., and Rumpe, B., 2002. Agile Software

Processes: Principles, Assumptions and Limitations,

Technical Report, Colorado State University

Kennedy, M., 2008. http://blog.michaelckennedy.net

/2010/04/29/mongodb-vs-sql-server-2008-

performance-showdown/

Sommerville, I., 2006. Software Engineering”, 8 Ed,

Addison-Wesley

Liu, J., Liu, F., 2009. Factors contribute to high costs of

software projects, the 2009 International Conference

on Software Engineering Research and Practice, July

13-16, 2009, Las Vegas, USA

Martin, R., 2003. Agile Software Development, Pearson

Education, Inc., Upper Saddle River, NJ

Ambler, S., 2003. Agile Database Techniques, John Wiley

& Sons

Burns, L., 2011 Building the Agile Database, Technics

Publications

Ambler, S., Sadalage P., 2006. Refactoring Databases:

Evolutionary Database Design, Addison-Wesley

Professional

Kroenke, D. and Auer, D., 2013. Database Processing:

Fundamentals, Design, and Implementation, 13th ed,

Pearson Prentice Hall, Upper Saddle River, NJ

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

188