AN EMPIRICAL EVALUATION OF EVOLUTIONARY DESIGN
APPROACH
Design, Results and Discussion of Experiments on Extreme Programming
René Noël, Marcello Visconti, Gonzalo Valdés and Hernán Astudillo
Departamento de Informática, Universidad Técnica Federico Santa María, Av. España 1258, Valparaíso, Chile
Keywords: Extreme Programming, Evolutionary Design, Experimental Design.
Abstract: Evolutionary Design is Extreme Programming's approach to organizing software structure and its
relationships; it encourages refactoring, test-driven development and the simplest solution for the
requirements of a single iteration, thus avoiding a big up-front design activity at the beginning of the project
that can force the team to carry a large structural complexity throughout the whole project. In order to contrast
this approach with a planned or traditional design approach, an empirical evaluation of the impact on software
design quality and process productivity has been designed and conducted in an academic environment with
toy-size problems. The planning details of the experimental studies are presented, and two replications with
different experimental designs are described. Results suggest that there are no differences in quality between
the two approaches, and that productivity is better when a planned design is adopted.
1 INTRODUCTION
Extreme Programming (XP) (Beck, 1999) proposes
an iterative process and an evolutionary design
(Fowler, 2004) approach that encourages designing
the simplest solution for the requirements considered
in each iteration, without worrying about the next
iteration's requirements or their design complexity.
This approach allows embracing change by avoiding
the big up-front design activity, and the design
complexity carried throughout the whole project,
that traditional software processes propose.
There are documented cases of development
teams that resist this approach (Harrison, 2003;
Keefe and Dick, 2004; Müller and Tichy, 2001;
Rasmusson, 2003), questioning XP design practices
and their naturalness; besides, a big up-front design
is an opportunity for identifying reusable structures
such as architectural or design patterns (Gamma et
al, 1995), and may help to improve productivity and
product internal quality. Theoretical arguments can
be made in favour of both design approaches, but
Experimental Software Engineering offers a chance
to gather empirical evidence on their impact on
product quality and process productivity. This paper
presents two experimental designs and their results
on evaluating the impact of different design
approaches on product quality and process
productivity. The design topics covered in this
article are limited to the detailed design activity
(class models, methods' algorithms) and do not
address architectural design issues.
Section 2 presents the two experimental designs
and their results. Section 3 discusses the main
findings considering both experimental studies, their
interpretation in the context of Evolutionary Design,
and future work.
2 EXPERIMENTAL DESIGN
In order to compare the XP approach with a
traditional or planned approach, an empirical
evaluation has been designed. The goal of the study
is to detect the differences in product quality and
process productivity when developing with an XP
design approach versus a planned design approach.
2.1 Activity Description
The software development activity that subjects
had to tackle was designed considering the practices
to be implemented and the time constraints for
executing the experiment trials. Design, coding and
testing of the software features could not take more than two
academic time slots, so we planned a 4-hour activity
and problems that could be solved within that time
box. A method based on the original definition of
XP (Beck, 1999; Wake, 2001; Jeffries et al, 2000)
was defined. In addition, a variation of the original
XP method was defined in order to incorporate a
planned design session at the start of each iteration.
Each trial of the experimental study consists of
training the subjects on a development method, and
then having them apply the method to a given
problem.
The application of the methods consists basically
of developing a simple software solution whose
requirements are separated into two sets. The whole
context of the problem is presented in a global
system description that explains the complete
functionality and all the domain elements and their
relationships. The first set specifies functionality
that adds value to the client, but is not the whole
system functionality. The second set complements
the first one, and covers all the desired features of
the system. By being given the whole system
description at the start, planned design subjects can
anticipate the system design complexity, probably
improving their final design quality, and also
improving their productivity by identifying
already-solved problems. If too much time is spent
on this up-front design, productivity could be
negatively affected. The XP approach can
presumably be more productive without an initial
up-front design, but separating the requirements into
iterations guarantees that the design must at least be
reviewed to implement the second set of
requirements, and will probably require refactoring
the first iteration's code.
2.2 Hypotheses and Variables
High-level null hypotheses are formulated in order
to evaluate the existence of differences in product
quality and process productivity between the XP
approach and the planned design approach:
H0': The use of a planned design approach produces
software with the same design quality as the
evolutionary design approach.
H0'': The use of an evolutionary software design
approach is as productive as a planned approach in
an XP project.
In order to test these hypotheses, quality and
productivity are measured using standard metrics.
For each of the metrics, a null hypothesis is
formulated, for instance:
H0'1: Productivity measured in LOC/minute is the
same using the evolutionary or the planned design
approach.
The independent variables that can affect quality
and productivity are identified:
- Design Approach: the factor under study, with two
levels: the evolutionary approach (XP's approach)
and the planned approach.
- Participant's Design Experience: an undesirable
influence factor; the experimental design focuses on
minimizing its impact.
- Problem to Solve: if different problems are used in
the study, their impact must be minimized, or it must
be evaluated whether their impact on the results is
statistically significant.
- XP Knowledge: to minimize the influence of this
undesirable factor, participants are trained in XP's
practices and process, and a guided exercise
applying them is performed.
Business logic complexity can be addressed
during detailed design by creating an appropriate
structure of classes, methods and interfaces that
provides a maintainable and flexible solution, or by
concentrating complexity in a few highly complex
methods, which compromises the maintainability
and understandability of the code. In order to
evaluate software quality, the complexity of the
methods' algorithms is measured. The dependent
variables are the following:
Process Productivity:
- LOC/minute (PTLOC)
- Number of Classes/hour (PNOC)
- Number of Methods/minute (PNOM)
Product Internal Quality (Design Quality):
- Decision Count (DC)
- Maximum McCabe Cyclomatic Complexity (MCC) of the most complex method coded by each subject
- Number of Code Statements (NSTMNT)
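The paper does not specify the tooling used to collect these metrics. The sketch below only illustrates how such measures could be approximated automatically, assuming Python sources and a simplified, decision-based approximation of McCabe complexity; the subjects' actual language and measurement procedure may have differed.

```python
import ast

# Node types counted as decisions for DC, and as branches for an approximate
# McCabe complexity (MCC ~= decisions + 1 per method). This is a simplification.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def method_complexity(func_node):
    """Approximate cyclomatic complexity of a single function or method."""
    decisions = sum(isinstance(n, DECISION_NODES) for n in ast.walk(func_node))
    return decisions + 1

def collect_metrics(source, minutes):
    """Compute the productivity and design-quality metrics for one subject."""
    tree = ast.parse(source)
    loc = sum(1 for line in source.splitlines() if line.strip())
    classes = [n for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    methods = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    statements = [n for n in ast.walk(tree) if isinstance(n, ast.stmt)]
    decisions = sum(isinstance(n, DECISION_NODES) for n in ast.walk(tree))
    return {
        "PTLOC": loc / minutes,                   # LOC per minute
        "PNOC": len(classes) / (minutes / 60.0),  # classes per hour
        "PNOM": len(methods) / minutes,           # methods per minute
        "NSTMNT": len(statements),                # number of code statements
        "DC": decisions,                          # decision count
        "MCC": max((method_complexity(m) for m in methods), default=1),
    }
```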
2.3 Original Experimental Design
The original experimental design was presented in
(Noël et al, 2005), consisting of a randomized block
design with one factor (the Design Approach) and
two levels (XP and planned approach). Participants
were undergraduate students who had previously
passed an Object-Oriented Analysis and Design
course. Two blocks were formed: subjects with
design experience and subjects without design
experience (marked with different grey levels in
Table 1), as previously assessed by a survey. The
subjects were 31 development teams, each formed
by a pair of developers with equivalent design
experience. A 1.5-hour training session was
performed.
Table 1: Original Experimental Design. Each of the 31 subjects (rows) is marked with an X under the single design approach assigned to it (XP Approach or Planned Approach); grey shading distinguishes the two experience blocks.
The most remarkable results of the significance
tests are shown in Figure 1.
[Figure 1: Significance Test Results for the Original Experimental Design. The chart plots the significance value obtained for each metric (PTNOC, PTLOC, PTNOM, DC, NSTMNT, MCC), separately for the non-experienced (NDE) and design-experienced (DE) blocks.]
The chart shows the two-tailed t-test probability of
rejecting the null hypothesis when it is true, for
subjects blocked by design experience: experienced
subjects (DE) and non-experienced subjects (NDE).
Significant differences in productivity were found
for subjects with no design experience (p < 0.07),
who obtained better results with a planned design
approach. Differences in software design quality are
less significant, but suggest that experienced
subjects produce higher-quality software with a
planned design approach. The difference between
the experience blocks led us to mitigate the influence
of this factor instead of blocking it, in order to
isolate the method effect from the experience effect.
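As an illustration only, the per-block comparison described above could be computed as in the following sketch; the file name, column names and data layout (one row per team) are assumptions, not the study's actual dataset or tooling.

```python
import pandas as pd
from scipy import stats

# Hypothetical layout: one row per development team, with its experience block
# ("DE" or "NDE"), the design approach used ("xp" or "planned"), and the metrics.
teams = pd.read_csv("teams.csv")
metrics = ["PTLOC", "PNOC", "PNOM", "DC", "NSTMNT", "MCC"]

for block in ["NDE", "DE"]:
    subset = teams[teams["block"] == block]
    xp = subset[subset["approach"] == "xp"]
    planned = subset[subset["approach"] == "planned"]
    for metric in metrics:
        # Two-tailed independent-samples t-test within each experience block.
        t_stat, p_value = stats.ttest_ind(xp[metric], planned[metric])
        print(f"{block} {metric}: t = {t_stat:.2f}, p = {p_value:.3f}")
```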
2.4 Second Experimental Design
To mitigate the influence of the subjects' particular
experience, each of them must use both design
approaches. This implies having two distinct
problems to solve, in order not to introduce
maturation bias by applying both approaches to the
same problem. Thus, a 2x2 factorial design with
repeated measures is proposed, where the factors are
the Design Approach and the Problem to be solved
(Table 2).
Table 2: Second Experimental Design.
         Group 1                Group 2
Day 1    Training               Training
         Method 1, Problem 1    Method 2, Problem 1
Day 2    Training               Training
         Method 2, Problem 2    Method 1, Problem 2
To avoid carryover effects, counterbalancing is
applied: if the whole of Group 1 used Method 1 first
and then Method 2, a bias could be introduced due to
the learning of tools, the programming language or
other factors during the first session. Thus, one half
of Group 1 uses Method 1 and Problem 1, and the
other half uses Method 2 and Problem 2. On the
second day, methods and problems are inverted. The
same strategy is applied to Group 2. Participants
were 22 senior-level students with Object-Oriented
Design knowledge and some professional practice.
A detailed comparison of the two experimental
designs is presented in (Noël et al, 2007).
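The assignment procedure is not detailed further in the paper; a minimal sketch of the counterbalancing idea for Group 1's method/problem pairings, with hypothetical team identifiers, could look as follows (Group 2 would pair the methods with the problems the other way around).

```python
import random

def counterbalance(group_teams, seed=0):
    """Split a group in half and give the halves opposite method/problem orders."""
    teams = list(group_teams)
    random.Random(seed).shuffle(teams)
    half = len(teams) // 2
    schedule = {}
    for team in teams[:half]:
        schedule[team] = [("Method 1", "Problem 1"), ("Method 2", "Problem 2")]
    for team in teams[half:]:
        schedule[team] = [("Method 2", "Problem 2"), ("Method 1", "Problem 1")]
    return schedule

# Hypothetical team labels for one of the two groups.
print(counterbalance(["team_A", "team_B", "team_C", "team_D", "team_E", "team_F"]))
```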
2.5 Experimental Results Summary
Given the 2x2 factorial design with repeated
measures, and the factors Development Method and
Problem, three sets of hypotheses can be formulated.
For the Main Effect Development Method:
H0: There is no difference between subjects using
XP's Design Approach and subjects using the
Planned Approach with respect to VAR[i].
H1: There is a difference between subjects using
XP's Design Approach and subjects using the
Planned Approach with respect to VAR[i].
Where i = PTLOC, PNOC, PNOM, DC, MCC,
NSTMNT. Similar hypotheses are formulated for the
Main Effect Problem and the Interaction Effect
Method × Problem. Testing for the method main
effect over the dependent variables allows testing
the hypotheses raised in Section 2.2.
Data was analysed by applying Analysis of
Variance for the 2x2 factorial design with repeated
measures. The significance tests applied for the
influence of the factors and their interaction led to
the significance levels for each metric presented in
Figure 2. Looking at the method's effect, we can see
that the null hypotheses cannot be rejected: no
differences in quality or productivity between the
evolutionary and planned design approaches can be
demonstrated through the influence of the Method
factor. The estimated marginal means for each
metric for Methods 1 and 2, presented in Table 3,
suggest that, although not statistically significant,
the Planned Design Approach (Method 2) is more
productive than the XP Design Approach (Method 1)
for every productivity metric. The product design
quality metrics favour the XP Design Approach for
Number of Statements and Maximum Cyclomatic
Complexity. The Decision Count metric suggests
that more control flow statements were coded when
using the XP Design Approach than when using the
Planned Design Approach.
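The statistical tooling used is not stated in the paper; as a sketch only, a two-way repeated-measures ANOVA of this kind could be run with statsmodels as below, assuming a long-format table (hypothetical file and column names) in which every subject has one measurement for every method × problem combination, as AnovaRM requires.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject x method x problem cell,
# with the metric values in columns. This is not the study's actual data file.
data = pd.read_csv("second_experiment.csv")

for metric in ["PTLOC", "PNOC", "PNOM", "NSTMNT", "DC", "MCC"]:
    # Within-subject factors: Method (evolutionary vs. planned) and Problem (1 vs. 2).
    result = AnovaRM(data, depvar=metric, subject="subject",
                     within=["method", "problem"]).fit()
    print(metric)
    print(result)
```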
2.6 Validity Discussion
The main threats to validity are related to three key
factors:
- Activity Design for Process Implementation: the
time available to perform the activity forced us to
adapt the XP practices and to work with toy
problems, which might be a threat to construct
validity.
- Metrics and Problem Size: the toy problems may
not be complex enough to produce significant
differences in quality or productivity for the chosen
metrics, which threatens construct validity.
- Academic Environment: students may not be
representative of professional developers, which
threatens the external validity of the study.
[Figure 2: ANOVA Significance Test Results for the 2x2 Factorial with Repeated Measures Design. The chart plots the significance value obtained for each metric (PTLOC, PNOC, PNOM, NSTMNTS, DC, MCC) for the METHOD, PROBLEM and METHOD × PROBLEM effects.]
Table 3: Estimated Marginal Means for Method's Effect.
Measure   Method   Mean     Std. Error   95% CI Lower   95% CI Upper
PTLOC     1        0.458    0.053        0.335          0.582
PTLOC     2        0.568    0.080        0.385          0.751
PNOC      1        1.106    0.162        0.732          1.480
PNOC      2        1.294    0.133        0.986          1.601
PNOM      1        0.079    0.020        0.031          0.126
PNOM      2        0.090    0.025        0.033          0.146
NSTMNT    1        22.167   2.26         16.957         27.376
NSTMNT    2        26.611   3.76         18.043         35.179
DC        1        3.611    0.740        1.905          5.317
DC        2        3.056    0.868        1.054          5.057
MCC       1        3.333    0.717        1.680          4.986
MCC       2        4.444    0.966        2.216          6.673
3 CONCLUSIONS
The productivity results are consistent between the
original and the second execution of the
experimental study: the results suggest that a
Planned Design approach always yields better
productivity. For the quality metrics, in the first
experimental study the planned design approach
yielded better quality, but in the second, the results
suggest that subjects using the XP evolutionary
approach obtained better-quality products. However,
both quality results are far from statistically
significant, which suggests that no product design
quality differences exist between the two design
approaches.
When facing the design activity, we can choose
between a planned approach and an evolutionary
approach. Our study suggests that no significant
quality differences between the two approaches can
be demonstrated, and that process productivity is
better with a planned approach; we can therefore
weigh the trade-off between increasing process
productivity by planning the design and
strengthening the process's capability to embrace
change by adopting XP's original design approach,
without affecting product design quality.
REFERENCES
Beck, K., 1999. Extreme Programming Explained: Embrace Change, Addison-Wesley.
Fowler, M., 2004. Is Design Dead? http://www.martinfowler.com/articles/designDead.html.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J., 1995. Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley.
Harrison, N., 2003. A Study of Extreme Programming in a Large Company. Avaya Labs. http://www.agilealliance.org/system/article/file/1292/file.pdf
Henderson-Sellers, B., 1996. Object-Oriented Metrics: Measures of Complexity, Prentice Hall.
Jeffries, R., Anderson, A., & Hendrickson, C., 2000. Extreme Programming Installed, Addison-Wesley.
Keefe, K., & Dick, M., 2004. Using Extreme Programming in a Capstone Project. In Proc. 6th Australasian Computing Education Conference, Dunedin, New Zealand, pp. 151-160.
Müller, M., & Tichy, W., 2001. Case Study: Extreme Programming in a University Environment. In Proc. 23rd Int'l Conference on Software Engineering, Toronto, Canada, pp. 537-544.
Noël, R., Visconti, M., Valdés, G., & Astudillo, H., 2007. Lab. Package for the Investigation about the Impact of Software Design Approaches on XP. http://www.labada.inf.utfsm.cl/~amigosisw/xpdesign_package.
Noël, R., Astudillo, H., Visconti, M., & Pereira, J., 2005. Evaluating Design Approaches in Extreme Programming. In Experimental Software Engineering Latin American Workshop, Uberlandia, Brazil.
Rasmusson, J., 2003. Introducing XP into Greenfield Projects: Lessons Learned. IEEE Software, vol. 33, no. 7, pp. 21-28.
Wake, W., 2001. Extreme Programming Explored, Addison-Wesley.