testing. Overall, the error rates obtained are considered quite promising in relation to other GP approaches applied to the same datasets in related work (e.g., Burgess and Lefley (2001) report MMRE=0.379 and CC=0.824 for their best population solutions on the Desharnais dataset). The best-of-run equations obtained from the GP experiments appear to perform consistently well, with insignificant differences between them, indicating promising generalization ability over the datasets employed.
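The two error measures quoted above can be computed as in the following minimal Python sketch. It assumes the usual definitions: MMRE as the mean of |actual - predicted| / actual over all projects, and CC as Pearson's correlation coefficient between actual and estimated effort. The function names and the effort values are hypothetical and serve only as an illustration; they are not taken from the Desharnais dataset.

```python
import math
from typing import Sequence


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def correlation_coefficient(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient (CC) between actual and predicted effort."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical effort values, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(f"MMRE={mmre(actual, predicted):.3f}")
print(f"CC={correlation_coefficient(actual, predicted):.3f}")
```

Lower MMRE and CC values closer to 1 indicate better estimates, which is why the two measures are usually reported together.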
6 CONCLUSIONS
The current work used Genetic Programming to derive classical regression equations from two publicly available project cost datasets in order to provide accurate software development effort estimations.
The main contribution of this work is the automatic creation and exploration of a large set of different equations, represented as parse trees and evaluated through newly devised fitness functions. The experiments showed that GP performed consistently well and reached constructive solutions that yield relatively successful effort approximations. This finding agrees with observations from other studies that compared GP with other techniques. More specifically, Lefley and Shepperd (2003) compared GP with other techniques for cost estimation. They assessed the accuracy of the estimates using data from both within and outside organizations (SSTF dataset) and reported that GP performs very well but requires a lot of expertise. They also emphasize the need to produce simpler equations that are still accurate enough.
Similarly, Burgess and Lefley (2001) compared techniques for predicting effort on the Desharnais dataset and argued that, apart from accuracy, other model characteristics, such as ‘transparency’ and ‘ease of configuration’, should have an equal, if not greater, impact on the choice of technique. GP appears able to produce relatively transparent solutions, in the sense that they are expressed as equations. However, the authors again note that some expertise is required to choose configuration values for the parameters.
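The transparency argument can be illustrated with a minimal Python sketch in which a candidate equation is stored as a parse tree that can both be evaluated and printed as a readable formula. The `Node` class, the operator set {+, -, *, /}, the protected-division convention and the variable name `fp` are assumptions for illustration only; the tree shown is hypothetical and is not one of the equations obtained in this work.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Node:
    """A GP parse-tree node: a binary operator with two children, or a terminal."""
    value: Union[str, float]  # operator symbol, variable name, or numeric constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_terminal(self) -> bool:
        return self.left is None and self.right is None

    def evaluate(self, env: Dict[str, float]) -> float:
        if self.is_terminal():
            # Variables are looked up in env; constants are returned directly.
            return env[self.value] if isinstance(self.value, str) else float(self.value)
        a = self.left.evaluate(env)
        b = self.right.evaluate(env)
        if self.value == "+":
            return a + b
        if self.value == "-":
            return a - b
        if self.value == "*":
            return a * b
        if self.value == "/":
            # Protected division (a common GP convention, assumed here): x / 0 -> 1.0.
            return a / b if b != 0 else 1.0
        raise ValueError(f"unknown operator {self.value!r}")

    def to_infix(self) -> str:
        """Render the tree as a fully parenthesized, human-readable equation."""
        if self.is_terminal():
            return str(self.value)
        return f"({self.left.to_infix()} {self.value} {self.right.to_infix()})"


# Hypothetical tree: effort = 2.5 * fp + 100 (not a result reported in this work).
tree = Node("+", Node("*", Node(2.5), Node("fp")), Node(100.0))
print("effort =", tree.to_infix())  # effort = ((2.5 * fp) + 100.0)
print(tree.evaluate({"fp": 300.0}))  # 850.0
```

Because the same structure that GP manipulates can be printed directly as an equation, a project manager can inspect the model without reading any code.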
The present work took these suggestions into account and attempted to obtain simpler, yet suitable, equations for predicting effort. A possible limitation of this work is the specific selection of operands, which were considered expressive enough to cover the potential solution space while being neither so general as to radically increase execution time nor so narrow as to constrain the search space. Furthermore, we imposed several restrictions to control bloat during GP execution, in order to save both storage space and execution time. The experiments were designed to give a realistic dimension to the solutions, which take the form of equations that project managers can easily interpret and use.
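One common way of restricting bloat is to reject offspring whose trees exceed a depth or node-count limit. The sketch below assumes trees encoded as nested `(operator, left, right)` tuples; `MAX_DEPTH` and `MAX_NODES` are hypothetical values, and keeping the parent when the child is too large is only one of several strategies (parsimony pressure in the fitness function is another). It is not necessarily the restriction applied in this work.

```python
from typing import Tuple, Union

# A tree is either a terminal (variable name or constant) or (operator, left, right).
Tree = Union[str, float, Tuple[str, "Tree", "Tree"]]

MAX_DEPTH = 6   # hypothetical limit
MAX_NODES = 31  # hypothetical limit


def depth(t: Tree) -> int:
    """Number of levels in the tree; a single terminal has depth 1."""
    if isinstance(t, tuple):
        _, left, right = t
        return 1 + max(depth(left), depth(right))
    return 1


def size(t: Tree) -> int:
    """Total number of nodes (operators plus terminals)."""
    if isinstance(t, tuple):
        _, left, right = t
        return 1 + size(left) + size(right)
    return 1


def accept_offspring(parent: Tree, child: Tree) -> Tree:
    """Bloat control: discard an oversized offspring and keep its parent instead."""
    if depth(child) > MAX_DEPTH or size(child) > MAX_NODES:
        return parent
    return child


small: Tree = ("+", ("*", 2.5, "fp"), 100.0)
deep: Tree = "fp"
for _ in range(7):
    deep = ("+", deep, 1.0)  # each wrap adds one level of depth

print(depth(small), size(small))  # 3 5
print(accept_offspring(small, deep) is small)  # True: depth 8 exceeds MAX_DEPTH
```

Capping tree size in this way bounds both the memory used per individual and the cost of evaluating it, and it also tends to keep the resulting equations short enough to read.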
Future research will focus on using operators of both categorical and numerical type, as well as modified fitness functions that may improve the results of GP. Such fitness functions could, for example, combine several performance metrics and parameter settings, helping to achieve even better effort predictions.
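A fitness function that combines several performance metrics could be sketched as a weighted sum to be minimized. In this hypothetical Python example, MMRE and CC are combined with PRED(25), the proportion of projects whose magnitude of relative error is at most 0.25. The weights are arbitrary placeholders, not values proposed in this work.

```python
import math
from typing import Sequence, Tuple


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for a single project."""
    return abs(actual - predicted) / actual


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.25) -> float:
    """PRED(level): fraction of projects whose MRE is at most `level`."""
    hits = sum(1 for a, p in zip(actual, predicted) if mre(a, p) <= level)
    return hits / len(actual)


def correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_p)


def combined_fitness(
    actual: Sequence[float],
    predicted: Sequence[float],
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
) -> float:
    """Lower is better; all three terms vanish for perfect, non-constant predictions."""
    w_mmre, w_pred, w_cc = weights
    mmre = sum(mre(a, p) for a, p in zip(actual, predicted)) / len(actual)
    return (
        w_mmre * mmre
        + w_pred * (1.0 - pred(actual, predicted))
        + w_cc * (1.0 - correlation(actual, predicted))
    )


actual = [100.0, 200.0, 400.0]  # hypothetical effort values
good = [105.0, 190.0, 410.0]
poor = [150.0, 100.0, 800.0]
print(combined_fitness(actual, good) < combined_fitness(actual, poor))  # True
```

Expressing each metric as a term that approaches 0 for a perfect model lets the weights trade off average error, the share of acceptably accurate estimates, and correlation within a single score.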
REFERENCES
Albrecht, A. J., 1979. Measuring Application
Development Productivity. Proceedings of the Joint
SHARE/GUIDE/IBM Application Development
Symposium, pp. 83-92.
Boehm, B. W., 1981. Software Engineering Economics,
Prentice Hall.
Boehm, B. W., Abts, C., Brown, A., Chulani, S., Clark, B.,
Horowitz, E., Madachy, R., Reifer, D., Steece, B.,
2000. Software Cost Estimation with COCOMO II,
Pearson Publishing.
Burgess, C. J., Lefley, M., 2001. Can Genetic
Programming Improve Software Effort Estimation? A
Comparative Evaluation. Information and Software
Technology, 43 (14), pp. 863-873.
Desharnais, J. M., 1989. Analyse Statistique de la
Productivité des Projets de Développement en
Informatique à Partir de la Technique des Points de
Fonction. MSc Thesis, Université du Québec,
Montréal.
Heiat, A., 2002. Comparison of Artificial Neural Network
and regression models for estimating software
development effort. Information and Software
Technology, 44, pp. 911-922.
Holland, J. H., 1992. Genetic Algorithms, Scientific
American, Vol. 267, No. 1, pp. 66–72, New York.
Huang, S.-J., Chiu, N.-H., 2008. Optimization of analogy
weights by genetic algorithm for software effort
estimation. Information and Software Technology, 48,
pp. 1034-1045.
Jørgensen, M., Shepperd, M., 2007. A Systematic Review
of Software Development Cost Estimation Studies.
IEEE Transactions on Software Engineering, 33, No.
1, IEEE Computer Press, pp. 33-53.
Koza, J. R., 1992. Genetic Programming: On the
Programming of Computers by Means of Natural
Selection, MIT Press, Massachusetts.
Lefley, M., Shepperd, M. J., 2003. Using Genetic
Programming to Improve Software Effort Estimation
Based on General Data Sets. Proceedings of GECCO,
pp. 2477-2487.
ICEIS 2010 - 12th International Conference on Enterprise Information Systems