SOFTWARE EFFORT ESTIMATION AS A CLASSIFICATION PROBLEM
Ayşe Bakır, Burak Turhan, Ayşe Bener
2008
Abstract
Software cost estimation is still an open challenge. Many researchers have proposed methods that usually focus on point estimates, and up to now software cost estimation has been treated as a regression problem. However, to prevent over- and underestimates, it is more practical to predict an interval for the estimate rather than an exact value. In this paper, we propose an approach that converts cost estimation into a classification problem and classifies each new software project into one of several effort classes, each corresponding to an effort interval. Our approach integrates cluster analysis with classification methods: cluster analysis is used to determine the effort intervals, while different classification algorithms are used to find the corresponding effort classes. The proposed approach is applied to seven public data sets. Our experimental results show that the hit rates obtained for effort estimation are around 90% to 100%. For point estimation, the results are also comparable to those in the literature.
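The abstract does not name the clustering or classification algorithms, so the following is only a minimal Python sketch of the general pipeline under stated assumptions. It assumes 1-D k-means on the effort values to form the clusters, places interval boundaries at the midpoints between adjacent centroids, uses a 1-nearest-neighbour classifier as a stand-in for "different classification algorithms", takes the mean effort of the predicted class as the point estimate, and computes hit rate as the fraction of projects whose predicted class matches the interval of their actual effort. All function names and the toy data are hypothetical, not taken from the paper.

```python
from bisect import bisect_right
from math import dist
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Assumed clustering step: 1-D k-means (Lloyd's algorithm); returns sorted centroids."""
    data = sorted(values)
    # Deterministic start: spread k initial centroids across the sorted data.
    step = (len(data) - 1) / max(k - 1, 1)
    centroids = [data[round(i * step)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Move each centroid to its group mean; keep it in place if the group is empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return sorted(centroids)


def interval_boundaries(centroids):
    """Assumed effort-interval cut points: midpoints between adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def effort_class(effort, boundaries):
    """Index of the effort interval containing `effort` (class 0 = lowest effort)."""
    return bisect_right(boundaries, effort)


def predict_class(train_X, train_y, x):
    """Hypothetical stand-in classifier: 1-nearest neighbour, Euclidean distance."""
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[nearest]


def point_estimate(train_effort, train_y, cls):
    """Assumed point estimate: mean effort of training projects in class `cls`."""
    return mean(e for e, c in zip(train_effort, train_y) if c == cls)


def hit_rate(true_classes, predicted_classes):
    """Fraction of projects whose predicted effort class equals the true one."""
    hits = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return hits / len(true_classes)


# Hypothetical toy data: features = (size in KLOC, team size), effort in person-months.
train_X = [(2, 2), (3, 2), (4, 3), (20, 6), (25, 7), (60, 12), (70, 14)]
train_effort = [5, 7, 9, 60, 70, 200, 240]
test_X = [(3, 3), (22, 6), (65, 13), (10, 4)]
test_effort = [8, 65, 230, 40]

centroids = kmeans_1d(train_effort, k=3)
boundaries = interval_boundaries(centroids)
train_y = [effort_class(e, boundaries) for e in train_effort]
true_y = [effort_class(e, boundaries) for e in test_effort]
pred_y = [predict_class(train_X, train_y, x) for x in test_X]

print("intervals cut at", boundaries)
print("hit rate", hit_rate(true_y, pred_y))
print("point estimates", [point_estimate(train_effort, train_y, c) for c in pred_y])
```

On this toy data three of the four test projects land in the correct interval, giving a hit rate of 0.75; the fourth project has features close to the small projects but an effort in the middle interval. Deriving interval boundaries from cluster centroids keeps the classes contiguous in effort, so any classifier (a decision tree, for instance) could replace the nearest-neighbour stand-in without changing the rest of the pipeline.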
Paper Citation
in Harvard Style
Bakır A., Turhan B. and Bener A. (2008). SOFTWARE EFFORT ESTIMATION AS A CLASSIFICATION PROBLEM. In Proceedings of the Third International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8111-52-4, pages 274-277. DOI: 10.5220/0001877802740277
in BibTeX Style
@conference{icsoft08,
author={Ayşe Bakır and Burak Turhan and Ayşe Bener},
title={SOFTWARE EFFORT ESTIMATION AS A CLASSIFICATION PROBLEM},
booktitle={Proceedings of the Third International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2008},
pages={274-277},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001877802740277},
isbn={978-989-8111-52-4},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the Third International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI  - SOFTWARE EFFORT ESTIMATION AS A CLASSIFICATION PROBLEM
SN  - 978-989-8111-52-4
AU  - Bakır A.
AU  - Turhan B.
AU  - Bener A.
PY  - 2008
SP  - 274
EP  - 277
DO  - 10.5220/0001877802740277
ER  - 