SOFTWARE DEFECT PREDICTION: HEURISTICS FOR WEIGHTED NAÏVE BAYES

Burak Turhan, Ayşe Bener

2007

Abstract

Defect prediction is an important topic in software quality research. Statistical models for defect prediction can be built from project repositories, which store software metrics and defect information that are then matched with software modules. Naïve Bayes is a well-known, simple statistical technique that assumes the ‘independence’ and ‘equal importance’ of features, assumptions that do not hold in many problems. Nevertheless, Naïve Bayes performs well on a wide spectrum of prediction problems. This paper addresses the ‘equal importance’ assumption of Naïve Bayes. We propose using heuristics to assign weights to features according to their importance, with the aim of improving defect prediction performance. We compare the performance of weighted Naïve Bayes and standard Naïve Bayes predictors on publicly available datasets. Our experimental results indicate that assigning weights to software metrics significantly increases prediction performance.
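To make the idea of feature weighting concrete, the following is a minimal sketch of one common formulation of weighted Naïve Bayes, in which each feature's log-likelihood is multiplied by a weight before the terms are summed. It is an illustration only, not the paper's method: the abstract does not state which heuristics produce the weights. The Gaussian likelihood, the function names (fit_gaussian_nb, log_gaussian, predict), the toy data, and the hand-picked weights are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, x, weights=None):
    """Return the class with the highest weighted log-posterior.

    With all weights equal to 1 this is standard Gaussian Naive Bayes.
    """
    if weights is None:
        weights = [1.0] * len(x)
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xi, m, v, w in zip(x, means, variances, weights):
            score += w * log_gaussian(xi, m, v)  # weight scales this feature's evidence
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two metrics per module, label 1 = defective.
X = [[1.0, 5.0], [2.0, 1.0], [1.5, 3.0], [8.0, 2.0], [9.0, 6.0], [8.5, 4.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)

standard = predict(model, [1.2, 4.0])                    # equal importance
first_only = predict(model, [1.2, 4.0], [1.0, 0.0])      # ignore second metric
second_only = predict(model, [1.2, 4.0], [0.0, 1.0])     # ignore first metric
```

In this toy data the first metric separates the classes cleanly while the second overlaps heavily, so relying on the second metric alone changes the prediction for the sample module from class 0 to class 1. A weight of 0 removes a feature entirely, which shows how feature selection can be seen as a special case of feature weighting.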



Paper Citation


in Harvard Style

Turhan B. and Bener A. (2007). SOFTWARE DEFECT PREDICTION: HEURISTICS FOR WEIGHTED NAÏVE BAYES. In Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8111-06-7, pages 244-249. DOI: 10.5220/0001339402440249


in Bibtex Style

@conference{icsoft07,
author={Burak Turhan and Ayşe Bener},
title={SOFTWARE DEFECT PREDICTION: HEURISTICS FOR WEIGHTED NAÏVE BAYES},
booktitle={Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2007},
pages={244-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001339402440249},
isbn={978-989-8111-06-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - SOFTWARE DEFECT PREDICTION: HEURISTICS FOR WEIGHTED NAÏVE BAYES
SN - 978-989-8111-06-7
AU - Turhan B.
AU - Bener A.
PY - 2007
SP - 244
EP - 249
DO - 10.5220/0001339402440249