A Novel Regression Method for Software Defect Prediction with Kernel Methods

Ahmet Okutan, Olcay Taner Yıldız

Abstract

In this paper, we propose a novel method based on support vector machines (SVM) to predict the number of defects in the files or classes of a software system. To model the relationship between source code similarity and defectiveness, we use SVM with a precomputed kernel matrix. Each entry in the kernel matrix measures how similar two files or classes of the software system under test are. Experiments on 10 PROMISE datasets indicate that SVM with a precomputed kernel performs as well as SVM with the usual linear or RBF kernels in terms of root mean square error (RMSE). The proposed method is also comparable with other regression methods such as linear regression and IBk. The results of this study suggest that source code similarity is a good means of predicting the number of defects in software modules. Based on the results of our analysis, developers can focus their testing effort on the modules predicted to be more defective rather than on less defective or non-defective ones.
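The idea of a precomputed similarity kernel can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: files are reduced to toy token lists, Jaccard similarity stands in for the source code similarity measure, and a similarity-weighted average of training defect counts stands in for the SVM, so that the example runs with the Python standard library alone. The function names (`jaccard`, `kernel_matrix`, `predict`, `rmse`) and the data are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two token collections, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def kernel_matrix(rows, cols):
    """Build K where K[i][j] is the similarity between rows[i] and cols[j]."""
    return [[jaccard(r, c) for c in cols] for r in rows]


def predict(k_test_train, train_defects):
    """Similarity-weighted average of training defect counts.

    This is a simple stand-in for the SVM regressor, not an SVM.
    """
    preds = []
    for sims in k_test_train:
        total = sum(sims)
        if total == 0:
            # No similar training file: fall back to the training mean.
            preds.append(sum(train_defects) / len(train_defects))
        else:
            weighted = sum(s * d for s, d in zip(sims, train_defects))
            preds.append(weighted / total)
    return preds


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data, invented for illustration only.
train_files = [
    ["open", "read", "parse", "close"],
    ["open", "write", "close"],
    ["sort", "compare", "swap"],
]
train_defects = [3, 1, 0]
test_files = [["open", "read", "close"], ["sort", "swap"]]
test_defects = [2, 0]

K_train = kernel_matrix(train_files, train_files)  # train x train
K_test = kernel_matrix(test_files, train_files)    # test x train
preds = predict(K_test, train_defects)
error = rmse(test_defects, preds)
print(preds, error)
```

With an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`, the train-by-train matrix would be passed to `fit` and the test-by-train matrix to `predict` in place of the stand-in predictor used here.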



Paper Citation


in Harvard Style

Okutan A. and Taner Yıldız O. (2013). A Novel Regression Method for Software Defect Prediction with Kernel Methods. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 216-221. DOI: 10.5220/0004290002160221


in Bibtex Style

@conference{icpram13,
author={Ahmet Okutan and Olcay Taner Yıldız},
title={A Novel Regression Method for Software Defect Prediction with Kernel Methods},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={216-221},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004290002160221},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - A Novel Regression Method for Software Defect Prediction with Kernel Methods
SN - 978-989-8565-41-9
AU - Okutan A.
AU - Taner Yıldız O.
PY - 2013
SP - 216
EP - 221
DO - 10.5220/0004290002160221