to learn defect predictors, due to the lack of local
module-level defect data.
The first analysis confirms that the average
defect rate across all projects was 15%. While the simple
rule-based model requires inspection of 45% of the
code, the learning-based model suggests that we
need to inspect only 6% of the code. The difference
arises because the rule-based model is biased towards
larger and more complex modules, whereas the
learning-based model predicts that smaller modules
contain most of the defects.
Our second analysis employed data adjusted with
the CGBR framework, which improved the
estimations further and suggested that 70% of the
defects could be detected by inspecting only 3% of
the code.
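The two quantities compared above, the share of code that must be inspected and the share of defects detected, can be illustrated with a minimal sketch. The module names, sizes, and labels below are hypothetical toy values, not data from this study, and the detection rate is computed over defective modules rather than individual defects, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    loc: int          # lines of code in the module
    defective: bool   # actual label (known only after testing)
    flagged: bool     # model's prediction: should this module be inspected?


def inspection_effort(modules):
    """Fraction of total LOC contained in the flagged modules."""
    total = sum(m.loc for m in modules)
    flagged = sum(m.loc for m in modules if m.flagged)
    return flagged / total


def detection_rate(modules):
    """Fraction of defective modules that the model flagged."""
    defective = [m for m in modules if m.defective]
    caught = [m for m in defective if m.flagged]
    return len(caught) / len(defective)


# Hypothetical toy data: small modules carry the defects.
modules = [
    Module("a", 1000, False, False),
    Module("b", 50, True, True),
    Module("c", 30, True, True),
    Module("d", 900, False, False),
    Module("e", 20, True, False),
]

effort = inspection_effort(modules)
rate = detection_rate(modules)
print(f"inspect {effort:.1%} of code, detect {rate:.1%} of defective modules")
```

In this toy setting, a predictor that flags small modules reaches most of the defective modules while covering only a small fraction of the total lines of code, which is the kind of trade-off the figures above describe.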
Our future work consists of collecting local
module-level defect data so that we can build
within-company predictors for this large
telecommunication system. We also plan to use
file-level code churn metrics to predict production
defects between successive versions of the software.
ACKNOWLEDGEMENTS
This research is supported by Boğaziçi University
research fund under grant number BAP 06HA104,
the Turkish Scientific Research Council
(TUBITAK) under grant number EEEAG 108E014
and Turkcell A.Ş.
ICSOFT 2008 - International Conference on Software and Data Technologies