Authors:
Gözde Koçak; Burak Turhan and Ayşe Bener
Affiliation:
Boğaziçi University, Turkey
Keyword(s):
Software Testing, Defect Prediction, Call Graph, Empirical Analysis.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Case-Based Reasoning; Enterprise Information Systems; Enterprise Software Technologies; Pattern Recognition; Software Economics; Software Engineering; Symbolic Systems; Theory and Methods
Abstract:
In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies that are unable to collect local data can safely use cross-company data with nearest neighbor sampling to predict their defects. In this study, we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules, we learned from NASA MDP data. We used static call graph based ranking (CGBR) as well as nearest neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only i) 6% of the code using a Naïve Bayes model, or ii) 3% of the code using the CGBR framework.
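
As an illustration of the nearest neighbor sampling idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each module is represented by a tuple of numeric static code metrics, that similarity is plain Euclidean distance, and that `k` neighbors are kept per local module; the function name `nn_sample`, the value of `k`, and the toy metric values are all hypothetical. The selected cross-company rows would then serve as training data for a classifier such as Naïve Bayes.

```python
import math

def nn_sample(cross_rows, local_rows, k=10):
    """Select cross-company rows that are among the k nearest
    neighbors (Euclidean distance) of at least one local row.

    Assumption: rows are equal-length tuples of numeric metrics.
    """
    selected = set()
    for target in local_rows:
        # Rank all cross-company rows by distance to this local row.
        ranked = sorted(
            range(len(cross_rows)),
            key=lambda i: math.dist(cross_rows[i], target),
        )
        # Keep the indices of the k closest rows (duplicates collapse).
        selected.update(ranked[:k])
    # Return the selected rows in their original order.
    return [cross_rows[i] for i in sorted(selected)]

# Toy data: two metrics per module (values are made up).
cross = [(1.0, 1.0), (1.2, 0.9), (9.0, 9.0), (5.0, 5.0)]
local = [(1.1, 1.0), (1.0, 1.2)]
sample = nn_sample(cross, local, k=2)
```

Taking the union of neighbors over all local modules keeps only the cross-company examples that resemble the local project, which is the intuition behind filtering before training.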