impact very significantly on all the results, and
subjects with enough expertise should be able to
easily approach the theoretical best value for
effectiveness, as well as for orthogonality and discrepancy.
Our consequent expectation is that defect
categorization should be objective when enacted
by software practitioners. However, this
expectation still needs empirical evidence. Further
results show that, when the time spent categorizing
defects is between 1 and 3 hours,
effectiveness, orthogonality, and discrepancy are not
affected by the duration of the classification
session. Moreover, results show that the
programming language of the coded artifacts and the
nature of the defects do not seem to affect
effectiveness, orthogonality, and discrepancy significantly.
Finally, our results show that some
categories tend to confuse subjects; this, in our
understanding, calls for improving the definitions of
those ODC DT categories, as currently given by IBM.
Namely, those categories are “Interface/OO
Message” and “Relationships”. Further confusing
pairs are “Assignment/Initialization” and
“Algorithm/Method” on the one hand, and
“Algorithm/Method” and “Checking” on the other,
which confirms previous results (Henningsson
and Wohlin, 2004).
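As an illustration of how confusing category pairs can be surfaced from raw classification data, the following minimal sketch counts how often two subjects assigned different categories to the same defect. The function name `confused_pairs`, the data layout (defect identifier mapped to the list of categories chosen by the subjects), and the sample values are assumptions made for this example only; they are not the procedure or the data of the experiment.

```python
from collections import Counter
from itertools import combinations


def confused_pairs(classifications):
    """Count disagreeing category pairs per defect.

    `classifications` maps a defect id to the list of categories that
    individual subjects chose for it (hypothetical layout).
    """
    pairs = Counter()
    for categories in classifications.values():
        # Compare every pair of subjects who classified this defect.
        for a, b in combinations(categories, 2):
            if a != b:
                # Sort so that (A, B) and (B, A) are counted together.
                pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Made-up toy data, not taken from the experiment.
sample = {
    "D1": ["Checking", "Algorithm/Method", "Checking"],
    "D2": ["Assignment/Initialization", "Algorithm/Method"],
    "D3": ["Relationships", "Relationships"],
}

counts = confused_pairs(sample)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with the highest counts are candidates for categories whose definitions overlap in the subjects' understanding.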
Our plan for the future is first to extend
our defect repository, place the material in electronic
format, and contact IBM experts with the aim of
receiving their categorizations of our defect samples
(to use as the reference “correct” categorizations),
and then to replicate the experiment
with professionals, both in a controlled environment
and through the Web. This should also provide the
precise timing of each categorization and help to
investigate efficiency in depth.
REFERENCES
Abdelnabi Z., G. Cantone, M. Ciolkowski, D. Rombach:
“Comparing Code Reading Techniques Applied to
Object-oriented Software Frameworks with regard to
Effectiveness and Defect Detection Rate”, Proceedings
of the 2004 International Symposium on Empirical
Software Engineering, pp. Redondo Beach (CA),
2004.
Basili V.R., G. Caldiera, H.D. Rombach: “Goal Question
Metric Paradigm”, in Encyclopaedia of Software
Engineering, J.J. Marciniak Ed., Vol. 1, pp. 528-532,
John Wiley & Sons, 1994.
Basili V.R. and R. Selby: “Comparing the Effectiveness
of Software Testing Strategies”, IEEE Transactions on
Software Engineering, CS Press, December, 1987, pp.
1278-1296.
Cantone G., Z. A. Abdulnabi, A. Lomartire, G. Calavaro:
“Effectiveness of Code Reading and Functional
Testing with Event-Driven Object-Oriented Software”,
Empirical Methods and Studies in Software
Engineering, R. Conradi and A. I. Wang Eds., LNCS
2765, pp. 166-193, Springer, 2003.
Cohen J.: “A Coefficient of Agreement for Nominal
Scales”, Educational and Psychological
Measurement, 20:37-46, 1960.
Durães J. and H. Madeira: “Definition of Software Fault
Emulation Operators: a Field Data Study”, Proceedings of
the 2003 International Conference on Dependable
Systems and Networks, 2003.
El Emam K. and I. Wieczorek: “The Repeatability of
Code Defect Classifications”, Proceedings of
International Symposium on Software Reliability
Engineering, pp. 322-333, 1998.
Henningsson K. and C. Wohlin: “Assuring Fault
Classification Agreement – An Empirical Evaluation”,
Proceedings of the 2004 International Symposium on
Empirical Software Engineering, 2004.
Juristo N. and S. Vegas: “Functional Testing, Structural
Testing, and Code Reading: What Fault Type Do They
Each Detect?”, Empirical Methods and Studies in
Software Engineering, R. Conradi and A. I. Wang
Eds., LNCS 2765, pp. 208-232, Springer, 2003.
Myers G.J.: “A Controlled Experiment in Program Testing
and Code Walkthroughs/Reviews”, Communications
of ACM, Vol. 21 (9), pp. 760-768, 1978.
Wohlin C., P. Runeson, M. Höst, M.C. Ohlsson, B.
Regnell, A. Wesslén: “Experimentation in Software
Engineering: An Introduction”, The Kluwer
International Series in Software Engineering, 2000.
IBM a, “Details of ODC v 5.11”,
www.research.ibm.com/softeng/ODC/DETODC.HTM,
last access: 02/05/2006.
IBM b, “ODC Frequently Asked Questions”,
www.research.ibm.com/softeng/ODC/FAQ.HTM,
last access: 02/05/2006.
EXPLORING FEASIBILITY OF SOFTWARE DEFECTS ORTHOGONAL CLASSIFICATION