leases were located in different packages. As a result,
time was spent locating the classes of each module
across the whole project. We found that the packaging
of classes helps any member of staff to work more
effectively. On the other hand, the project level product
size was easy to obtain, because at this level the code
size can be found by counting the lines of the whole
code base.
Finally, the most significant step was deciding on the
most suitable prediction model. To determine which
models best fit our situation, we used defect density
distributions. We observed that the module level
defect density distribution curves converge to
Rayleigh curves, whereas the project level defect
density distribution curve converges to a straight line,
which indicates a Simple Linear Regression model.
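The comparison described above can be sketched as fitting both candidate curves to the same defect density series and comparing the residual error. The sketch below is illustrative only: it assumes the common Rayleigh parameterization f(t) = K (t / c^2) exp(-t^2 / (2 c^2)), uses a simple grid search over c rather than whatever fitting procedure was used in the study, and the function names and sample values are hypothetical.

```python
import math


def rayleigh_shape(t, c):
    """Unit-area Rayleigh density at time t with scale parameter c."""
    return (t / c**2) * math.exp(-t**2 / (2 * c**2))


def fit_rayleigh(ts, ys, c_grid):
    """Fit y ~ K * rayleigh_shape(t, c) by least squares.

    For each candidate c in c_grid, the best K has a closed form;
    the (K, c) pair with the smallest sum of squared errors wins.
    Returns (K, c, sse).
    """
    best = None
    for c in c_grid:
        g = [rayleigh_shape(t, c) for t in ts]
        denom = sum(v * v for v in g)
        if denom == 0:
            continue
        k = sum(y * v for y, v in zip(ys, g)) / denom
        sse = sum((y - k * v) ** 2 for y, v in zip(ys, g))
        if best is None or sse < best[2]:
            best = (k, c, sse)
    return best


def fit_line(ts, ys):
    """Ordinary least squares for y ~ intercept + slope * t.

    Returns (intercept, slope, sse).
    """
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    return intercept, slope, sse


if __name__ == "__main__":
    # Made-up defect densities per release, for illustration only.
    releases = [1, 2, 3, 4, 5, 6]
    densities = [0.8, 1.9, 2.1, 1.6, 0.9, 0.4]
    c_candidates = [0.5 + 0.1 * i for i in range(100)]
    print("Rayleigh (K, c, SSE):", fit_rayleigh(releases, densities, c_candidates))
    print("Linear (intercept, slope, SSE):", fit_line(releases, densities))
```

Whichever model yields the smaller residual error on a given series is the better description of that distribution's shape; this is a visual-fit check in numeric form, not a replacement for the accuracy measures used in the evaluation.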
6 CONCLUSIONS
In this study, we examined a defect prediction process
of an iterative, civil project in which defect arrival
phase data is not recorded. We analyzed the defect
density data at module level and project level to
predict the latent defect densities. Analyzing the
module level distributions allowed us to examine
defects in smaller code segments, while analyzing the
project level distributions allowed us to examine the
defects in general. We used a software reliability growth
model (SRGM), the Rayleigh Model, and the Simple
Linear Regression Model, which require only basic
statistical and mathematical information. To predict the
defect density of a module, we averaged the function
coefficients of the Rayleigh Model and the Linear Model
over three separate modules and used these means to
create the prediction model functions.
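As a minimal sketch of this coefficient-averaging step, the code below takes the coefficients fitted for each module, averages them position by position, and builds a prediction function from the means. The function names, the coefficient ordering (intercept, slope) and (K, c), and the Rayleigh parameterization are assumptions made for illustration; the coefficient values are hypothetical, not values from the study.

```python
import math
from statistics import mean


def mean_coefficients(per_module):
    """Average coefficient tuples position by position across modules."""
    return tuple(mean(column) for column in zip(*per_module))


def make_linear_predictor(intercept, slope):
    """Return f(t) = intercept + slope * t."""
    return lambda t: intercept + slope * t


def make_rayleigh_predictor(k, c):
    """Return f(t) = K * (t / c^2) * exp(-t^2 / (2 c^2)) (assumed form)."""
    return lambda t: k * (t / c**2) * math.exp(-t**2 / (2 * c**2))


if __name__ == "__main__":
    # Hypothetical (intercept, slope) pairs fitted on three modules.
    linear_fits = [(1.0, 0.20), (1.4, 0.10), (0.9, 0.30)]
    predict = make_linear_predictor(*mean_coefficients(linear_fits))
    print("Predicted defect density at release 5:", predict(5))
```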
The results for the project level prediction show
that the Linear Model performs better than the
Rayleigh Model. Since the MMRE value of the Linear
Model was under 0.25, the Linear Model had the more
acceptable performance at this level. The results for
the module level show that the Linear Model performs
better than the Rayleigh Model with respect to RMSE
and MAE values. However, the MMRE value of the
Linear Model is not acceptable at module level (it
should be less than 0.25), whereas the MMRE value of
the Rayleigh Model is below the acceptable limit of
0.25.
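For reference, the three accuracy measures mentioned above can be computed as in the following sketch, which uses their standard definitions. The 0.25 acceptance limit for MMRE is the one stated in the text; the function names and the sample values are illustrative assumptions.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: mean of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt of the mean squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual.

    Assumes every actual value is non-zero.
    """
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def is_acceptable(mmre_value, limit=0.25):
    """Apply the acceptance criterion used in the text: MMRE below 0.25."""
    return mmre_value < limit


if __name__ == "__main__":
    # Made-up defect densities, for illustration only.
    actual = [2.0, 4.0, 5.0]
    predicted = [1.8, 4.4, 5.5]
    score = mmre(actual, predicted)
    print("MAE:", mae(actual, predicted))
    print("RMSE:", rmse(actual, predicted))
    print("MMRE:", score, "acceptable:", is_acceptable(score))
```

Because MMRE is relative to the actual value while RMSE and MAE are absolute, a model can rank better on RMSE and MAE and still exceed the MMRE limit, which is the situation reported for the Linear Model at module level.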
The modules used to construct a prediction model
have different complexities. We can therefore increase
the performance of the module level prediction results
by increasing the number of modules that are
investigated. At project level, due to the nature of
iterative development, the defect arrival rate generally
remains at the same level and the distribution of the
defect densities converges to a Linear Model.
Therefore, the Linear Model has a higher performance
than the Rayleigh Model.
The most important lessons learned from this
study are observing the nature of defect distributions,
identifying and overcoming the lack of data, and
deciding on the prediction model that best fits the
situation. Collecting defect arrival and removal data is
important for modeling defect arrival and removal
patterns in relation to project phases. Due to the
nature of iterative development, a defect found in one
iteration can be fixed in another iteration. It is therefore
very important to record the release in which each
defect is found and the release in which it is fixed.
Another important issue is the consistency and integrity
of project planning data with project defect data. The
defect tracking tool should be capable of accessing, and
working in integration with, the project management
tool.
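One way to capture the found-in and fixed-in release of each defect is sketched below. The record layout, field names, and helper functions are hypothetical and only illustrate the kind of data that would allow arrival and removal patterns to be reconstructed per release; they do not describe the tools used in the project.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefectRecord:
    defect_id: str
    module: str
    found_release: int
    fixed_release: Optional[int] = None  # None while the defect is still open


def arrivals_and_removals(records):
    """Count defects found and defects fixed in each release."""
    arrivals = Counter(r.found_release for r in records)
    removals = Counter(
        r.fixed_release for r in records if r.fixed_release is not None
    )
    return arrivals, removals


def open_after(records, release):
    """Defects found up to `release` that are not yet fixed at its end."""
    return [
        r
        for r in records
        if r.found_release <= release
        and (r.fixed_release is None or r.fixed_release > release)
    ]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        DefectRecord("D1", "billing", found_release=1, fixed_release=1),
        DefectRecord("D2", "billing", found_release=1, fixed_release=3),
        DefectRecord("D3", "reports", found_release=2),
    ]
    arrivals, removals = arrivals_and_removals(records)
    print("Arrivals per release:", dict(arrivals))
    print("Removals per release:", dict(removals))
    print("Open after release 2:", [r.defect_id for r in open_after(records, 2)])
```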
For future work, we are planning to include data
from more modules and projects in the models and
examine the prediction of the project schedule using
the defect removal effort spent by the project staff.