with less discriminative power will be ignored automatically (especially in tree pruning).
Naïve Bayes has the worst performance among the three classifiers in classifying software efforts. We attribute this to the fact that the variables used to describe projects may not be independent of each other. Moreover, naïve Bayes regards all variables as having equivalent weights in the prediction model: the conditional probabilities of all variables contribute equally when predicting the label of an incoming project. In fact, however, some project variables have more discriminative power than others in determining project effort. Noise and peculiarities are often contained in the variables with little discriminative power, and those variables should be given less importance in the prediction model. We conjecture that this is also the cause of the poor performance of k-medoids in project clustering. As in k-medoids clustering, the MINI technique significantly improves the performance of project classification by imputing missing values in the Boolean vectors.
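This equal weighting follows directly from the standard naïve Bayes formulation: an incoming project with attribute vector $(x_1, \ldots, x_n)$ is assigned to the effort class

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c),

where each attribute's conditional probability enters the product with the same weight, whatever its discriminative power.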
Table 13 shows the performances of the three classifiers on the CSBSG data set. Conclusions similar to those drawn on the ISBSG data set can be drawn on the CSBSG data set. However, the performances of the three classifiers on the CSBSG data set are worse than those on the ISBSG data set. Taking the ISBSG results as the baseline, the average overall accuracy of the three techniques without (with) imputation decreases by 6.95% (6.66%) on the CSBSG data set. We again attribute this outcome to the lower quality of the CSBSG data set compared with that of the ISBSG data set.
We can see from Tables 12 and 13 that, on both the ISBSG and CSBSG data sets, none of the three supervised learning techniques produces a favorable classification of software effort from project attributes. The best performance, produced by BPNN, is an accuracy of around 60%. An accuracy of 60% is of little use for software effort prediction in most cases, because it means that with probability 0.4 the prediction falls outside the range of the correct effort class. Combined with the effort prediction results from unsupervised learning, we conclude that the predictability of software effort using supervised learning techniques is not acceptable to the software industry either.
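As an illustration only, the following minimal sketch shows how such an effort-class experiment could be set up with a feed-forward (BPNN-style) classifier; the synthetic Boolean attribute vectors, the three effort classes and the scikit-learn model are our own assumptions, not the exact configuration used in this study.

# Minimal sketch: classifying binned software effort with a feed-forward
# network (scikit-learn's MLPClassifier). The data here is synthetic; in the
# study, Boolean project-attribute vectors (after MINI imputation) and
# discretized effort classes would take its place.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 30))   # Boolean project attributes (hypothetical)
y = rng.integers(0, 3, size=500)         # three effort classes: low / medium / high

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))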
6 RELATED WORK
Srinivasan and Fisher (Srinivasan and Fisher, 1995) used a decision tree and BPNN to estimate software development effort. The COCOMO data set with 63 historical projects was used as training data, and the Kemerer data set with 15 projects was used as testing data. They reported that the decision tree and BPNN are competitive with the traditional COCOMO estimator. However, they pointed out that the performance of machine learning techniques is very sensitive to the data on which they are trained. (Finnie et al., 1997) compared three estimation techniques, namely BPNN, case-based reasoning and regression models, using Function Points as the measure of system size. They reported that neither case-based reasoning nor regression models were favorable for estimating software effort, owing to the considerable noise in the data set. BPNN appeared capable of providing adequate estimation performance (with an MRE of 35%), but its performance depends heavily on the quality of the training data as well as the suitability of the testing data to the trained model. For all three methods, a large amount of uncertainty is inherent in their performance. In both (Finnie et al., 1997) and (Srinivasan and Fisher, 1995), a serious problem confronting effort estimation with machine learning techniques is the large uncertainty involved in the robustness of these techniques. That is, the model sensitivity and data dependence of machine learning techniques hinder their adoption in industrial effort prediction practice. These works, as well as (Prietula et al., 1996), motivate this study to investigate the effectiveness of a variety of machine learning techniques on two different data sets.
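For reference, the MRE figures quoted in these studies follow the usual definition of the magnitude of relative error for a project $i$,

\mathrm{MRE}_i = \frac{|E_i - \hat{E}_i|}{E_i},

where $E_i$ is the actual effort and $\hat{E}_i$ the estimated effort; the reported value is typically the mean MRE (MMRE) over all test projects.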
(Park and Baek, 2008) conducted an empirical validation of a neural network model for software effort estimation. The data set used in their experiments was collected from a Korean IT company and includes 148 IT projects. They compared expert judgment, regression models and BPNN with different input variables for software effort estimation. They reported that a neural network using Function Points and 6 other variables (length of project, usage level of the system development methodology, numbers of high/middle/low level manpower, and percentage of outsourcing) as input variables outperforms the other estimation methods. However, even for the best performance, the average MRE is nearly 60% with a standard deviation of more than 30%. This result makes it very hard for the method proposed in their work to be satisfactorily adopted in practice. For this reason, a validation of machine learning methods is necessary in order to shed light on the advancement of software effort estimation. This point also motivates us to investigate the effectiveness of machine learning techniques for software effort estimation and the predictability of software effort using these techniques.
(Shukla, 2000) proposed a neuro-genetic approach to predict software development effort while