amount of research has already been done in SEE, utilizing ML approaches to handle the inadequacies of conventional and parametric estimation strategies and to align with present-day development and management practices. However, largely owing to uncertain outcomes and opaque model development techniques, few, if any, of these approaches are practical for deployment.
This paper aims to improve the process of SEE with a powerful yet practical approach. For this purpose, we have proposed an ELM-based approach for SEE to tackle the issues mentioned above, using the ISBSG dataset. The ISBSG dataset contains 9178 projects developed in different programming languages with different development methodologies. This data is therefore heterogeneous, which usually leads to inconsistent estimates, so pre-processing steps are needed to remove noise from the data. The projects in the dataset were filtered on data quality rating, UFP rating, missing values in the dependent variable, and missing values in the independent variables. We considered only those projects whose functional size is measured in IFPUG 4+. After this pre-processing, 927 projects with 12 features remained. The resulting dataset is then given as input to the ML models.
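The filtering steps described above can be sketched as follows. The field names (`quality`, `ufp_rating`, `count_approach`, `effort`) are illustrative placeholders, not ISBSG's actual column names:

```python
def filter_projects(projects):
    """Apply the four filters: data quality rating, UFP rating,
    IFPUG 4+ sizing, and completeness of dependent/independent variables."""
    kept = []
    for p in projects:
        if p.get("quality") not in ("A", "B"):
            continue                      # data quality rating filter
        if p.get("ufp_rating") not in ("A", "B"):
            continue                      # UFP rating filter
        if not str(p.get("count_approach", "")).startswith("IFPUG 4"):
            continue                      # keep only IFPUG 4+ sized projects
        if p.get("effort") is None:
            continue                      # missing dependent variable
        if any(v is None for v in p.values()):
            continue                      # missing independent variables
        kept.append(p)
    return kept
```

In ISBSG practice, quality and UFP ratings of A or B are the usual inclusion criteria; the exact thresholds used in this study are as stated in the body of the paper.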
The ML models are trained on the training data and produce estimates for the test data. The error estimates of the different models are then evaluated with several performance evaluation measures. The results show that the ELM model outperformed the other models, although the margin depends on the evaluation measure. To validate these results, we conducted a Wilcoxon rank test to check whether the models differ significantly. Based on the test, the ELM model is significantly different from every other model except MLP.
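A minimal sketch of the ELM regressor underlying this comparison, assuming the standard formulation (random hidden-layer weights, closed-form least-squares output weights via the Moore-Penrose pseudo-inverse). The hidden-layer size and sigmoid activation here are illustrative, not the paper's settings:

```python
import numpy as np

class ELMRegressor:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation over random projections of the inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden weights/biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights solved in closed form: no iterative backpropagation.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

The single least-squares solve is what makes ELM far faster to train than a backpropagation-trained network of the same size.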
Finally, we compared the results of the proposed model with the benchmark ANN and fuzzy models developed on the same dataset. Table 7 shows the comparison of the proposed model with the benchmark models on different accuracy measures. The MAE values of the ELM model differ substantially from those of the ANN and fuzzy models; the proposed model shows an improvement of 53.08% over the benchmark fuzzy model.
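The reported improvement is the relative reduction in MAE against a baseline model; a minimal sketch of the two quantities (the example numbers in the comment are illustrative, not the paper's values):

```python
def mae(actual, predicted):
    """Mean absolute error over paired actual/predicted efforts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def improvement_pct(mae_baseline, mae_model):
    """Percentage reduction in MAE relative to the baseline model,
    e.g. a drop from MAE 100 to 46.92 is a 53.08% improvement."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline
```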
In the future, we recommend applying more advanced ML algorithms to other datasets. Also, the ISBSG dataset contains outliers that were not addressed in this study, so applying outlier-removal techniques would make the data more useful for ML; ISBSG data without outliers may further improve SEE analysis.
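One common outlier filter that could serve this purpose is the interquartile-range rule; this is an assumption offered for illustration, not a technique prescribed by the paper:

```python
def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    s = sorted(values)
    q1 = s[len(s) // 4]
    q3 = s[(3 * len(s)) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]
```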
ENASE 2021 - 16th International Conference on Evaluation of Novel Approaches to Software Engineering