that the ML algorithms are sensitive to both the smell type
and the developer. For instance, while SMO presented the
highest accuracies in detecting Long Method and Feature
Envy smells, it presented the lowest accuracies in detecting
God Class and Data Class instances. Random Forest showed
an almost inverted behaviour. Similarly, even when an
algorithm achieved a high mean accuracy in detecting a
given smell, its performance in detecting such anomalies
was not consistent across the different developers.
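The two observations above, that algorithm rankings change with the smell type and that a high mean accuracy can hide a wide spread across developers, can be checked mechanically once accuracies are tabulated. The sketch below is a minimal illustration under stated assumptions, not the evaluation procedure used in this study: it assumes accuracies have already been computed per algorithm, per smell, and per developer, and the function names (`consistency_summary`, `rank_by_smell`), the dictionary layout, and all numbers in the usage section are hypothetical placeholders rather than results reported here.

```python
from statistics import mean, pstdev


def consistency_summary(acc_by_developer: dict[str, float]) -> dict[str, float]:
    """Summarize how consistent one algorithm is on one smell across developers.

    Hypothetical input layout: developer id -> accuracy obtained with that
    developer's manually validated labels for a single smell.
    """
    values = list(acc_by_developer.values())
    if not values:
        raise ValueError("need at least one developer")
    return {
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation over developers
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),  # a large range signals inconsistency
    }


def rank_by_smell(acc_by_algorithm: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """For each smell, order algorithms from highest to lowest mean accuracy.

    Hypothetical input layout: algorithm name -> {smell name -> mean accuracy}.
    Comparing the resulting orders across smells shows whether the best
    algorithm for one smell is the worst for another.
    """
    smells = {smell for per_smell in acc_by_algorithm.values() for smell in per_smell}
    return {
        smell: sorted(
            (alg for alg, per_smell in acc_by_algorithm.items() if smell in per_smell),
            key=lambda alg: acc_by_algorithm[alg][smell],
            reverse=True,
        )
        for smell in smells
    }


if __name__ == "__main__":
    # Made-up values for illustration only; they are NOT results of this paper.
    per_developer = {"dev_a": 0.95, "dev_b": 0.60, "dev_c": 0.90}
    print(consistency_summary(per_developer))

    per_smell = {
        "SMO": {"Long Method": 0.90, "God Class": 0.60},
        "Random Forest": {"Long Method": 0.70, "God Class": 0.85},
    }
    print(rank_by_smell(per_smell))
```

Reporting the spread (standard deviation or range) next to the mean is a simple design choice: the mean alone would rank an algorithm highly even when it performs poorly for some developers, which is exactly the inconsistency described above.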
Finally, we will make the dataset used in our
experiments available in order to support other studies
on smell detection. We are not aware of other datasets
with a large portion of evaluations manually validated
by different developers over the same set of code
snippets. Thus, the availability of this dataset
represents another contribution of this paper.
ICEIS 2017 - 19th International Conference on Enterprise Information Systems