Authors:
E. N. Akimova (1, 2); A. V. Bersenev (1, 2); A. Yu. Cheshkov (3); A. Yu. Deikov (1, 2); K. S. Kobylkin (1, 2); A. V. Konygin (1); I. P. Mezentsev (1, 2) and V. E. Misilov (1, 2)
Affiliations:
(1) N. N. Krasovskii Institute of Mathematics and Mechanics, Ekaterinburg, Russian Federation
(2) Ural Federal University, Ekaterinburg, Russian Federation
(3) Huawei Russian Research Institute, Moscow, Russian Federation
Keyword(s):
Anomaly Detection, Code Quality, Defect Prediction.
Abstract:
The software development community has long relied on handcrafted code quality metrics. Despite their widespread use, these metrics have a number of known shortcomings: among other things, they do not take into account project-specific coding conventions or the wisdom of the crowd. To address these issues, we propose a novel semantic-based approach to calculating an anomaly index for source code. This index, called A-INDEX, is the output of a model trained in unsupervised mode on a source code corpus. The larger the index value, the more atypical the code fragment is. To test A-INDEX, we use it to find anomalous code fragments in Python repositories. We also apply the index to a variant of the source code defect prediction problem. Using the BugsInPy and PyTraceBugs datasets, we investigate how A-INDEX changes when a bug is fixed. The experiments show that in 63% of cases, the index decreases when the bug is fixed. If one keeps only those code fragments for which the index changes significantly, then in 71% of cases the index decreases when the bug is fixed.
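
The bug-fix evaluation described above can be illustrated with a minimal sketch. It assumes that each code fragment has an index value before and after the fix, and that a "significant" change means an absolute difference of at least some threshold; the abstract does not specify the criterion, so the function name `decrease_rate`, the parameter `min_abs_change`, and the sample values below are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def decrease_rate(
    pairs: Iterable[Tuple[float, float]],
    min_abs_change: float = 0.0,
) -> float:
    """Share of (before, after) index pairs in which the index decreases.

    Only pairs whose absolute change is at least `min_abs_change` are kept.
    The threshold rule is an assumption made for illustration only.
    """
    kept = [(before, after) for before, after in pairs
            if abs(after - before) >= min_abs_change]
    if not kept:
        raise ValueError("no pairs satisfy the change threshold")
    decreased = sum(1 for before, after in kept if after < before)
    return decreased / len(kept)


# Made-up index values (before fix, after fix), not data from the paper.
example_pairs = [(0.9, 0.4), (0.5, 0.55), (0.7, 0.69), (0.3, 0.8), (0.6, 0.2)]

rate_all = decrease_rate(example_pairs)                      # all fragments
rate_significant = decrease_rate(example_pairs, min_abs_change=0.1)
print(f"all pairs: {rate_all:.2f}, significant only: {rate_significant:.2f}")
```

Filtering by the size of the change discards fragments whose index barely moves, which is one plausible reading of how the reported share can rise from 63% to 71%.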