the presence of bugs in the code. Although datasets with bugs were not used when training the index (a zero-shot setting), the experiments show that the value of the index usually decreases once a bug is fixed. The evaluation thus demonstrates the potential of the index.
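To make this evaluation protocol concrete, the following minimal sketch counts how often the index value drops between the buggy and the fixed version of a snippet. It is illustrative only: fraction_decreased and toy_index are hypothetical names, and toy_index merely stands in for the anomaly-based index proposed in the paper.

from typing import Callable, Iterable, Tuple

def fraction_decreased(
    pairs: Iterable[Tuple[str, str]],
    compute_index: Callable[[str], float],
) -> float:
    """Fraction of (buggy, fixed) pairs whose index value drops after the fix."""
    pairs = list(pairs)
    hits = sum(compute_index(fixed) < compute_index(buggy) for buggy, fixed in pairs)
    return hits / len(pairs)

# Toy placeholder index: longer snippets score higher. The real index is
# computed from learned code representations, not snippet length.
def toy_index(code: str) -> float:
    return float(len(code))

# One hypothetical bug-fix pair; real experiments use pairs from a defect dataset.
pairs = [("x = a / b", "x = a / b if b else 0")]
print(fraction_decreased(pairs, toy_index))

In the paper's experiments, a value of this fraction above 0.5 would indicate that the index tends to react to the presence of the bug rather than to incidental properties of the code.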
Computations were performed on the Uran supercomputer at the IMM UB RAS.