Source-Code Embedding-Based Software Defect Prediction

Diana-Lucia Miholca, Zsuzsanna Oneţ-Marian

2023

Abstract

Software defect prediction is an essential software development activity and a highly researched topic, yet it remains a difficult problem. One of the difficulties is that the most prevalent software metrics are insufficiently relevant for predicting defects. In this paper, we propose using Graph2Vec embeddings, learnt from the source code in an unsupervised manner, as the basis for defect prediction. The reliability of the Graph2Vec embeddings is compared to that of alternative embeddings based on Doc2Vec and LSI through a study performed on 16 versions of Calcite, using three classification models: FastAI as a deep learning model, Multilayer Perceptron as an untuned conventional model, and Random Forests with hyperparameter tuning as a tuned conventional model. The experimental results suggest that the Graph2Vec, Doc2Vec and LSI-based embeddings are complementary, with their combination leading to the best performance for most software versions. When comparing the three classifiers, the empirical results highlight the superiority of the tuned Random Forests over FastAI and Multilayer Perceptron, which confirms the power of hyperparameter optimization.
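
The two ideas in the abstract, combining several embeddings of the same source-code entity and tuning a classifier's hyperparameters, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embedding values are toy numbers rather than learned Graph2Vec, Doc2Vec or LSI vectors, the file names and labels are hypothetical, combination is assumed to mean concatenation, and a small k-nearest-neighbour classifier with leave-one-out validation stands in for the tuned Random Forests used in the paper.

```python
from math import dist


def combine_embeddings(*tables):
    """Concatenate the vectors of every entity present in all embedding tables."""
    shared = set(tables[0]).intersection(*tables[1:])
    return {name: [x for table in tables for x in table[name]]
            for name in sorted(shared)}


def knn_predict(train, query, k):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def loo_accuracy(features, labels, k):
    """Leave-one-out accuracy of the k-NN stand-in classifier."""
    names = list(features)
    hits = 0
    for held_out in names:
        train = [(features[n], labels[n]) for n in names if n != held_out]
        hits += knn_predict(train, features[held_out], k) == labels[held_out]
    return hits / len(names)


def grid_search(features, labels, ks):
    """Return the (k, accuracy) pair with the highest leave-one-out accuracy."""
    scored = ((k, loo_accuracy(features, labels, k)) for k in ks)
    return max(scored, key=lambda pair: pair[1])


# Toy embeddings (hypothetical values), one table per embedding technique.
graph2vec = {"A.java": [0.9, 0.1], "B.java": [0.8, 0.2], "C.java": [0.1, 0.9],
             "D.java": [0.2, 0.8], "E.java": [0.7, 0.3], "F.java": [0.3, 0.7]}
doc2vec = {"A.java": [0.8, 0.1], "B.java": [0.9, 0.2], "C.java": [0.2, 0.9],
           "D.java": [0.1, 0.7], "E.java": [0.7, 0.2], "F.java": [0.2, 0.8]}
lsi = {"A.java": [0.9, 0.2], "B.java": [0.7, 0.1], "C.java": [0.1, 0.8],
       "D.java": [0.3, 0.9], "E.java": [0.8, 0.3], "F.java": [0.2, 0.7]}

# Hypothetical labels: 1 = defective, 0 = non-defective.
labels = {"A.java": 1, "B.java": 1, "E.java": 1,
          "C.java": 0, "D.java": 0, "F.java": 0}

combined = combine_embeddings(graph2vec, doc2vec, lsi)
best_k, best_acc = grid_search(combined, labels, ks=[1, 3])
print(f"combined dimension: {len(combined['A.java'])}, best k: {best_k}")
```

With real data, each table would hold the vectors produced by one embedding technique, and the grid would range over the hyperparameters of whichever classifier is being tuned.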

Paper Citation


in Harvard Style

Miholca D. and Oneţ-Marian Z. (2023). Source-Code Embedding-Based Software Defect Prediction. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 185-196. DOI: 10.5220/0012129600003538


in Bibtex Style

@conference{icsoft23,
author={Diana-Lucia Miholca and Zsuzsanna Oneţ-Marian},
title={Source-Code Embedding-Based Software Defect Prediction},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={185-196},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012129600003538},
isbn={978-989-758-665-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Source-Code Embedding-Based Software Defect Prediction
SN - 978-989-758-665-1
AU - Miholca D.
AU - Oneţ-Marian Z.
PY - 2023
SP - 185
EP - 196
DO - 10.5220/0012129600003538
PB - SciTePress