Paper

Authors: Anamaria Briciu; Mihaiela Lupea; Gabriela Czibula and Istvan Gergely Czibula

Affiliation: Department of Computer Science, Babeş-Bolyai University, Cluj-Napoca, Romania

Keyword(s): Software Defect Prediction, Machine Learning, Semantic Features, BERT-Based Models, doc2vec, Source Code, Comments.

Abstract: This study belongs to a new research direction that aims to improve software defect prediction by using additional knowledge such as source code comments. The proposed semantic representation of source code fuses programming language features learned from the code with natural language features extracted from the code comments. Two types of language models are applied to learn the semantic features: (1) the pre-trained models CodeBERT and RoBERTa, for code embedding and textual embedding, respectively; (2) the doc2vec model, used for both code embedding and comment embedding. These two semantic representations, each in two configurations (code features only, and code features fused with comment features), are used separately with the XGBoost classifier in experiments conducted on the Calcite dataset. The results show that adding the natural language features from the comments increases software defect prediction performance.
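
The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the two feature sets are fused, so plain concatenation is assumed here, and `toy_embed` is a hypothetical hashing stand-in for the learned embeddings (doc2vec, CodeBERT, RoBERTa). The helper names, the 8-dimensional vectors, and the handling of Java-style `//` comments only are all illustrative assumptions.

```python
import hashlib
import re
from typing import List, Tuple

DIM = 8  # illustrative size only; learned embeddings are much larger


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a learned embedding (NOT doc2vec or CodeBERT).

    Hashes each token into one of `dim` buckets and returns normalized counts.
    """
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def split_code_and_comments(source: str) -> Tuple[str, str]:
    """Separate `//` line comments from code.

    Simplified on purpose: block comments and `//` inside string
    literals are not handled.
    """
    code_lines, comment_lines = [], []
    for line in source.splitlines():
        code, sep, comment = line.partition("//")
        if code.strip():
            code_lines.append(code.rstrip())
        if sep and comment.strip():
            comment_lines.append(comment.strip())
    return "\n".join(code_lines), "\n".join(comment_lines)


def code_only_features(source: str) -> List[float]:
    """First configuration: features from the code alone."""
    code, _ = split_code_and_comments(source)
    return toy_embed(code)


def fused_features(source: str) -> List[float]:
    """Second configuration: code features concatenated with comment features.

    Concatenation is an assumption; the abstract only says "fusion".
    """
    code, comments = split_code_and_comments(source)
    return toy_embed(code) + toy_embed(comments)


sample = "int add(int a, int b) { // adds two numbers\n    return a + b;\n}"
code_part, comment_part = split_code_and_comments(sample)
code_vec = code_only_features(sample)
fused_vec = fused_features(sample)
```

In a pipeline like the one the abstract describes, either vector would then be passed to a classifier (XGBoost in the paper); that step is left out so the sketch depends only on the standard library.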

CC BY-NC-ND 4.0

Paper citation in several formats:
Briciu, A., Lupea, M., Czibula, G. and Gergely Czibula, I. (2024). Enriching the Semantic Representation of the Source Code with Natural Language-Based Features from Comments for Improving the Performance of Software Defect Prediction. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - ENASE; ISBN 978-989-758-696-5; ISSN 2184-4895, SciTePress, pages 132-143. DOI: 10.5220/0012688400003687

@conference{enase24,
author={Anamaria Briciu and Mihaiela Lupea and Gabriela Czibula and Istvan {Gergely Czibula}},
title={Enriching the Semantic Representation of the Source Code with Natural Language-Based Features from Comments for Improving the Performance of Software Defect Prediction},
booktitle={Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - ENASE},
year={2024},
pages={132-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012688400003687},
isbn={978-989-758-696-5},
issn={2184-4895},
}

TY - CONF

JO - Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - ENASE
TI - Enriching the Semantic Representation of the Source Code with Natural Language-Based Features from Comments for Improving the Performance of Software Defect Prediction
SN - 978-989-758-696-5
IS - 2184-4895
AU - Briciu, A.
AU - Lupea, M.
AU - Czibula, G.
AU - Gergely Czibula, I.
PY - 2024
SP - 132
EP - 143
DO - 10.5220/0012688400003687
PB - SciTePress