Authors:
Viktor Csuvik; Dániel Horváth and László Vidács
Affiliation:
Department of Software Engineering, University of Szeged, Hungary
Keyword(s):
Automated Program Repair, Patch assessment, Overfitting patch, Code features, PCC
Abstract:
Automated Program Repair (APR) strives to minimize the expense associated with manual bug fixing by developing methods where patches are generated automatically and then validated against an oracle, such as a test suite. However, because the oracle may be imperfect, patches that it validates may still be incorrect. A significant portion of the APR literature focuses on this issue, usually referred to as Patch Correctness Check (PCC). Several approaches have been proposed that use a variety of information from the project under repair, such as diverse manually designed heuristics or learned embedding vectors. In this study, we explore various features obtained from previous studies and assess their effectiveness in identifying incorrect patches. We also evaluate the potential for accurately classifying correct patches by combining and selecting learned embeddings with engineered features, using various Machine Learning (ML) models. Our experiments demonstrate that not all features are equally important, and that selecting the right ML model also has a large impact on overall performance. For instance, using all 490 features with a decision tree classifier achieves a mean F1 value of 64% over 10 independent trainings, while after in-depth feature and model selection, the MLP classifier with the selected 43 features achieves a better F1 of 81%. The empirical evaluation shows that this model is able to correctly classify samples on a dataset containing 903 labeled patches with 100% precision and 97% recall at its peak, which is complementary performance compared to state-of-the-art methods. We also show that independent trainings can exhibit varying outcomes, and propose how to improve the stability of model trainings.
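The F1 figures quoted in the abstract combine precision and recall, and the reported means are taken over independent trainings. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code. The helper names `f1_score` and `summarize_runs` are hypothetical, and the per-run values in `example_runs` are invented for illustration and are not results from the study.

```python
from statistics import mean, stdev


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize_runs(f1_values):
    """Return (mean, sample standard deviation) of F1 over independent trainings."""
    spread = stdev(f1_values) if len(f1_values) > 1 else 0.0
    return mean(f1_values), spread


# Peak figures quoted in the abstract: 100% precision and 97% recall.
peak_f1 = f1_score(1.00, 0.97)

# Hypothetical per-run F1 values, invented purely for illustration.
example_runs = [0.70, 0.75, 0.80]
avg, spread = summarize_runs(example_runs)

print(f"peak F1 = {peak_f1:.3f}, mean F1 = {avg:.3f} +/- {spread:.3f}")
```

Reporting the spread next to the mean makes the variation between independent trainings visible, which a single F1 value would hide.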