Multi-Output Learning for Predicting Evaluation and Reopening of GitHub Pull Requests on Open-Source Projects

Peerachai Banyongrakkul, Suronapee Phoomvuthisarn

2023

Abstract

GitHub’s pull-based development model is widely used by software development teams to manage software complexity. Contributors create pull requests for merging changes into the main codebase, and integrators review these requests to maintain quality and stability. However, a high volume of pull requests can overwhelm integrators, causing feedback delays. Previous studies have built predictive models using traditional machine learning techniques on tabular data, but such representations may lose meaningful information. Additionally, relying solely on acceptance and latency predictions may not be sufficient for integrators: reopened pull requests can add maintenance costs and burden already-busy developers. This paper proposes a novel multi-output deep learning-based approach that predicts the acceptance, latency, and reopening of pull requests at an early stage, effectively handling various data sources, including tabular and textual data. Our approach also applies SMOTE and VAE techniques to address the highly imbalanced nature of pull request reopening. We evaluate our approach on 143,886 pull requests from 54 open-source projects across four well-known programming languages. The experimental results show that our approach significantly outperforms the randomized baseline. Moreover, compared with the existing approach, our approach improves accuracy by 8.68%, precision by 1.01%, recall by 11.49%, and F1-score by 6.77% in acceptance prediction, and MMAE by 6.07% in latency prediction, while improving balanced accuracy by 9.43%, AUC by 9.37%, and TPR by 30.07% in reopening prediction.
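The abstract notes that reopening is a highly imbalanced target, handled with SMOTE (and a VAE) and evaluated with balanced accuracy and TPR rather than plain accuracy. The sketch below illustrates both ideas under stated assumptions: it shows only SMOTE-style interpolation on generic numeric feature vectors (the VAE-based generation is not shown), and the function names `smote_oversample` and `balanced_accuracy_and_tpr` are hypothetical illustrations, not the authors' implementation or any library's API. Where SMOTE is applied in the authors' pipeline (for example, on which feature representation) is not stated here.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler: each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: dist2(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balanced_accuracy_and_tpr(y_true, y_pred):
    """Return (balanced accuracy, TPR) for binary labels, assuming 1 = reopened."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # recall on the reopened (minority) class
    tnr = tn / (tn + fp)  # recall on the not-reopened class
    return (tpr + tnr) / 2, tpr


if __name__ == "__main__":
    minority = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.2]]
    print(smote_oversample(minority, n_new=2, k=2))
    # A model that never predicts "reopened" reaches 80% plain accuracy here,
    # but balanced accuracy and TPR expose it as no better than chance.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    print(balanced_accuracy_and_tpr(y_true, y_pred))  # (0.5, 0.0)
```

The toy example shows why accuracy alone is misleading for a rare outcome such as reopening: a trivial "never reopened" predictor looks good on accuracy while its TPR is zero, which is the failure mode that oversampling the minority class aims to reduce.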



Paper Citation


in Harvard Style

Banyongrakkul P. and Phoomvuthisarn S. (2023). Multi-Output Learning for Predicting Evaluation and Reopening of GitHub Pull Requests on Open-Source Projects. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 163-174. DOI: 10.5220/0012125200003538


in Bibtex Style

@conference{icsoft23,
author={Peerachai Banyongrakkul and Suronapee Phoomvuthisarn},
title={Multi-Output Learning for Predicting Evaluation and Reopening of GitHub Pull Requests on Open-Source Projects},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={163-174},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012125200003538},
isbn={978-989-758-665-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Multi-Output Learning for Predicting Evaluation and Reopening of GitHub Pull Requests on Open-Source Projects
SN - 978-989-758-665-1
AU - Banyongrakkul P.
AU - Phoomvuthisarn S.
PY - 2023
SP - 163
EP - 174
DO - 10.5220/0012125200003538
PB - SciTePress