Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification

Sebastian Hönel

2023

Abstract

The automatic classification of commits can be exploited for numerous applications, such as fault prediction or determining maintenance activities. Additional properties, such as parent-child relations or sojourn-times between commits, have not previously been considered for this task. However, such data cannot be leveraged well using traditional machine learning models, such as random forests. Suitable models are, e.g., Conditional Random Fields or recurrent neural networks. We reason about the Markovian nature of the problem and propose models to address it. The first model is a generalized dependent mixture model that employs the Forward algorithm for 1st- and 2nd-order processes and maximum likelihood estimation. We then propose a second, non-parametric model that uses Bayesian segmentation and kernel density estimation and can be readily adapted to work with nth-order processes. Using an existing dataset with labeled commits as ground truth, we extend this dataset with relations between commits and their sojourn-times, after first re-engineering the labeling rules and reaching a high agreement between labelers. We show the strengths and weaknesses of either kind of model and demonstrate their ability to outperform the state of the art in automated commit classification.
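To make the role of the Forward algorithm in a 1st-order process more concrete, the following is a minimal sketch of the textbook forward recursion for a plain, discrete hidden Markov model. It is not the paper's generalized dependent mixture model: it ignores sojourn-times and parent-child relations, and all names and numbers (the two hidden states standing in for commit labels, the three observation symbols standing in for binned commit features, and the probability tables) are hypothetical and chosen only for illustration. A brute-force sum over all state paths is included as a sanity check of the recursion.

```python
import math
from itertools import product

# Hypothetical toy model: 2 hidden states (stand-ins for commit labels)
# and 3 observation symbols (stand-ins for binned commit features).
INITIAL = [0.6, 0.4]                      # P(state_0 = k)
TRANSITION = [[0.7, 0.3],                 # P(state_t = j | state_{t-1} = i)
              [0.2, 0.8]]
EMISSION = [[0.5, 0.4, 0.1],              # P(obs = m | state = k)
            [0.1, 0.3, 0.6]]
OBSERVATIONS = [0, 2, 1, 2]               # an arbitrary observed sequence


def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations) via the forward recursion (1st-order HMM)."""
    n_states = len(initial)
    # Initialization: alpha_0(k) = P(state_0 = k) * P(obs_0 | k)
    alpha = [initial[k] * emission[k][observations[0]] for k in range(n_states)]
    # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * A[i][j] * P(obs_t | j)
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][obs]
            for j in range(n_states)
        ]
    # Termination: marginalize over the final state
    return sum(alpha)


def brute_force_likelihood(initial, transition, emission, observations):
    """Return P(observations) by summing over every possible state path."""
    n_states = len(initial)
    total = 0.0
    for path in product(range(n_states), repeat=len(observations)):
        prob = initial[path[0]] * emission[path[0]][observations[0]]
        for t in range(1, len(observations)):
            prob *= transition[path[t - 1]][path[t]]
            prob *= emission[path[t]][observations[t]]
        total += prob
    return total


if __name__ == "__main__":
    fast = forward_likelihood(INITIAL, TRANSITION, EMISSION, OBSERVATIONS)
    slow = brute_force_likelihood(INITIAL, TRANSITION, EMISSION, OBSERVATIONS)
    print(f"forward: {fast:.6f}, brute force: {slow:.6f}")
```

The recursion needs work proportional to the sequence length times the squared number of states, whereas the brute-force sum grows exponentially with the sequence length. For long sequences, a practical implementation would typically rescale the forward variables or work in log space to avoid numerical underflow.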


Paper Citation


in Harvard Style

Hönel S. (2023). Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification. In Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT; ISBN 978-989-758-665-1, SciTePress, pages 323-331. DOI: 10.5220/0012077300003538


in Bibtex Style

@conference{icsoft23,
author={Sebastian Hönel},
title={Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification},
booktitle={Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2023},
pages={323-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012077300003538},
isbn={978-989-758-665-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification
SN - 978-989-758-665-1
AU - Hönel S.
PY - 2023
SP - 323
EP - 331
DO - 10.5220/0012077300003538
PB - SciTePress