Adopting Delta Maintainability Model for Just in Time Bug Prediction
Lerina Aversano (1), Martina Iammarino (2), Antonella Madau (4), Debora Montano (3) and Chiara Verdone (4)
(1) Dept. of Agricultural Science, Food, Natural Resources and Engineering, University of Foggia, Foggia, Italy
(2) Dept. of Informatics, University of Bari Aldo Moro, Bari, Italy
(3) CeRICT scrl, Regional Center Information Communication Technology, Benevento, Italy
(4) Dept. of Engineering, University of Sannio, Benevento, Italy
ORCIDs: Lerina Aversano https://orcid.org/0000-0003-2436-6835; Martina Iammarino https://orcid.org/0000-0001-8025-733X; Antonella Madau https://orcid.org/0009-0003-2227-9778; Debora Montano https://orcid.org/0000-0002-5598-0822; Chiara Verdone https://orcid.org/0000-0003-1335-5276
Keywords:
Just-In-Time Bug Prediction, Process Metrics, Pipeline.
Abstract:
A flaw that leads to a software malfunction is called a bug. Preventing bugs from the beginning reduces the
need to address complex problems in later stages of development or after software release. Therefore, bug
prevention helps create more stable and robust code because bug-free software is easier to maintain, update,
and expand over time. In this regard, we propose a pipeline for the prevention of bugs in the source code,
consisting of a machine learning model capable of predicting, just in time, whether a new commit inserted into
the repository can be classified as "good" or "bad". This is a critical issue, as it directly affects the quality of
the code. The approach is based on a set of features containing process software metrics at the commit level,
some of which are related to the impact of changes. The proposed method was validated on data obtained from
three open-source systems, for which the entire evolutionary history was considered, focusing mainly on those
affected by bugs. The results are satisfactory and show not only the effectiveness of the proposed pipeline
capable of working in continuous integration but also the ability of the approach to work cross-project, thus
generalizing the results obtained.
1 INTRODUCTION
A bug is an error or defect in software that causes
it to function unexpectedly or undesirably (Ayewah
et al., 2007). Bugs can manifest themselves in various
ways, such as malfunctions, program crashes, calcu-
lation errors, or unexpected behavior. They can result
from programming errors, design errors, or even ex-
ternal factors such as unexpected system conditions.
Bug prevention is of extreme importance in the
software development process, not only because bugs
can compromise the reliability of the software, caus-
ing malfunctions that can lead to data loss, critical er-
rors, or even system crashes, but also because these
can be exploited by attackers to breach system secu-
rity. Additionally, fixing bugs after the software has
been released can be expensive and time-consuming,
so preventing bugs during the development stages in
the first place can help reduce costs and improve the
overall efficiency of the development process, and can help create cleaner, more structured systems, making developers' jobs easier in the long run (Zhang et al., 2012).
Furthermore, it must be considered that developers very often use collaborative development platforms, such as GitHub, where the source code is managed in shared repositories. Developers therefore collaborate on these repositories, and commits are used to record the changes made to the code during the development cycle (Rodríguez-Pérez et al., 2020). However, development cycles are often aggressive and very rapid, with changes made frequently. This approach responds to user needs promptly, but it also increases the risk of introducing bugs (Tan et al., 2014).
Therefore, code quality management involves con-
stantly monitoring and controlling changes made to
the source code to avoid the introduction of bugs.
Bugs introduced by commits in repositories are a problem of great importance in the world of software development (Wen et al., 2019; Marengo et al., 2018). Even seemingly small or innocuous changes can have a significant impact on the stability and performance of an application. This highlights the importance of developing bug prediction and prevention systems capable of identifying risky changes before they negatively affect development.
In this regard, this paper proposes a pipeline whose primary goal is the just-in-time prevention of bug introduction in the source code. This is achieved by evaluating each new commit against a baseline set to identify suspicious commits that may contain errors or code quality issues. The toolchain was therefore developed with the idea of improving software quality in general by promptly identifying and correcting errors. It helps ensure that the software produced is more reliable, stable, and meets the desired quality standards. At the same time, it helps save resources and time and prevents the introduction of bugs at an early stage of development, thus reducing the need to solve more complex problems in later stages. On the one hand, it provides developers and project managers with data and information to make informed decisions; on the other hand, it offers key metrics and indicators on the quality of the code and the effectiveness of the development process. Specifically, the proposed approach focuses on process metrics, some closely related to the state of the software system at a given instant in time, and others related to the impact of the change upon the introduction of a new commit.
The paper is structured as follows: Section 2 reports the most related works, Section 3 details the approach followed for the development of the proposed pipeline, and Section 4 reports the results of the experiments carried out. Section 5 discusses the threats to validity. Finally, conclusions and future work are presented in Section 6.
2 RELATED WORKS
One development objective that is especially rele-
vant to high-assurance software systems is lowering
the number of software defects (Seliya et al., 2010).
Early identification of defects and their characteristics
(Neelofar et al., 2012) could lead to rapid rectifica-
tion of defects with a view to providing maintainable
software. The literature presents a variety of software metrics. Early in the software development life cycle, these metrics and defect data can be used to build models that anticipate problematic modules or classes (Nagwani and Suri, 2023).
Despite their frequent use in the literature on de-
fect prediction, the authors in (Rahman and Devanbu,
2013) contend that process metrics are typically more
beneficial for prediction than code metrics. They dis-
cover that code metrics are quite stable; they do not
change significantly from release to release. This
causes stagnation in the prediction models, resulting
in the same files being forecasted as defective mul-
tiple times; however, these recurringly defective files
are comparatively less defect-dense.
In recent years, several studies have been con-
ducted that relate machine learning techniques to the
prevention and prediction of bugs (Malhotra, 2015).
The authors in (Osman et al., 2018) examine the
effects of wrapper feature selection techniques and
correlation-based feature selection techniques on five
prediction models and show how these models func-
tion both with and without feature selection to fore-
cast the number of bugs in five distinct open-source
Java software systems. The findings demonstrate that
while removing more than half of the features, wrap-
pers can increase prediction accuracy by up to 33%.
In (Song et al., 2011), the authors provide and as-
sess a basic framework for predicting software de-
fects that facilitates an evaluation of competing pre-
diction techniques. To showcase the effectiveness of
the suggested methodology, they employ both pub-
licly accessible software defect data sets and simula-
tion. The authors emphasize that it’s critical to select
distinct learning schemes for various data sets (i.e., no
scheme should predominate) and that even minor ad-
justments to evaluation procedures might drastically
alter results.
In (Osman, 2017), the author builds a bug detector paying particular attention to null-related bugs through empirical analysis. Additionally, they empirically demonstrate how feature selection and hyperparameter optimization raise the accuracy of predictions. Next, by determining the most practical mix of bug prediction setups, machine learning models, and response labels, they optimize bug prediction to locate the largest number of bugs in the least amount of code. They demonstrate how to create the most cost-effective bug predictor by treating change metrics and source code as independent variables, performing feature selection on them, and then utilizing an optimized random forest to forecast the number of defects.
In order to automatically generate feedback to de-
velopers, the authors in (Khanan et al., 2021) present
an explainable Just-In-Time defect prediction frame-
work. This framework provides the riskiness of each
commit, an explanation of why the commit is danger-
ous, and suggestions for risk mitigation. In order to
continually monitor and assess a stream of contribu-
tions in numerous GitHub repositories, the framework
is incorporated into the GitHub CI/CD pipeline as a
GitHub application.
The authors in (Wang et al., 2021) propose an
approach for conducting bug prediction in terms of
model construction, updating, and evaluation in real-
world continuous software development. The authors
suggest ConBuild for model development, which uses
the distributional properties of bug prediction data to
inform the choice of training versions. In order to fa-
cilitate the reuse or updating of bug prediction mod-
els in continuous software development, the authors
suggest ConUpdate, which makes use of the evolu-
tion of distributional properties of bug prediction data
between versions. The authors suggest ConEA for
model evaluation, which conducts effort-aware eval-
uation by making use of the evolution of the buggy
likelihood of files between versions.
3 APPROACH
The main objective of the study is to investigate the
performance of the proposed approach for continu-
ous just-in-time bug prediction in the context of open-
source software projects.
First of all, we focused our attention on the set
of metrics best suited to predicting bugs in commits.
To this end, we extract a series of commit-level pro-
cess metrics. To validate our approach and the chosen
subset of metrics, we generalize its use by adopting
transfer learning. Therefore, we train our model on a
set of commits belonging to a set of software systems
and test it on commits belonging to a different one.
The general architecture of the proposed pipeline
is shown in Figure 1, where it is possible to deduce
the main phases.
The implemented process begins with the selec-
tion of repositories on GitHub and the subsequent ex-
traction of features. Specifically, the first phase in-
volves on the one hand the collection of all the com-
mit logs which are subsequently subjected to the eval-
uation of the process metrics, and on the other the ex-
traction of all the information on the bugs. After data
integration, we proceed with the evaluation of the ma-
chine learning models. Based on the previous data
analysis steps, several classifiers are tested. Once the
model has been selected, training is carried out using
different datasets based on the different combinations
of previously extracted features. Finally, in the last
stage, the classification model is used to make predic-
tions about future data regarding the presence of bugs.
In particular, we speak of just-in-time prediction because, given a commit, our model is capable of predicting, based on the metrics considered, how the next one will be classified, even before it is published. More details on the phases just described are provided in the following paragraphs.
3.1 Data Extraction
The proposed pipeline has as its first step the ex-
traction of the data on which to conduct the experi-
ments. In detail, the process begins with the selection
of Github repositories with a long history in terms of
commits.
Therefore, three repositories are identified, Elas-
ticSearch, Guava, and RxJava, and for each of these
the entire history, commits, modified files, and related
metadata are extracted.
Next, each software system is subjected to bug
identification. More specifically, the runSZZ algo-
rithm is used which, given a bug fix commit, iden-
tifies the commits that probably introduced the bug.
In essence, given the commit in which the fix occurred as input, SZZ identifies the last commit that touched each modified line of source code (Śliwerski et al., 2005). The algorithm is divided into two main components, SZZNoIssueTracker and SZZ, which are responsible for analyzing commits and identifying
bugs, respectively. This is the crucial step to gener-
ate the list of buggy commits. The output of the algo-
rithm consists of two JSON files, one that locates and
tracks bugs, and one that tracks commits responsible
for introducing bugs into the source code.
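As a hedged illustration of the core SZZ step (not the authors' runSZZ tool), the PyDriller library offers a blame-based helper that, given a fixing commit, returns the commits that last modified the lines changed by the fix; the repository path and commit hash below are placeholders.

from pydriller import Git

repo = Git("path/to/local/clone")                  # e.g. a clone of ElasticSearch
fix_commit = repo.get_commit("<fix-commit-hash>")  # a known bug-fixing commit

# For each file touched by the fix, get the commits that last modified the
# affected lines: these are the candidate bug-introducing commits.
candidates = repo.get_commits_last_modified_lines(fix_commit)
bug_introducing = {h for hashes in candidates.values() for h in hashes}
print(bug_introducing)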
Finally, for each commit, we extracted a set of
metrics related to the process. We chose to consider
process metrics based on the results of the study (Rah-
man and Devanbu, 2013), in which the results high-
light that code metrics, despite widespread use in the
defect prediction literature, are generally less useful
than process metrics for prediction. Specifically, in the context of process metrics, we consider two sets of metrics. The first contains indicators strictly related to the process; these metrics are listed in Table 1, with the metric name in the first column and a brief description in the second.
Figure 1: Architecture of the Approach.
Table 1: Process Related Metrics.

Feature        Description
Project Name   repository name
Author         name of the committer
Date           the date of the author's commit, in the format "YYYY-MM-DD"
Day            day of the week the commit was made
Insertions     number of lines added in the commit
Deletions      number of lines removed in the commit
Total Lines    total number of lines of code in the commit
Files added    number of new files added in the commit
Fix attempt    a boolean value that suggests, starting from the commit message, whether the commit is a fix attempt
The second set of metrics is described in the next
section.
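The paper does not name the extraction tooling; the following is a minimal sketch of how the Table 1 metrics could be collected with the PyDriller library, assuming a local clone of the repository. The fix-attempt keyword list is an assumption made for illustration only.

import pandas as pd
from pydriller import Repository

rows = []
for commit in Repository("path/to/local/clone").traverse_commits():
    rows.append({
        "project": "elasticsearch",                              # repository name
        "author": commit.author.name,                            # committer name
        "date": commit.author_date.strftime("%Y-%m-%d"),         # commit date
        "day": commit.author_date.strftime("%A"),                # day of the week
        "insertions": commit.insertions,                         # lines added
        "deletions": commit.deletions,                           # lines removed
        "total_lines": commit.lines,                             # lines touched in the commit
        "files_added": sum(m.change_type.name == "ADD"
                           for m in commit.modified_files),      # new files in the commit
        "fix_attempt": any(k in commit.msg.lower()
                           for k in ("fix", "bug", "defect")),   # heuristic from the message
    })

process_metrics = pd.DataFrame(rows)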
3.2 Impact Related Metrics Calculation
Source code change impact metrics are tools used to
evaluate the extent and effects of a change made to the
software. These metrics provide crucial information
to developers and project managers, allowing them to
assess the complexity, scope, and possible risks asso-
ciated with a specific change.
In this regard, specific metrics were measured to
quantify and evaluate the impact of the software in ob-
jective terms. First of all, to evaluate the maintainabil-
ity implications of commits, we calculated the Delta
Maintainability Model (DMM)(di Biase et al., 2019).
The delta-maintainability metric is the proportion of low-risk changes in a commit. Its value ranges from 0, when all changes can be considered risky, to 1, when all changes are low risk; it rewards improvements to methods and penalizes changes that make them worse. The starting point of the DMM is a risk profile according to which methods are classified into four bands: low, medium, high, and very high risk. The risk profile of a class is then a 4-tuple representing the amount of code in each band.
To transfer risk profiles from the file level to the
commit level, we consider delta risk profiles. These
are pairs (dl, dh), with dl representing the increase in
the low-risk code and dh the increase in the high-risk
code.
The DMM value is calculated as follows:

DMM = \frac{good\_change}{good\_change + bad\_change} \qquad (1)
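A minimal sketch of Equation (1), under our reading of the delta risk profile: increases in low-risk code and decreases in high-risk code count as good change, the opposite as bad change. (Libraries such as PyDriller also expose ready-made DMM values, e.g. dmm_unit_size, per commit.)

def dmm(dl: int, dh: int) -> float:
    # dl: increase in low-risk code, dh: increase in high-risk code (delta risk profile)
    good_change = max(dl, 0) + max(-dh, 0)
    bad_change = max(-dl, 0) + max(dh, 0)
    if good_change + bad_change == 0:
        return 1.0  # our convention: a commit with no risky delta is fully low risk
    return good_change / (good_change + bad_change)

# Example: 80 added low-risk lines and 20 added high-risk lines -> DMM = 0.8
print(dmm(80, 20))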
Below are the other metrics calculated:
Span of Changes: represents the number of files
modified per commit. The formula is as follows:
FILES(c) = \sum_{f} 1_{c > f} \qquad (2)
where c denotes changes and f denotes files, for a
change c and a file f, c>f indicates that change
c affects file f.
History of Frequent Changes: indicates the sum
of changes c on file f since the latter was added.
The formula is as follows:
CHFG(f, I) = \sum_{c > f} 1_{DATE(c) \in I} \qquad (3)
Commit Maintainability (MC): a software mainte-
nance metric based solely on commit-level LOC
(Lines of Code). Used to evaluate and measure
the complexity and size of an application’s source
code, it represents the total number of lines of
source code present in committed files.
The formula is as follows:
MC = \frac{LOC_{idCommit}}{LimitThreshold_{projectName}} \times 100 \qquad (4)
where LOC for idCommit is the sum of the num-
ber of lines of code of each file added or modified
for each idCommit, and Limit Threshold for pro-
jectName is a maximum value of desired or ac-
ceptable LOC for the commit, calculated as the
average of the LOCs for idCommit for each project-
Name. A lower value of LOC for idCommit than
the maximum assigned value will lead to a lower
score, indicating higher maintainability, while a
value of LOC for idCommit that exceeds the max-
imum assigned value will lead to a higher score,
indicating lower maintainability.
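A hedged sketch of Equations (2)-(4) over a commit/file-level table for a single project; the column names ("commit", "file", "date", "loc") are illustrative, not taken from the paper.

import pandas as pd

# Hypothetical file-level change log: one row per (commit, file) pair.
changes = pd.read_csv("file_changes.csv", parse_dates=["date"])

# Eq. (2): span of changes -- number of files modified by each commit.
span_of_changes = changes.groupby("commit")["file"].nunique()

# Eq. (3): history of frequent changes -- changes to file f whose date falls in interval I.
def chfg(file_path, start, end):
    in_file = changes["file"] == file_path
    in_interval = changes["date"].between(start, end)
    return int((in_file & in_interval).sum())

# Eq. (4): commit maintainability -- commit LOC relative to the project-wide
# average LOC per commit (the limit threshold), expressed as a percentage.
loc_per_commit = changes.groupby("commit")["loc"].sum()
limit_threshold = loc_per_commit.mean()
commit_maintainability = loc_per_commit / limit_threshold * 100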
3.3 Data Analysis
Dataset analysis is a crucial step in building machine
learning models, as the quality and relevance of the
data directly influence the performance of the models
themselves. In this regard, one of the main features of
our approach involves a different selection of features.
In particular, for each selected software system,
we consider 6 different versions of the dataset, each
of which is always characterized by the basic char-
acteristics with the addition of a single characteristic
relating to the quality of the code.
More specifically, version 1 (D1), in the set of im-
pact metrics, takes into consideration the total num-
ber of lines of code (Total LOC) as an additional
metric; version 2 (D2) commit maintainability; ver-
sion 3 (D3) is the metric that considers the maximum
of the History of Frequent Changes; version 4 (D4)
is the metric that considers the average of the His-
tory of Frequent Changes; version 5 (D5) the delta-
maintainability metric (DMM), and finally version 6
(D6) the Span of Changes metric.
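For illustration only, the six dataset versions can be seen as column subsets of the commit-level table: the shared process metrics of Table 1 plus one impact-related metric each. The column names below are our assumptions.

# Shared process metrics (Table 1) used by every dataset version.
base_features = ["author", "day", "insertions", "deletions",
                 "total_lines", "files_added", "fix_attempt"]

dataset_versions = {
    "D1": base_features + ["total_loc"],                # total number of lines of code
    "D2": base_features + ["commit_maintainability"],   # MC, Eq. (4)
    "D3": base_features + ["chfg_max"],                 # max of the history of frequent changes
    "D4": base_features + ["chfg_mean"],                # mean of the history of frequent changes
    "D5": base_features + ["dmm"],                      # delta maintainability, Eq. (1)
    "D6": base_features + ["span_of_changes"],          # Eq. (2)
}

# Example: D5 view of an assembled commit-level DataFrame named commit_metrics.
# d5 = commit_metrics[dataset_versions["D5"] + ["buggy"]]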
In order to provide a visual representation of the
relationships between features in the different ver-
sions of the dataset considered for the experiments,
Figure 2 shows a Venn diagram. This diagram clearly
shows the shared characteristics between different
versions of the dataset and provides an intuitive pic-
ture of the overlapping information. In the diagram,
each circle represents a version of the specific dataset,
while the intersections highlight common character-
istics. The variety of information collected aims to
provide a complete picture of the dynamics of the
projects analyzed.
Figure 2: Venn Diagram.
3.4 Classification Task
For the classification task, three models were evalu-
ated, all based on trees:
Decision Tree (DTC): a machine learning model
that is based on a tree structure composed of de-
cision nodes and leaves. Each decision node rep-
resents a choice about a feature of the data, while
the leaves contain the model’s predictions. Dur-
ing training, the tree divides based on the most in-
formative features, trying to create divisions that
maximize the purity of the leaves (minimizing en-
tropy). The Decision Tree is interpretable and can
handle both classification and regression prob-
lems (Magee, 1964).
Random Forest (RFC): a machine learning model
that exploits ensemble learning, combining mul-
tiple Decision Trees to improve precision and
generalization. During training, several Decision
Trees are created on random subsets of the train-
ing data and features. The final predictions are
obtained by voting or averaging the predictions of
each tree. This approach reduces the risk of over-
fitting and improves the robustness of the model
(Cutler et al., 2012).
Extra Tree (EXTC): a variant of Random For-
est that goes further, using a more random strat-
egy in the creation of individual trees. When se-
lecting splits in nodes, it randomly chooses split
points rather than searching for the best possible
one. This makes the model more computation-
ally efficient than traditional Random Forest. Al-
though the decision process is less interpretable
than standard Decision Trees, Extra Tree can of-
ten achieve competitive results with lower compu-
tational complexity (Geurts et al., 2006).
The classifier training process involved the use
of a training dataset containing the extracted features
and also the calculated metrics. The model was itera-
tively trained, evaluated, and optimized to ensure high
performance.
To evaluate the predictive effectiveness of the
model, the confusion matrix was used, which pro-
vides an overview of the model’s performance, in-
dicating the number of correct and incorrect predic-
tions. Furthermore, to understand which variables
have a significant impact on the model’s predictions,
the "feature importance" was examined to understand
which feature can have the greatest impact on bug
prediction. Finally, precision and recall metrics were
adopted to evaluate the performance of the model in
predicting future behavior. These metrics provide an
in-depth understanding of the model’s ability to make
accurate predictions.
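A minimal end-to-end sketch of this training and evaluation step with scikit-learn, assuming a commit-level CSV containing one of the feature sets plus a binary "buggy" label; the 80/20 split and default hyperparameters are illustrative, not the paper's settings.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

data = pd.read_csv("commit_metrics_d5.csv")        # hypothetical D5 dataset
X = pd.get_dummies(data.drop(columns=["buggy"]))   # one-hot encode categorical features
y = data["buggy"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "DTC": DecisionTreeClassifier(random_state=42),
    "RFC": RandomForestClassifier(random_state=42),
    "EXTC": ExtraTreesClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "accuracy:", round(accuracy_score(y_test, pred), 4),
          "precision:", round(precision_score(y_test, pred), 4),
          "recall:", round(recall_score(y_test, pred), 4))
    print(confusion_matrix(y_test, pred))

# Feature importance of the Random Forest model: which metrics drive predictions.
importance = sorted(zip(X.columns, models["RFC"].feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
print(importance[:5])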
4 RESULTS
This section reports the results of the experiments
conducted to validate the proposed approach.
In particular, for each software system considered
we report a table with the results of the accuracy ob-
tained. Each row of the table refers to a specific classi-
fier tested, while each column refers to the version of
the dataset taken into consideration for the analysis.
We chose accuracy as the validation metric because it measures the proportion of commits that are classified correctly. Our goal is to correctly identify the presence of bugs in commits, so as to avoid them before the software can be compromised.
Table 2 shows the accuracy performance of the
three classifiers for all the different sets of features
extracted from the ElasticSearch project. As we can see,
all classifiers achieve very high levels of accuracy;
Extra Trees and Random Forest always obtain results
around 91% while Decision Tree maintains lower val-
ues around 83%. In particular, among the two best,
Extra Trees is the best performer for datasets D1, D4,
and D5, while Random Forest achieves better accu-
racy values for the remaining three datasets. The best overall result for the ElasticSearch repository is obtained using the D2 feature set with Random Forest, reaching an accuracy value of 91.53%.
Table 2: Accuracy Results for ElasticSearch.
Classifier D1 D2 D3 D4 D5 D6
DTC 0.8463 0.843 0.8373 0.8412 0.8381 0.843
RFC 0.9115 0.9153 0.9143 0.913 0.9094 0.9133
EXTC 0.9133 0.9107 0.9115 0.9122 0.9145 0.9102
The accuracy performance of the three classifiers
for each of the various feature sets taken from the
Guava project is displayed in Table 3. In this case, the
general trend is similar to the previous one: Random
Forest and Extra Tree are better than Decision Tree
even if the gap compared to the previous case short-
ens from a difference of 7 percentage points to around
3 points. All the classifiers perform an accuracy value
over 94%. In particular, Random Forest is the best
performer for datasets D1, D2, D4, D5, and D6 with
an accuracy of 97.67%, while Extra Tree achieves bet-
ter accuracy values for the remaining dataset.
Table 3: Accuracy Results for Guava.
Classifier D1 D2 D3 D4 D5 D6
DTC 0.9488 0.9488 0.9442 0.9488 0.9535 0.9628
RFC 0.9767 0.9767 0.9721 0.9767 0.9767 0.9767
EXTC 0.9721 0.9721 0.9721 0.9674 0.9767 0.9721
Table 4 shows the accuracy performance of the
three classifiers for each of the different feature sets
extracted from the RxJava repository. In this case, the situation differs from the previous ones: as we can see, Random Forest confirms itself as the best classifier for each set of features considered, reaching 91.67% for the first five datasets. Deci-
sion Tree and Extra Tree performance are on the same
level for each dataset except in D3 where Extra Tree
equals Random Forest in accuracy.
Table 4: Accuracy Results for RxJava.
Classifier D1 D2 D3 D4 D5 D6
DTC 0.875 0.875 0.9583 0.875 0.833 0.8333
RFC 0.9167 0.9167 0.9167 0.9167 0.9167 0.875
EXTC 0.8333 0.875 0.9167 0.8333 0.8333 0.8333
Further experiments are related to transfer learn-
ing of the model trained on the data extracted from
the ElasticSearch repository. In particular, the Guava
repository has been used as a test set for this experi-
ment. Obtained results are reported in Table 5. The
first consideration to make is that the experiment was
successful as all the classifiers considered had at least
performances comparable to the previous cases, those
in which the experiment was limited to data extracted
from only one repository at a time. This means that
the results obtained can be generalized and that taking
process and impact metrics into account is important
in bug prediction.
The Extra Trees classifier reaches the best accuracy value, 97.35%, for dataset D4, and is the best performer overall. Random Forest reaches comparable performance, even if slightly lower than Extra Trees' accuracy values, while Decision Tree is the worst in this case, although it still reaches 90% accuracy for the D1 dataset.
Table 5: Accuracy Results using Transfer Learning.
Classifier D1 D2 D3 D4 D5 D6
DTC 0.9008 0.8534 0.8547 0.8394 0.8715 0.8324
RFC 0.9595 0.9595 0.9665 0.9651 0.9637 0.9606
EXTC 0.9679 0.9665 0.9707 0.9735 0.9707 0.9721
5 THREATS TO VALIDITY
In this section, we discuss threats to the validity of the
study.
Construct validity: The validity of our study is threatened by the validity of the source code measuring method we used. In this sense, we employ the runSZZ method, which is widely utilized in other research and is publicly accessible.
Internal validity: Elements that may sway our observations pose a threat to internal validity, specifically whether the measurements are sufficient and whether the parameters are relevant to our findings. A meticulous data-gathering procedure was used to mitigate this.
External validity: Our findings’ capacity to be
broadly applied poses a threat to external validity.
While we looked at three popular open-source sys-
tems with different sizes, domains, time frames, and
commit counts, we are conscious that more empirical
validation on commercial systems would be helpful
to strengthen our conclusions. The type of reported
flaws in commercial systems is different from those
of open source systems. Because the tools we utilized
are limited to Java applications, we were only able to
consider systems written in Java. This is another con-
straint of our work. As a result, we are unable to make
generalizations about projects from industrial settings
or systems built in other languages.
6 CONCLUSIONS
A bug indicates a fault that causes the software to mal-
function and is usually attributable to code errors. Not
all bugs are visible: some errors, due to the development of the source code and more rarely to the compiler, are imperceptible, in the sense that they do not affect the functionality of the software.
At the same time, software evolution and mainte-
nance are important and ongoing processes that likely
result in the introduction of new bugs. Since it can reduce resource waste and aid in decision making, there is growing interest in evaluating and predicting the time and money needed to fix bugs. Bug prevention is therefore of fundamental importance, and in this regard, this paper proposes a pipeline for just-in-time bug identification.
Specifically, the paper focuses on a double objective: on the one hand, identifying the set of process metrics most suitable for bug detection, and on the other, evaluating whether the proposed approach is also valid cross-project, thanks to transfer learning.
The approach has been validated on three open-source
software systems and the results are very satisfactory.
Finally, to assess the effectiveness of our paradigm
in practice, we intend to carry out a controlled study
involving practitioners. This would enable defect pre-
diction to be more practically used and to assist with
real-time development tasks, including code writing
and/or code reviews.
REFERENCES
Ayewah, N., Pugh, W., Morgenthaler, J. D., Penix, J.,
and Zhou, Y. (2007). Evaluating static analysis de-
fect warnings on production software. In Proceed-
ings of the 7th ACM SIGPLAN-SIGSOFT workshop
on Program analysis for software tools and engineer-
ing, pages 1–8.
Cutler, A., Cutler, D. R., and Stevens, J. R. (2012). Random
forests. Ensemble machine learning: Methods and
applications, pages 157–175.
di Biase, M., Rastogi, A., Bruntink, M., and van Deursen,
A. (2019). The delta maintainability model: Measur-
ing maintainability of fine-grained code changes. In
2019 IEEE/ACM International Conference on Techni-
cal Debt (TechDebt), pages 113–122. IEEE.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely
randomized trees. Machine learning, 63:3–42.
Khanan, C., Luewichana, W., Pruktharathikoon, K.,
Jiarpakdee, J., Tantithamthavorn, C., Choetkiertikul,
M., Ragkhitwetsagul, C., and Sunetnanta, T. (2021).
Jitbot: An explainable just-in-time defect prediction
bot. In Proceedings of the 35th IEEE/ACM Interna-
tional Conference on Automated Software Engineer-
ing, ASE ’20, page 1336–1339, New York, NY, USA.
Association for Computing Machinery.
Magee, J. F. (1964). Decision trees for decision making.
Harvard Business Review Brighton, MA, USA.
Malhotra, R. (2015). A systematic review of machine learn-
ing techniques for software fault prediction. Applied
Soft Computing, 27:504–518.
Marengo, A., Pagano, A., and Ladisa, L. (2018). Towards
a mobile augmented reality prototype for corporate
training: A new perspective. pages 129–135.
Nagwani, N. K. and Suri, J. S. (2023). An artificial intel-
ligence framework on software bug triaging, techno-
logical evolution, and future challenges: A review. In-
ternational Journal of Information Management Data
Insights, 3(1):100153.
Neelofar, Javed, M. Y., and Mohsin, H. (2012). An au-
tomated approach for software bug classification. In
2012 Sixth International Conference on Complex, In-
telligent, and Software Intensive Systems, pages 414–
419.
Osman, H. (2017). Empirically-Grounded Construction of
Bug Prediction and Detection Tools. PhD thesis, Uni-
versität Bern.
Osman, H., Ghafari, M., and Nierstrasz, O. (2018). The
impact of feature selection on predicting the number
of bugs.
Rahman, F. and Devanbu, P. (2013). How, and why, process
metrics are better. In 2013 35th International Confer-
ence on Software Engineering (ICSE), pages 432–441.
Rodríguez-Pérez, G., Robles, G., Serebrenik, A., Zaidman, A., Germán, D. M., and Gonzalez-Barahona, J. M.
(2020). How bugs are born: a model to identify how
bugs are introduced in software components. Empiri-
cal Software Engineering, 25:1294–1340.
Seliya, N., Khoshgoftaar, T. M., and Hulse, J. V. (2010).
Predicting faults in high assurance software. 2010
IEEE 12th International Symposium on High Assur-
ance Systems Engineering, pages 26–34.
Śliwerski, J., Zimmermann, T., and Zeller, A. (2005). When do changes induce fixes? ACM SIGSOFT Software Engineering Notes, 30(4):1–5.
Song, Q., Jia, Z., Shepperd, M., Ying, S., and Liu, J.
(2011). A general software defect-proneness predic-
tion framework. IEEE Transactions on Software En-
gineering, 37(3):356–370.
Tan, L., Liu, C., Li, Z., Wang, X., Zhou, Y., and Zhai, C.
(2014). Bug characteristics in open source software.
Empirical software engineering, 19:1665–1705.
Wang, S., Wang, J., Nam, J., and Nagappan, N. (2021).
Continuous software bug prediction. In Proceedings
of the 15th ACM / IEEE International Symposium
on Empirical Software Engineering and Measurement
(ESEM), ESEM ’21, New York, NY, USA. Associa-
tion for Computing Machinery.
Wen, M., Wu, R., Liu, Y., Tian, Y., Xie, X., Cheung, S.-C.,
and Su, Z. (2019). Exploring and exploiting the cor-
relations between bug-inducing and bug-fixing com-
mits. In Proceedings of the 2019 27th ACM Joint
Meeting on European Software Engineering Confer-
ence and Symposium on the Foundations of Software
Engineering, pages 326–337.
Zhang, F., Khomh, F., Zou, Y., and Hassan, A. E. (2012).
An empirical study on factors impacting bug fixing
time. In 2012 19th Working conference on reverse en-
gineering, pages 225–234. IEEE.