Investigating Bug Report Changes in Bugzilla

Felipe Emerson de Oliveira Calixto, Franklin Ramalho, Tiago Massoni and José Manoel Ferreira

Departamento de Sistemas e Computação, Universidade Federal de Campina Grande, Campina Grande, Paraíba, Brazil

Keywords:

Bug Report, Tracking Bug Report Systems, Bugzilla.

Abstract:

Bug report change behavior in bug tracking systems may help pinpoint negligence or misunderstanding when submitters fill in bug report information. This study investigates bug report changes in several projects within Mozilla’s Bugzilla to identify which fields in a report change the most, which bug profiles receive more changes and the relationship between these changes. We found that the most changed fields are flagtypes.name and cc. Reports are often modified when they indicate a valid bug, with medium to high priority and severity. Moreover, there are moderate to high correlations between changes in the following field pairs: product-component, priority-severity, and platform-op_sys. We believe these results are relevant to indicate which submitter skills must be enhanced to improve the bug-tracking process.

1 INTRODUCTION

Users play an essential role in identifying bugs dur-

ing a software lifecycle, as a system is not free of

failures, even after its initial release. Typically, sys-

tems with a large number of users receive many bug

reports daily. For example, Mozilla receives around

307 new reports daily (Fan et al., 2020). For tracking

and monitoring the status of reported bugs, there are

speciﬁc systems such as Bugzilla and JIRA, among

others. Such systems allow the submitter, either the

developer or end user, to create and track reports.

These reports may describe requests for new fea-

tures/improvements, but most of them are for bugs

(Valdivia Garcia and Shihab, 2014).

For developers to find and fix a bug, the most helpful reports provide the information expected in fields such as affected product, affected component, priority, severity, bug classification, and bug type, among others. Nevertheless, it is hard to guarantee that reporters will provide all the needed information; a few studies have found that relevant fields are often neglected or incorrectly filled (Bettenburg et al., 2007; Bettenburg et al., 2008). In the case of end users, incomplete reports can be due to a lack of knowledge or attention.

For example, Bettenburg et al. (2008) investigated the overall quality of bug reports.

In one of the stages of their study, they conducted

a survey focused on developers and reporters, seek-

ing to identify, among other things, which features (i)

have been previously reported, (ii) are the most difﬁ-

cult for the reporter to provide, and (iii) are considered

most valuable by developers.

As a result, they identified a contrast between what developers consider most useful and what reporters provide, suggesting that this may be related to the difficulty of providing specific information. Bug report fields may be left empty or filled incorrectly because the person who fills in a report is often an end user of the system who may lack technical knowledge about the various characteristics of a bug. Consequently, developers’ understanding of the report and the time until the bug is resolved may be affected. Soltani et al. (2020) found that the lack of crucial fields like steps-to-reproduce and stack traces can impact the bug report resolution time by up to 70%.

The bug-tracking process is subject to constant changes during the lifecycle. We found that bug reports have an average of 14.29 changes across their lifecycle. A bug report goes through numerous updates, from status changes (when the bug is confirmed or fixed, for example) to changes that correct information, such as the affected product. Other common changes identify specific aspects of a bug: whether it blocks (blocking bug) the correction of others (blocked bugs), whether it has been identified and validated (valid bug) or the bug described in the report is invalid (invalid bug), and whether the report addresses a previously reported bug (duplicate bug) (Valdivia Garcia and Shihab, 2014; Erfani Joorabchi et al., 2014; Rocha et al., 2016). Poorly reported bugs may affect those status changes, leading to rework and unnecessary cycles. It is therefore relevant to understand how these changes relate to the bug report fields that are often neglected or incorrectly filled.

Calixto, F., Ramalho, F., Massoni, T. and Ferreira, J.
Investigating Bug Report Changes in Bugzilla.
DOI: 10.5220/0011847600003467
In Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS 2023) - Volume 2, pages 55-64
ISBN: 978-989-758-648-4; ISSN: 2184-4992
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

In this paper, we used Mozilla’s Bugzilla histor-

ical data to investigate how report ﬁelds change over

time, as they may indicate which ﬁelds are more likely

to have inaccurate or insufﬁcient information.

Furthermore, it helps us understand why some bug

reports are changed more often.

We also investigate whether changes in different fields are related by analyzing changes in pairs of fields. We consider the following research questions: (RQ1) Which fields change the most? (RQ2) What is the profile of the most changed bugs? (RQ3) Are there relationships between field changes in bug reports?

Bugzilla is one of the leading open-source systems for bug tracking and monitoring; it is used by companies and projects such as Mozilla, Red Hat, and Eclipse, among others. Briefly, users report bugs, which are then triaged and assigned to a developer to be fixed.

We chose Mozilla’s Bugzilla as our dataset due

to its availability and use in previous studies (Fan

et al., 2020; Bettenburg et al., 2008; Valdivia Garcia

and Shihab, 2014; Hooimeijer and Weimer, 2007; Er-

fani Joorabchi et al., 2014). The dataset has 690,817

bug reports, all of which are RESOLVED, and could

be from any of Mozilla’s projects. We identiﬁed

changes in a total of 617 ﬁelds across the dataset.

Our study revealed that changes occur in two types of fields: custom (fields whose names start with cf_ and that are created by Bugzilla administrators) and non-custom (fields that exist by default). The latter undergo most changes, with an average of 12.24 changes (vs. 2.04 for custom fields) during the bug life cycle.

Also, on average, the fields that change the most are cc (users registered to follow up on a given report) and flagtypes.name (used to request information from a user). The bug reports with the most changes are related to valid bugs (with FIXED resolution), with medium or high priority (P1, P2, or P3), and medium or high severity (S1, S2, S3, blocker, critical, or major). Some products, such as Infrastructure & Operations, have a lower mean of changes than others.

A single update to a bug report can change several fields at once. We thus explored, through correlation, how often two fields change in the same update. As a result, we could check whether a field tends to be modified alone or together with another. We found evidence of pairs of fields with a moderate to strong correlation between their change occurrences, such as platform (0.94) and op_sys (0.86), product (0.64) and component (0.69), and priority (0.28) and severity (0.61).

The main contributions of this work are: (i) char-

acterizing changes in bug reports, including an in-

depth exploration of ﬁelds not extensively studied in

prior research; (ii) a sample dataset that has already

been ﬁltered, a selection of features, and accompa-

nying scripts; (iii) results that may help other stud-

ies better estimate the use of change features in their

models; and (iv) an initial investigation into the si-

multaneous correlation of changes between pairs of

ﬁelds, which can be used to develop tools that indi-

cate which other ﬁeld(s) may tend to change in con-

junction with a given ﬁeld, thus resulting in more in-

formative bug reports. Helping reporters to complete

bug reports more comprehensively can facilitate de-

velopers in ﬁxing the bug. Overall, this work aims

to support better bug reporting practices and assist in

improving bug resolution outcomes.

We organized this document as follows. Section

2 discusses some concepts related to the Bugzilla

dataset. Section 3 discusses related work. Section

4 presents the methodology of this study. Section 5

presents the results achieved. Section 6 exposes po-

tential threats to research validity. Finally, Section 7

presents the conclusions and possible lines of research

for future work.

2 BACKGROUND

In this section, we detail the definition of the Bugzilla fields addressed in this work and how Bugzilla structures the change history of a bug report. Anvik et al. (2005) present a general context of how an open bug repository works.

2.1 Bug Report Fields in Bugzilla

Due to Bugzilla’s large number of ﬁelds, we se-

lected some of the most studied ﬁelds (Fan et al.,

2020; Bettenburg et al., 2008; Valdivia Garcia and

Shihab, 2014; Hooimeijer and Weimer, 2007; Er-

fani Joorabchi et al., 2014; Rocha et al., 2016; Gupta

and Sureka, 2014). Below, we have the name, de-

scription, and possible values of the main ﬁelds cov-

ered in this study:

• id: numerical ﬁeld used to identify a single bug

report uniquely;

• history: all change records (presented in more de-

tail in subsection 2.2);

• resolution: current resolution status of a re-

port. It can take one of the following val-

ues: FIXED, INVALID, INCOMPLETE, DUPLI-

CATE, WORKSFORME, WONTFIX, INACTIVE,

or MOVED. When a report has been conﬁrmed as

a bug and ﬁxed, the developer will set the reso-

lution to FIXED. When a report describes a bug

that has already been reported earlier, the report is

resolved as DUPLICATE. The rest of the values

refer to bugs that are not reproducible;

• product: the product affected by the bug. In

Mozilla, there are, for example, the Firefox and

Thunderbird products;

• component: indicates the component affected by

the bug. Each component belongs to a speciﬁc

product, and a product can have multiple compo-

nents. For example, the Firefox product has the

Menus component, and the Core product has the

JavaScript Engine component;

• priority: defines how important it is to fix a bug compared to others. The priority has values ranging from P1 to P5, where P1 refers to a bug with maximum priority and P5 to a bug with very low priority;

• severity: describes the severity of a bug. Cur-

rently, Mozilla’s Bugzilla has two severity scales.

The ﬁrst scale has values from S1 to S4, which go

from catastrophic to trivial severity. In addition,

severity may have the N/A value for reports where

a severity classiﬁcation does not apply, for exam-

ple, when a report is of the enhancement type.

The second scale is more self-explanatory, as each

value indicates the degree of severity (blocker,

critical, major, normal, minor, trivial, enhance-

ment).

2.2 Structure of a Bug Report’s History

Figure 1 shows a UML class diagram of the history structure that Bugzilla uses to record changes to a bug report; other attributes of a bug report are described in Bugzilla’s documentation (https://wiki.mozilla.org/Bugzilla:BzAPI:Objects). In short, each bug report has a history, which is an array that can be empty (when no changes have been registered) or contain multiple objects of type ChangeSet. A ChangeSet has two attributes: who, the email of the user who made the change, and when, the date on which the change was made. In addition, a ChangeSet is composed of changes, an array of Change objects, and must include at least one Change. Lastly, a Change object consists of the attributes field_name (name of the field that someone changed), added (the value that someone added), and removed (the value that someone removed).

Figure 1: History structure.
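The history structure described in this subsection can be sketched in Python as follows. This is a minimal illustration rather than the scripts used in the study: the class names mirror the diagram, the parse_history helper is hypothetical, the sample payload is invented, and the keys who, when, changes, field_name, added, and removed are assumed to match the attribute names given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    field_name: str  # name of the field that was changed
    added: str       # value that was added
    removed: str     # value that was removed


@dataclass
class ChangeSet:
    who: str   # email of the user who made the change
    when: str  # date of the change (kept as a plain string in this sketch)
    changes: List[Change] = field(default_factory=list)


def parse_history(raw_history):
    """Convert a raw history array (list of dicts) into ChangeSet objects."""
    change_sets = []
    for entry in raw_history:
        changes = [Change(c["field_name"], c["added"], c["removed"])
                   for c in entry["changes"]]
        change_sets.append(ChangeSet(entry["who"], entry["when"], changes))
    return change_sets


# Invented sample: a single update that changed two fields at once.
sample_history = [
    {
        "who": "dev@example.com",
        "when": "2020-01-02T10:00:00Z",
        "changes": [
            {"field_name": "status", "added": "RESOLVED", "removed": "NEW"},
            {"field_name": "resolution", "added": "FIXED", "removed": ""},
        ],
    }
]

history = parse_history(sample_history)
# Each Change counts as one modification of the bug report.
total_changes = sum(len(cs.changes) for cs in history)
```

In this sketch, one ChangeSet with two Change objects counts as two changes, which is the unit of counting assumed throughout the remaining examples.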

3 RELATED WORK

Zhang et al. (2016) reviewed the literature on bug resolution and identified research lines on this topic.

The quality of bug reports has been the subject of several research studies. Bettenburg et al. (2008) built a tool to measure the quality of bug reports and suggest improvements to the reporter. Other work has focused on developing tools that help increase the quality of bug reports: Song and Chaparro (2020) built BEE, a tool for structuring and analyzing bug reports, and Fazzini et al. (2022) created EBug, a tool to assist reporters in writing steps-to-reproduce for mobile apps.

There is much research on applying models for classifying bug reports and/or predicting features. Fan et al. (2020) designed a model to predict whether a bug report is valid. Hooimeijer and Weimer (2007) built a model to predict whether a bug will be triaged within a given amount of time. For this, they used three features related to changes: the number of severity changes, comment count, and attachment count. Valdivia Garcia and Shihab (2014) also worked with prediction models, predicting whether a bug is blocking or not, and they used the feature priority has increased, which tells whether the priority has gone up after the initial report. Xiao et al. (2020) applied a deep neural network (DNN) model to predict duplicated bug reports.

Erfani Joorabchi et al. (2014) characterized non-reproducible bug reports, seeking to understand their frequency, nature, and causes. To this end, they compared the properties of non-reproducible bugs with those of their counterparts. Furthermore, they investigated their life-cycle patterns, taking into account the history of changes in status and resolution. Regarding changes, their work focuses only on the transitions that occur in status and resolution, while this work seeks a more general understanding (covering more fields) of the frequency of changes and the possible relationships between field changes.

Rocha et al. (2016) studied bug workflows in Mozilla Firefox to better understand how developers deal with bugs. To do so, they used the change history of the status field to build workflow graphs and the resolution to compare workflows. Moreover, they compared workflows between developers with different levels of experience. Thus, their work analyzes only status changes. Gupta and Sureka (2014) used business process mapping tools and techniques to create a framework that generates runtime process maps from the change history of bug reports, using the fields status, resolution, assigned_to, qa_contact, and component.

These studies either use some change features in their models or, when investigating changes, focus on the status and resolution fields. In this study, we focus on the frequency of changes across more fields and on the relationships between those changes, which serve as complementary evidence.

4 METHODOLOGY

This work aims to analyze the change history of bug reports in the Bugzilla database. Through an exploratory analysis of this dataset, the objectives are: (i) to identify which fields are changed the most; (ii) to characterize the profile of the reports that have the most changes; and (iii) to determine whether there are relations between changes.

All study material is available at: https://github.com/felipeemerson/Bugzilla-mozilla-investigation.

4.1 Research Questions

The study was carried out in order to answer the fol-

lowing research questions:

RQ1. Which ﬁelds change the most?

RQ2. What is the proﬁle of the most changed

bugs?

RQ3. Are there relationships between ﬁeld

changes in bug reports?

4.2 Dataset

The dataset used in this study was Mozilla’s Bugzilla

due to its popularity and availability and because of

several previous studies working with it (Fan et al.,

2020; Bettenburg et al., 2008; Valdivia Garcia and

Shihab, 2014; Hooimeijer and Weimer, 2007; Er-

fani Joorabchi et al., 2014). We deﬁned the following

ﬁlters:

• Report creation date range: between 01/01/2013

and 01/01/2022. It covers nine years, ensuring a

good amount of bug reports;

• Status: RESOLVED. Using this status, we avoid

getting current open bugs or invalid reports;

• Product: all. The dataset includes bug reports

from several Mozilla projects.

In total, the dataset used has 690,817 bug reports.
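The filters above can be sketched as a query against Bugzilla’s REST bug search. This is a hedged illustration, not the collection script used in the study: the endpoint URL, the page size, and the choice of include_fields are assumptions, and the sketch assumes that the creation_time search parameter acts only as a lower bound, so the upper bound of the date range is checked on the client side by the hypothetical created_in_range helper. No request is sent; the code only builds the URL.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed public endpoint of Mozilla's Bugzilla REST API.
BASE_URL = "https://bugzilla.mozilla.org/rest/bug"
START = datetime(2013, 1, 1, tzinfo=timezone.utc)
END = datetime(2022, 1, 1, tzinfo=timezone.utc)
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_query_url(offset=0, limit=500):
    """Build one paginated search URL for RESOLVED bugs created since START."""
    params = {
        "status": "RESOLVED",
        "creation_time": START.strftime(TIME_FORMAT),  # lower bound only
        "include_fields": "id,creation_time,resolution",
        "limit": limit,    # page size (an arbitrary choice here)
        "offset": offset,  # position of the first result in this page
    }
    return f"{BASE_URL}?{urlencode(params)}"


def created_in_range(bug):
    """Client-side check that a bug was created between START and END."""
    created = datetime.strptime(bug["creation_time"], TIME_FORMAT)
    created = created.replace(tzinfo=timezone.utc)
    return START <= created <= END


first_page_url = build_query_url()
```

Paging with limit and offset keeps each response small, which matters for a dataset of this size; whether the range endpoints are inclusive is not stated in the text, so the comparison above is only one possible reading.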

4.3 Metrics

In this study, we apply the following metrics:

• Total changes per bug report. It allows us to check

which bug reports change the most;

• Percentage of bug reports that recorded at least one change per field. This is the percentage of bug reports that registered at least one modification in a given field. It provides clues about which changes happen more frequently;

• Total changes per field of a bug report. We reuse this metric, which is used as a feature (for the severity field) in Hooimeijer and Weimer’s work (Hooimeijer and Weimer, 2007).
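The three metrics can be computed directly from the history of each report. The following is a minimal sketch under the assumption that a history is a list of change sets, each holding a changes list whose items carry a field_name key; the function names and the toy data are hypothetical and are not taken from the study’s scripts.

```python
from collections import Counter


def total_changes(history):
    """Total number of field changes recorded for one bug report."""
    return sum(len(change_set["changes"]) for change_set in history)


def changes_per_field(history):
    """Number of changes per field within one bug report."""
    counts = Counter()
    for change_set in history:
        for change in change_set["changes"]:
            counts[change["field_name"]] += 1
    return counts


def presence_percentage(histories, field_name):
    """Percentage of bug reports with at least one change to field_name."""
    if not histories:
        return 0.0
    hits = sum(1 for h in histories if changes_per_field(h)[field_name] > 0)
    return 100.0 * hits / len(histories)


# Invented toy data: the histories of two bug reports.
report_a = [
    {"changes": [{"field_name": "cc"}, {"field_name": "status"}]},
    {"changes": [{"field_name": "cc"}]},
]
report_b = [{"changes": [{"field_name": "status"}]}]
histories = [report_a, report_b]
```

With this toy data, cc is present in one of the two reports and status in both, so the presence percentages are 50% and 100%, respectively.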

4.4 Procedure

We obtained the data used in this work in two stages:

data collection and processing.

4.4.1 Data Collection

The data was collected from Bugzilla’s REST API (https://bmo.readthedocs.io/en/latest/api/index.html) using Python scripts. The API returns the data in JSON format. Due to the massive amount of data, the result was stored in a MongoDB database and accessed through the MongoEngine library.

4.4.2 Data Processing

To obtain the values of the metrics described in Section 4.3, we processed the data through Python scripts, and the results were written to JSON files. These files are used in notebook-type documents to produce the graphs and statistics to be analyzed. For this purpose, the pandas (https://pandas.pydata.org/), matplotlib (https://matplotlib.org/), and seaborn (https://seaborn.pydata.org/) libraries were used.

4.5 Fields

The ﬁelds explored in this study are:

1. id: unique bug report identiﬁer;

2. history: the changes history;

3. resolution: indicates how the bug report was resolved;

4. product: affected product;

5. severity: bug severity level;

6. priority: bug priority level.

The fields id, history, and resolution were downloaded between 06/03/2022 and 06/04/2022, while product, severity, and priority were added between 07/19/2022 and 07/31/2022. The second round of downloads was necessary because we only later identified that the final values of these fields were needed to answer RQ2. Furthermore, we detected that one bug report, with id 1604167, now requires access authorization, so it was left out of the specific analyses involving the fields added later.

5 RESULTS

The results achieved with this study provide evidence

to answer the research questions proposed in Section

4. We discuss this in this section.

Table 1: Number of fields that registered changes by type.

Fields                       Value
Custom fields                578 (93.68%)
  Related to product         544 (88.17%)
  Not related to product     34 (5.51%)
Non-custom fields            39 (6.32%)

5.1 RQ1. Which Fields Change the

Most?

To answer RQ1, we identify which ﬁelds were subject

to at least one change in the entire dataset (Subsection

5.1.1). After checking that most ﬁelds are custom, we

investigated their prevalence (Subsection 5.1.2). In

Subsection 5.1.3, we present the general statistics of

changes to understand better the results obtained. In

Subsection 5.1.4, we present the answer to RQ1.

5.1.1 Total Fields with at Least One Change

As shown in Table 1, we identified 617 fields that registered at least one change in the dataset. Looking at the field names, we noticed that 578 fields start with the prefix cf_, used to identify custom fields, which Bugzilla administrators create to meet demands that existing fields do not meet.

Among the custom fields, most are related to versions of a specific product and start with the prefixes cf_status_ or cf_tracking_, e.g., cf_status_firefox101, cf_status_thunderbird_103 and cf_tracking_seamonkey237. There are 39 non-custom fields, only 6% of the total.
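The classification by prefix can be sketched as follows. It is a minimal illustration, assuming that the prefixes are written with underscores (cf_, cf_status_, cf_tracking_) as in Bugzilla field names; the classify_field function and its labels are hypothetical.

```python
# Prefixes assumed to mark custom fields tied to product versions.
PRODUCT_PREFIXES = ("cf_status_", "cf_tracking_")


def classify_field(name):
    """Classify a field name into the three groups reported in Table 1."""
    if name.startswith(PRODUCT_PREFIXES):
        return "custom (related to product)"
    if name.startswith("cf_"):
        return "custom (not related to product)"
    return "non-custom"


examples = {
    name: classify_field(name)
    for name in ("cf_status_firefox101", "cf_last_resolved", "cc")
}
```

The product-related prefixes are tested first because they also start with cf_; reversing the order would place every custom field in the second group.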

5.1.2 Custom Fields that Have a Higher

Percentage of Presence

Considering only the custom fields that are not related to product versions, Table 2 shows that the cf_last_resolved field (last date on which the report was considered resolved) has 100% presence, meaning that it is updated at least once in every bug report in the dataset. Fields such as cf_has_regression_range (which indicates whether a report has a regression interval) and cf_crash_signature (which stores the fault signature), among others, have a presence of at most 2.48%, which means modifications involving these fields occur in very few reports. These low values may be due to the fact that only a developer can modify custom fields. Furthermore, most are related to a specific product version and are used for a short time. For example, Mozilla Firefox has new releases monthly.

5.1.3 Total Changes per Bug Report

Table 3 presents the total changes per bug report for custom fields, non-custom fields, and the two types together. On average, a bug report has 14.29 changes over its lifecycle, with most changes being made to non-custom fields. Non-custom fields have a high standard deviation (SD) compared to custom fields, indicating that their values are more dispersed. As for the maximum values recorded, non-custom fields had up to 1793 changes in a single bug report, an exceptional outlier compared to the mean and median. The difference in the number of changes between the two types can be explained by the factors already mentioned in Subsection 5.1.2: most custom fields are used for a short period, and changes in custom fields occur in very few bug reports.

Table 2: The five most frequent custom fields with changes.

Field                      Presence percentage
cf_last_resolved           100%
cf_blocking_b2g            2.48%
cf_qa_whiteboard           1.68%
cf_has_regression_range    1.61%
cf_crash_signature         0.96%

Table 3: Statistics of the total changes.

                     Mean     Median   SD       Max
Custom fields        2.04     1        2.42     94
Non-custom fields    12.24    8        15.20    1793
All fields           14.29    10       16.25    1796
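Summary statistics of this kind (mean, median, SD, and maximum) can be obtained with Python’s standard library. The sketch below uses invented totals and a hypothetical summarize helper; it assumes the sample standard deviation, since the text does not state whether the sample or the population formula was used.

```python
import statistics


def summarize(values):
    """Return the mean, median, sample SD, and maximum of a list of totals."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "sd": round(statistics.stdev(values), 2),  # sample SD (an assumption)
        "max": max(values),
    }


# Invented totals of changes for five bug reports; one is an outlier.
toy_totals = [2, 4, 4, 6, 34]
summary = summarize(toy_totals)
```

In the toy data, the single outlier pulls the mean (10) well above the median (4), the same pattern that makes the median a useful complement to the mean in the tables of this section.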

5.1.4 Total Changes per non-Custom Field

The previous subsection shows that non-custom fields account for most of the total changes. Table 4 shows that only three fields reached a non-zero median: cc, resolution, and status. The last two have 100% presence in the entire dataset because all bug reports in the dataset have the RESOLVED status, which requires the resolution field to be set. So, apart from status and resolution, the fields with the highest presence percentages are cc (used by users to register for notifications about the report), flagtypes.name (used to ask a user for information), and assigned_to (filled in when a developer is assigned to fix a bug).

As for the maximum values of changes, two fields that indicate relationships between bugs appear among those with the highest maximum values: depends_on (list of bugs that block the current one) with 1770 and blocks (list of bugs that are blocked by the current one) with 353. Among the fields with the lowest maximum values, we can highlight regressed_by (list of bugs that introduced the current one) with only 5, which is another field that indicates relationships between bugs.

In summary, the cc and flagtypes.name fields stood out in the various scenarios, showing that they are among the most modified and present. For cc, the likely reason is that users signing up to track bug reports is very common. As for flagtypes.name, requests for information from one user to another are common for several reasons: a developer asking for new information about the described bug or a user asking a developer for analysis, among other situations. Moreover, in both cases, the more complex and/or urgent the bug is, the more users tend to participate in it. Consequently, more changes occur in those fields.

5.2 RQ2. What Is the Proﬁle of the

Most Changed Bugs?

Grouping changes by the ﬁnal value of a ﬁeld can

give clues to understanding which values are related

to more changes. For this subsection, we calcu-

late statistics only using modiﬁcations in non-custom

ﬁelds (because they are the ones with the most mod-

iﬁcations, as seen in subsection 5.1). We use the res-

olution, priority, and severity ﬁelds to group the data.

Because the dataset is restricted to resolved reports, we do not group by status: all bugs have the final status RESOLVED. In addition, we group by the product field, considering the most popular products within the dataset.

5.2.1 Grouping by Final Resolution

Due to the download time range described in Section 4.5, six bug reports were removed from the analysis, as they were reopened within that range and had their final resolutions removed.

Table 5 presents the total changes grouped by final resolution. The resolution with the most changes is FIXED, with 15.13 changes on average (that is, bug reports whose last value registered in resolution was FIXED have 15.13 changes on average). In comparison, INACTIVE and WONTFIX have around ten changes on average, while DUPLICATE, INCOMPLETE, and INVALID have around eight. The other resolutions thus have up to about 50% fewer modifications on average than FIXED. This result may indicate that invalid reports tend to be identified after fewer changes or that valid bug reports have a longer life cycle, given that after a bug is identified, its report can still receive changes that help fix it.
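Grouping the totals by the final value of a field can be sketched as follows. The group_stats helper, the dictionary keys, and the toy reports are hypothetical; the sketch only illustrates the grouping step and does not reproduce the values of the study.

```python
import statistics
from collections import defaultdict


def group_stats(reports, key):
    """Group total changes by the final value of a field and summarize them."""
    groups = defaultdict(list)
    for report in reports:
        groups[report[key]].append(report["total_changes"])
    return {
        value: {
            "mean": statistics.mean(totals),
            "median": statistics.median(totals),
            "max": max(totals),
        }
        for value, totals in groups.items()
    }


# Invented reports: final resolution and total number of changes.
toy_reports = [
    {"resolution": "FIXED", "total_changes": 10},
    {"resolution": "FIXED", "total_changes": 20},
    {"resolution": "INVALID", "total_changes": 6},
]
stats = group_stats(toy_reports, "resolution")
```

The same function could be called with "priority", "severity", or "product" as the key, provided each report dictionary carries the final value of that field.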

The highest modification values were recorded in WONTFIX with 1793, INVALID with 1692, and FIXED with 1369. These values were recorded in the bug reports with id 838081 (https://bugzilla.mozilla.org/show_bug.cgi?id=838081), 950073 (https://bugzilla.mozilla.org/show_bug.cgi?id=950073), and 1243581 (https://bugzilla.mozilla.org/show_bug.cgi?id=1243581), respectively. By examining them, we found they were used as hubs for bug reports related to a specific project. The bug report with id 838081 centralized a Product Backlog (list of requirements) of Firefox’s Metro interface, which was never released. The report with id 950073 centralized Firefox Desktop reports. Furthermore, the report with id 1243581 was related to the Stylo project. Related reports were concentrated using the depends_on field, with bug reports being added and then removed once they were resolved. In summary, none of the three bug reports was linked to any specific bug, and there may be more bug reports in the same situation.

Table 4: Results summary of the total changes in non-custom fields.

Highest Means            Highest Medians    Maximum Values          Lowest Maximum Values        Most Present Fields
flagtypes.name (2.70)    cc (1)             depends_on (1,770)      attachments.isprivate (4)    resolution (100%)
cc (2.34)                resolution (1)     (773)                   restrict_comments (4)        status (100%)
status (1.34)            status (1)         flagtypes.name (547)    regressed_by (5)             cc (77.43%)
resolution (1.16)        product (0)        comment_tag (519)       bug_mentor (5)               flagtypes.name (48.30%)
comment_tag (0.69)       component (0)      blocks (353)            op_sys (5)                   assigned_to (34.90%)

Table 5: Total changes grouped by final resolution.

Value          Mean     Median   SD       Max
FIXED          15.13    10       17.88    1369
INVALID        7.86     6        10.38    1692
INCOMPLETE     8.24     6        8.74     860
DUPLICATE      8.18     6        7.32     362
WORKSFORME     9.97     8        9.87     580
WONTFIX        10.39    7        14.86    1793
INACTIVE       10.42    8        12.62    247
MOVED          8.74     6        8.50     99

5.2.2 Grouping by Final Priority

Concerning the total changes grouped by final priority, Table 6 shows that the highest averages of changes are for P1 with 19.11 (bug reports with P1 as the final value registered in priority have an average of 19.11 changes), P2 with 17.69, and P3 with 15.40, but these values have a standard deviation of up to 20.71 changes. Thus, the median may better illustrate how these changes occur: priorities P1, P2, and P3 have a median between 11 and 13 modifications, while P4 and P5 have 7 and 6, respectively. High-priority bug reports thus have up to about twice as many changes as low-priority ones. Furthermore, as the priority field is not required, a large number of bug reports (478,542) have no defined priority (represented by the value “–”), with a median of 8 changes.

The results suggest that reports with medium or high priority tend to present more changes than reports with low priority.

Table 6: Total changes grouped by final priority.

Value    Mean     Median   SD       Max
–        11.41    8        14.28    1793
P1       19.11    13       20.71    544
P2       17.69    13       19.61    595
P3       15.40    11       17.21    1003
P4       8.44     7        9.06     860
P5       8.73     6        10.50    797

5.2.3 Grouping by Final Severity

Unlike priority, whose values are on a scale from P1 to P5, severity has two different scales, one from S1 to S4 and another whose values are categories (blocker, critical, major, normal, minor, trivial, and enhancement). Apparently, the S1 to S4 scale was added later, and both coexist, leaving the choice up to the user.

Table 7 shows that, for the first scale, severity S4 has the lowest average of changes with 8.39, while S1, S2, and S3 have 16.68, 19.09, and 15.32 changes, respectively. On the other scale, blocker, critical, and major, which are the highest levels of severity, also have the highest averages of changes with 17.19, 15.38, and 15.87, respectively.

Thus, reports with medium to high severity tend to have more modifications than reports with low severity.

Table 7: Total changes grouped by final severity.

Value          Mean     Median   SD       Max
–              8.81     7        6.98     190
N/A            12.12    9        11.48    300
S1             16.68    11       14.05    83
S2             19.09    15       14.67    129
S3             15.32    13       11.32    348
S4             8.39     6        7.64     157
blocker        17.19    11       20.03    530
critical       15.38    11       13.97    254
enhancement    11.53    11       3.32     23
major          15.87    11       16.64    244
minor          11.72    9        9.31     137
normal         12.30    8        15.81    1793
trivial        12.05    9        21.50    842

Table 8: Total changes grouped by final product (top 10 most popular products).

Value                            Mean     Median   SD       Max
Core                             14.53    10       18.01    1369
Firefox                          11.17    8        13.55    767
Firefox OS Graveyard             13.90    9        16.40    576
Testing                          10.80    7        13.21    326
DevTools                         14.18    10       14.94    423
Infrastructure & Operations      6.15     4        6.27     335
Toolkit                          14.34    10       16.43    530
Firefox for Android Graveyard    13.95    10       14.82    433
Thunderbird                      11.53    9        11.39    280
Firefox Build System             14.08    10       15.80    545

5.2.4 Grouping by Final Product

The dataset used in this study contains 163 different products, so we restricted the grouping to the 10 (ten) most popular ones, i.e., the products with the highest number of reports in the dataset. In Table 8, the product Infrastructure & Operations has the lowest average of all (6.15), roughly half the average of the other products, which range from 10.80 to 14.53. Therefore, depending on the context of a product, the bug reports that affect it may receive fewer changes than reports of other products.
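As an illustration, the selection of the most popular products and the comparison of the lowest average against the others can be sketched as follows. The `final_products` list is a hypothetical toy dataset and `most_popular` is our own helper name; the means are the values reported in Table 8. Note that the comparison uses an unweighted mean of the per-product means, which is a simplification and not necessarily the computation behind the statement above.

```python
from collections import Counter

# Hypothetical final product of each report in a toy dataset.
final_products = ["Core", "Firefox", "Core", "Toolkit", "Core", "Firefox", "Testing"]

def most_popular(products, n):
    """Return the n products with the most reports, most frequent first."""
    return [name for name, _ in Counter(products).most_common(n)]

top_two = most_popular(final_products, 2)
print(top_two)

# Mean number of changes per product, as reported in Table 8.
table8_means = {
    "Core": 14.53, "Firefox": 11.17, "Firefox OS Graveyard": 13.90,
    "Testing": 10.80, "DevTools": 14.18, "Infrastructure & Operations": 6.15,
    "Toolkit": 14.34, "Firefox for Android Graveyard": 13.95,
    "Thunderbird": 11.53, "Firefox Build System": 14.08,
}

lowest = min(table8_means, key=table8_means.get)
others = [v for name, v in table8_means.items() if name != lowest]
# Unweighted mean of the other products' means (ignores report counts).
ratio = (sum(others) / len(others)) / table8_means[lowest]
print(lowest, round(ratio, 2))
```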

5.3 RQ3. Are There Relations Between Field Changes in Bug Reports?

Since several changes can be registered at once, a change to one field may appear simultaneously with a change to another, or the occurrence of one change may be related to another. Thus, there may be correlations between the total number of changes and the simultaneous occurrence of changes between pairs of fields, i.e., when both fields change in the same ChangeSet. In this context, we consider the following fields for analysis: status, resolution, assigned_to, product, component, priority, severity, summary, platform, and op_sys. We chose these fields because they are explored in other studies (Fan et al., 2020; Bettenburg et al., 2008; Valdivia Garcia and Shihab, 2014; Hooimeijer and Weimer, 2007; Erfani Joorabchi et al., 2014; Rocha et al., 2016; Gupta and Sureka, 2014).
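The two quantities used in this analysis, the total number of changes per field and the number of simultaneous occurrences per field pair, can be sketched as below. We assume, for illustration only, that each ChangeSet is represented as the set of field names modified together; the `history` data and the helper `count_changes` are hypothetical, and field names are written with underscores as in Bugzilla.

```python
from collections import Counter
from itertools import combinations

# Fields considered in the analysis.
ANALYZED_FIELDS = {
    "status", "resolution", "assigned_to", "product", "component",
    "priority", "severity", "summary", "platform", "op_sys",
}

# Hypothetical change history of one report: one set of fields per ChangeSet.
history = [
    {"status", "resolution"},
    {"platform", "op_sys", "cc"},
    {"platform"},
    {"priority", "severity"},
    {"severity"},
]

def count_changes(changesets, fields):
    """Count changes per field and simultaneous changes per field pair."""
    totals = Counter()
    pairs = Counter()
    for changeset in changesets:
        # Keep only analyzed fields; sort so each pair has one canonical key.
        changed = sorted(f for f in changeset if f in fields)
        totals.update(changed)
        pairs.update(combinations(changed, 2))
    return totals, pairs

totals, pairs = count_changes(history, ANALYZED_FIELDS)
print(totals["platform"], totals["op_sys"], pairs[("op_sys", "platform")])
```

Computing these counts for every report yields, for each field pair, three per-report series (changes in the first field, changes in the second field, and simultaneous occurrences) that can then be correlated with one another.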

Correlation between status and resolution. Figure 2 shows a very high correlation of 0.85 between the number of changes in the resolution and status fields, which is expected: whenever the resolution field changes, the status field changes with it. A change in resolution only occurs in two situations: (1) when the bug is resolved, and the status changes to RESOLVED along with the resolution; and (2) when the bug is reopened, which removes the resolution value and changes the status to REOPENED. The reverse does not always hold, because the status can also change in other situations, such as when a bug report is created (NEW) or assigned (ASSIGNED) to a developer.
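The asymmetry described above can be made concrete with a small sketch. Here we assume, purely for illustration, that a ChangeSet maps each changed field to an (old value, new value) pair; the `history` data and the `classify` helper are hypothetical and do not reproduce the tooling used in the study.

```python
def classify(changeset):
    """Classify a ChangeSet given as {field: (old_value, new_value)}."""
    status = changeset.get("status")
    resolution = changeset.get("resolution")
    if status is not None and resolution is not None:
        if status[1] == "RESOLVED":
            return "resolved"          # situation (1)
        if status[1] == "REOPENED" and resolution[1] == "":
            return "reopened"          # situation (2)
        return "other joint change"
    if status is not None:
        return "status only"           # e.g., NEW or ASSIGNED
    if resolution is not None:
        return "resolution only"
    return "unrelated"

# Hypothetical change history of a single report.
history = [
    {"status": ("UNCONFIRMED", "NEW")},
    {"status": ("NEW", "ASSIGNED"), "assigned_to": ("nobody", "dev")},
    {"status": ("ASSIGNED", "RESOLVED"), "resolution": ("", "FIXED")},
    {"status": ("RESOLVED", "REOPENED"), "resolution": ("FIXED", "")},
    {"status": ("REOPENED", "RESOLVED"), "resolution": ("", "FIXED")},
]

kinds = [classify(cs) for cs in history]
status_changes = sum("status" in cs for cs in history)
resolution_changes = sum("resolution" in cs for cs in history)
print(kinds, status_changes, resolution_changes)
```

In this toy history every resolution change comes with a status change, but two status changes occur alone, which is the kind of pattern that keeps the correlation high without making it perfect.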

Correlation between platform and op_sys. There is a very high correlation of 0.82 between platform and op_sys, where platform refers to the device architecture (x86, ARM, etc.) and op_sys refers to the operating system. We can also explore how the simultaneous occurrences of changes in these two fields correlate with each field's total number of changes. Figure 3 shows that the correlation between the total platform changes and the simultaneous occurrences is 0.94, even higher than the previous correlation. The correlation between op_sys and the simultaneous occurrences is 0.86, which is still high. So, platform changes rely more on op_sys than vice versa.

Correlation between product and component. The correlation between the total number of product changes and the total number of component changes is 0.41, considered a moderate correlation. However, Figure 4 shows that the correlation between component changes and the simultaneous occurrences of changes in the two fields is 0.69, and the correlation between product changes and the simultaneous occurrences is 0.64. That is, a change in the component field shows a high correlation with coinciding product changes, and vice versa.

Correlation between priority and severity. The correlation between the number of priority changes and the number of severity changes, taken individually, is 0.18, a very low value. Nevertheless, as seen in Figure 5, when the simultaneous occurrences are considered, there is a high correlation of 0.61 between severity and the simultaneous occurrences and a low correlation between priority and the simultaneous occurrences. Severity changes are accompanied by priority changes more often than the reverse.
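To illustrate how such coefficients can be obtained from per-report counts, the sketch below computes a Pearson correlation in plain Python. The choice of Pearson is an assumption made for this example and may not be the coefficient used in the study; the three count series are hypothetical, and `pearson` is our own helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-report counts (one entry per bug report).
severity_changes = [1, 0, 2, 1, 3]
priority_changes = [3, 1, 2, 4, 3]
# Number of ChangeSets in which both fields changed together.
both_changes = [1, 0, 2, 1, 2]

r_severity = pearson(severity_changes, both_changes)
r_priority = pearson(priority_changes, both_changes)
print(round(r_severity, 2), round(r_priority, 2))
```

In this toy data, severity changes track the simultaneous occurrences closely while priority also changes often on its own, so the first coefficient is the larger one, mirroring the asymmetry discussed for priority and severity.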

ICEIS 2023 - 25th International Conference on Enterprise Information Systems

Figure 2: Correlations between the number of ﬁeld changes.

Figure 3: Correlation between platform, op_sys, and their simultaneous occurrences.

Figure 4: Correlation between product, component, and their simultaneous occurrences.

Figure 5: Correlation between priority, severity, and their simultaneous occurrences.

6 THREATS TO VALIDITY

Internal Threats to Validity. The procedures performed in the collection and/or processing phases are possible sources of errors. Therefore, we checked each procedure more than once. Another factor is that the data is constantly being updated: bug reports may be reopened and, consequently, receive new updates. Considering this, we downloaded the change histories of the bug reports within a short period (3 days). However, it was later necessary to download more fields, and 6 bugs were reopened in between.

External Threats to Validity. This study focused

only on Mozilla’s Bugzilla dataset, so the results may

not be valid for other datasets, whether they are open

source or not.

7 CONCLUSIONS

The present study identified that about 85% of the modifications occur in non-custom fields. Except for cf_last_resolved, custom fields have less than 3% presence in bug report changes. Therefore, they may not be promising as features. The cc and flagtypes.name fields are the most modified. Among these, only cc was used in previous studies (Valdivia Garcia and Shihab, 2014; Erfani Joorabchi et al., 2014).

We have found that bug reports in Mozilla’s

Bugzilla tend to have a higher average of changes

when they are valid bugs with medium-high priority

and/or medium-high severity. This information can

help developers better estimate the effort needed to

track and ﬁx bugs.

Concerning the relations between changes, we identified that the correlation between field pair modifications could be promising. For example, the platform and op_sys fields present a robust correlation (0.94 and 0.86, respectively) with the simultaneous occurrence of changes in them. Most platform changes occurred together with op_sys changes and vice versa. The product and component fields show a moderate correlation in both cases.

Future work could evaluate the use of the ﬂag-

types.name ﬁeld as a feature in models or tools. In

addition, researchers could investigate which other

ﬁelds and their respective values affect the amount of

change in bug reports. A comparative study involv-

ing multiple datasets could further generalize the re-

sults. Future research could explore additional ﬁelds

to identify new promising pairs that correlate with

changes and their inﬂuence on bug report resolution.

REFERENCES

Anvik, J., Hiew, L., and Murphy, G. C. (2005). Coping with an open bug repository. In Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology EXchange, eclipse ’05, page 35–39, New York, NY, USA. Association for Computing Machinery.

Bettenburg, N., Just, S., Schröter, A., Weiß, C., Premraj, R., and Zimmermann, T. (2007). Quality of bug reports in eclipse. In Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology EXchange, eclipse ’07, page 21–25, New York, NY, USA. Association for Computing Machinery.

Bettenburg, N., Just, S., Schröter, A., Weiss, C., Premraj, R., and Zimmermann, T. (2008). What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT ’08/FSE-16, page 308–318, New York, NY, USA. Association for Computing Machinery.

Erfani Joorabchi, M., Mirzaaghaei, M., and Mesbah, A. (2014). Works for me! characterizing non-reproducible bug reports. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, page 62–71, New York, NY, USA. Association for Computing Machinery.

Fan, Y., Xia, X., Lo, D., and Hassan, A. E. (2020). Chaff from the wheat: Characterizing and determining valid bug reports. IEEE Transactions on Software Engineering, 46(5):495–525.

Fazzini, M., Moran, K. P., Bernal-Cardenas, C., Wendland, T., Orso, A., and Poshyvanyk, D. (2022). Enhancing mobile app bug reporting via real-time understanding of reproduction steps. IEEE Transactions on Software Engineering, pages 1–1.

Gupta, M. and Sureka, A. (2014). Nirikshan: Mining bug report history for discovering process maps, inefficiencies and inconsistencies. In Proceedings of the 7th India Software Engineering Conference, ISEC ’14, New York, NY, USA. Association for Computing Machinery.

Hooimeijer, P. and Weimer, W. (2007). Modeling bug report quality. In Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, ASE ’07, page 34–43, New York, NY, USA. Association for Computing Machinery.

Rocha, H., de Oliveira, G., Valente, M. T., and Marques-Neto, H. (2016). Characterizing bug workflows in mozilla firefox. In Proceedings of the XXX Brazilian Symposium on Software Engineering, SBES ’16, page 43–52, New York, NY, USA. Association for Computing Machinery.

Soltani, M., Hermans, F., and Bäck, T. (2020). The significance of bug report elements. Empirical Software Engineering, 25:5255–5294.

Song, Y. and Chaparro, O. (2020). Bee: A tool for structuring and analyzing bug reports. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, page 1551–1555, New York, NY, USA. Association for Computing Machinery.

Valdivia Garcia, H. and Shihab, E. (2014). Characterizing and predicting blocking bugs in open source projects. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, page 72–81, New York, NY, USA. Association for Computing Machinery.

Xiao, G., Du, X., Sui, Y., and Yue, T. (2020). Hindbr: Heterogeneous information network based duplicate bug report prediction. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pages 195–206.

Zhang, T., Jiang, H., Luo, X., and Chan, A. T. (2016). A literature review of research in bug resolution: Tasks, challenges and future directions. The Computer Journal, 59(5):741–773.
