AN IDENTIFICATION METHOD OF RELATED GROUP THREADS
FOR A RECENT BUG THREAD BY PEAK CHARACTERISTICS
OF SIMILARITIES
Yuuki Imanara, Kota Itakura, Masaki Samejima and Masanori Akiyoshi
Graduate School of Information Science and Technology, Osaka University, 2-1, Yamadaoka, Suita, Osaka, Japan
Keywords:
Bug tracking system, Related group thread, SVM, Peak characteristics of similarities.
Abstract:
This paper addresses the problem of identifying the related group threads that have dependent relationships with
recent bug threads. Because most recent bug threads have no dependent relationships with any group thread,
a basic similarity-based approach wrongly regards them as having dependent relationships. In this paper, we
propose an identification method of related group threads by peak characteristics of similarities. The proposed
method removes recent bug threads that have no dependent relationships by using a Support Vector Machine on
vectors representing the peak characteristics of similarities between the recent bug thread and group threads. The
application result shows that the precision rate is improved by 49% and the recall rate is kept at 76% on average
using the proposed method.
1 INTRODUCTION
In open source software development, developer
communities are organized to discuss who should
fix a bug and how to fix it. To support these
discussions and manage bug information, bug tracking
systems (Serrano and Ciordia, 2005; Matsushita et al.,
2005) have been introduced.
A bug tracking system generally consists of bug
threads posted by developers. Each bug thread has a
title, the progress of the fix, developers’ comments, and
dependent relationships. A dependent relationship
indicates that one bug cannot be fixed unless
another bug is fixed (Souza et al., 2007). Bug threads
that have dependent relationships with each other
are organized into a “group thread”.
Every time a recent bug is reported, developers look for
the group thread containing bug threads with which
the recent bug thread has dependent relationships,
called the “related group thread”, in order to improve
their discussion (Black, 2002; Chen et al., 2010).
Because there are dozens of group threads, it is
difficult for developers to find the related group
thread (Zimmermann, 2009; Gall et al., 2003).
The purpose of this research is to identify the related
group thread for the recent bug thread automatically.
Since threads that have dependent relationships
with each other share common bug symptoms,
comments on these threads are often similar (Nagwani and
Singh, 2009). The group thread most similar to the
recent bug thread can therefore be regarded as the
related group thread. Based on this idea, the basic
approach is to derive similarities between the recent
bug thread and each group thread by Cosine Similarity
(Sullivan, 2001), and to choose as the related group
thread the group thread with the highest similarity,
provided that it exceeds a threshold. The threshold is
derived from similarities among existing bug threads.
However, some recent bug threads are similar to a
group thread but do not have dependent relationships
with the existing bug threads: because the recent bug
thread does not yet have enough comments, the
similarity is not correctly derived. This causes
misidentification of the related group thread. So, it is
necessary to extract characteristics of the misidentified
related group threads and remove such recent bug
threads before the identification (Imanara et al., 2011).
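The basic approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`term_counts`, `cosine_similarity`, `identify_related_group`) are hypothetical, raw word counts are assumed as term weights because the weighting is not specified here, and the threshold is passed in as a plain number, whereas the paper derives it from similarities among existing bug threads.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word occurrences (a plain bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_related_group(recent_text, group_texts, threshold):
    """Return (group_id, similarity) for the most similar group thread.

    group_texts maps a group id to the concatenated comments of that group
    (an assumed representation). The group id is None when the highest
    similarity does not exceed the threshold.
    """
    recent = term_counts(recent_text)
    sims = {gid: cosine_similarity(recent, term_counts(text))
            for gid, text in group_texts.items()}
    if not sims:
        return None, 0.0
    best = max(sims, key=sims.get)
    if sims[best] > threshold:
        return best, sims[best]
    return None, sims[best]
```

Note that this baseline always picks the single most similar group thread once the threshold is passed, which is exactly the behavior that leads to misidentification when a recent bug thread has few comments.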
We propose an identification method of related
group threads by peak characteristics of similarities.
When the recent bug thread has dependent relationships
with the related group thread, the similarity with
the related group thread is very high but the similarities
with the other group threads are low. We call this
characteristic of the similarities “peak characteristics”.
The peak characteristics can therefore be observed in the
similarities when a related group thread exists. Two kinds
Imanara Y., Itakura K., Samejima M. and Akiyoshi M.
AN IDENTIFICATION METHOD OF RELATED GROUP THREADS FOR A RECENT BUG THREAD BY PEAK CHARACTERISTICS OF SIMILARITIES.
DOI: 10.5220/0003506401790184
In Proceedings of the 6th International Conference on Software and Database Technologies (ICSOFT-2011), pages 179-184
ISBN: 978-989-8425-77-5
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)