Authors:
Pasquale Ardimento and Nicola Boffoli
Affiliation:
Department of Informatics, University of Bari Aldo Moro, Via Orabona 4, Bari, Italy
Keyword(s):
SLDA, Latent Topics, Bug-fixing, Repository Mining, Software Maintenance, Text Categorization.
Abstract:
During software maintenance activities, an accurate prediction of the bug-fixing time can help software managers allocate resources and time more effectively. In this work, each bug report is endowed with a response variable (the bug-fixing time), external to its words, that we are interested in predicting. To analyze the bug report collections, we used supervised Latent Dirichlet Allocation (sLDA), whose goal is to infer latent topics that are predictive of the response. The bug reports and the responses are jointly modeled to find the latent topics that best predict the response variables of future unlabeled bug reports. With a fitted model in hand, we can infer the topic structure of an unlabeled bug report and then form a prediction of its response. sLDA adds to LDA a response variable connected to each bug report. Two variants of the bag-of-words (BoW) model serve as baseline discriminative algorithms, and an unsupervised LDA is also considered. To evaluate the proposed approach, the defect tracking dataset of LiveCode, a well-known and large dataset, was used. Results show that sLDA improves the recall of the predicted bug-fixing times compared with the other BoW single-topic and multi-topic supervised algorithms.
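The joint modeling of words and response described above can be made concrete with the generative process commonly used to define sLDA. This is a sketch of the standard textbook formulation, not necessarily the exact variant used in this work. All symbols are notation assumed here and do not come from the abstract: $\theta_d$ denotes the topic proportions of bug report $d$, $z_{d,n}$ the topic indicator of its $n$-th word $w_{d,n}$, $\beta$ the topic-word distributions, $y_d$ the response, and $\eta$, $\sigma^2$ the response parameters. The Gaussian response shown is the regression form; if bug-fixing times are discretized into classes, a categorical link would take its place.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), \qquad n = 1,\dots,N_d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Multinomial}(\beta_{z_{d,n}}) \\
y_d \mid z_{d,1:N_d} &\sim \mathcal{N}\!\left(\eta^{\top}\bar{z}_d,\ \sigma^2\right),
\qquad \bar{z}_d = \frac{1}{N_d}\sum_{n=1}^{N_d} z_{d,n}
\end{align*}
```

Under this formulation, the prediction for an unlabeled bug report is the expected response given its words, $\mathbb{E}[y_d \mid w_{d,1:N_d}] \approx \eta^{\top}\,\mathbb{E}[\bar{z}_d \mid w_{d,1:N_d}]$, where the posterior expectation of $\bar{z}_d$ is typically approximated, for example by variational inference.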