Cohen, J. (1960). A coefficient of agreement for nominal
scales. Educational and Psychological Measurement,
20(1):37–46.
Eick, S. G., Graves, T. L., Karr, A. F., Marron, J. S., and
Mockus, A. (2001). Does code decay? Assessing the
evidence from change management data. IEEE Trans-
actions on Software Engineering, 27(1):1–12.
Fluri, B., Würsch, M., Pinzger, M., and Gall, H. C. (2007).
Change distilling: Tree differencing for fine-grained
source code change extraction. IEEE Transactions on
Software Engineering, 33(11):725–743.
Fu, Y., Yan, M., Zhang, X., Xu, L., Yang, D., and Kymer,
J. D. (2015). Automated classification of software
change messages by semi-supervised latent Dirichlet
allocation. Information and Software Technology,
57:369–377.
Greenwell, B., Boehmke, B., Cunningham, J., and Develop-
ers, G. (2019). gbm: Generalized Boosted Regression
Models. R package version 2.1.5.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V.
(2002). Gene Selection for Cancer Classification us-
ing Support Vector Machines. Machine Learning,
46(1/3):389–422.
Helske, S. and Helske, J. (2019). Mixture hidden Markov
models for sequence data: The seqHMM package in
R. Journal of Statistical Software, 88(3):1–32.
Hindle, A., Germán, D. M., Godfrey, M. W., and Holt, R. C.
(2009). Automatic classification of large changes into
maintenance categories. In The 17th IEEE Interna-
tional Conference on Program Comprehension, pages
30–39. IEEE Computer Society.
Hönel, S., Ericsson, M., Löwe, W., and Wingkvist, A.
(2020). Using source code density to improve the ac-
curacy of automatic commit classification into main-
tenance activities. Journal of Systems and Software,
168:110673.
Hönel, S. (2019). 359,569 commits with source code density;
1,149 commits of which have software maintenance
activity labels (adaptive, corrective, perfective).
Hönel, S. (2019–2022). Git Density 2022.10: Analyze git
repositories to extract the Source Code Density and
other Commit Properties.
Hönel, S. (2023). GitHub Repository: Datasets, Experimental
Setups, and Code for Exploiting Relations Between
Commits. DOI: 10.5281/zenodo.7715007.
Kirinuki, H., Higo, Y., Hotta, K., and Kusumoto, S. (2014).
Hey! Are you committing tangled changes? In
22nd International Conference on Program Compre-
hension, pages 262–265. ACM.
Kuhn, M. and Quinlan, R. (2020). C50: C5.0 Decision
Trees and Rule-Based Models. R package version
0.1.3.
Landis, J. R. and Koch, G. G. (1977). An application of
hierarchical kappa-type statistics in the assessment of
majority agreement among multiple observers. Bio-
metrics, 33(2):363–374. PMID:884196.
Levin, S. and Yehudai, A. (2016). Using temporal and se-
mantic developer-level information to predict mainte-
nance activity profiles. In 2016 IEEE International
Conference on Software Maintenance and Evolution.
Levin, S. and Yehudai, A. (2017a). 1151 commits with soft-
ware maintenance activity labels (corrective, perfec-
tive, adaptive).
Levin, S. and Yehudai, A. (2017b). Boosting automatic
commit classification into maintenance activities by
utilizing source code changes. In Turhan, B., Bowes,
D., and Shihab, E., editors, Proceedings of the 13th
International Conference on Predictive Models and
Data Analytics in Software Engineering.
Liaw, A. and Wiener, M. (2002). Classification and regression
by randomForest. R News, 2(3):18–22.
Lin, I. and Gustafson, D. A. (1988). Classifying software
maintenance. In Proceedings of the Conference on
Software Maintenance, pages 241–247. IEEE.
Majka, M. (2019). naivebayes: High Performance Imple-
mentation of the Naive Bayes Algorithm in R. R pack-
age version 0.9.7.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Introduction to information retrieval. Cambridge University
Press.
McCallum, A., Freitag, D., and Pereira, F. C. N. (2000).
Maximum entropy Markov models for information extraction
and segmentation. In Langley, P., editor, Proceedings
of the Seventeenth International Conference
on Machine Learning, pages 591–598.
Mockus, A. and Votta, L. G. (2000). Identifying reasons for
software changes using historic databases. In 2000
International Conference on Software Maintenance,
pages 120–130. IEEE Computer Society.
O’Connell, J. and Højsgaard, S. (2011). Hidden Semi
Markov Models for Multiple Observation Sequences:
The mhsmm Package for R. Journal of Statistical
Software, 39(4):1–22.
Purushothaman, R. and Perry, D. E. (2005). Toward
understanding the rhetoric of small source code
changes. IEEE Transactions on Software Engineer-
ing, 31(6):511–526.
R Core Team (2019). R: A Language and Environment for
Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria.
Ramage, D. (2007). Hidden Markov models fundamentals.
CS229 Section Notes.
Shivaji, S., Whitehead Jr., E. J., Akella, R., and Kim, S. (2013).
Reducing features to improve code change-based bug
prediction. IEEE Transactions on Software Engineer-
ing, 39(4):552–569.
Sutton, C. and McCallum, A. (2012). An introduction to
conditional random fields. Foundations and Trends in
Machine Learning, 4(4):267–373.
Swanson, E. B. (1976). The dimensions of maintenance. In
Yeh, R. T. and Ramamoorthy, C. V., editors, Proceed-
ings of the 2nd International Conference on Software
Engineering, pages 492–497.
Visser, I. and Speekenbrink, M. (2010). depmixS4: An R
package for hidden Markov models. Journal of Statistical
Software, 36(7):1–21.
Wright, M. N. and Ziegler, A. (2017). ranger: A fast im-
plementation of random forests for high dimensional
data in C++ and R. Journal of Statistical Software,
77(1):1–17.
Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification