Text Mining Studies of Software Repository Contents

Bartosz Dobrzyński, Janusz Sosnowski

2023

Abstract

Issue tracking systems contain data that are useful for evaluating and improving software development processes. Revealing and interpreting this information is challenging and requires appropriate algorithms and tools. For this purpose, we use text mining schemes adapted to the specifics of software repositories. They are based on a detailed analysis of the dictionaries in use, which comprise Natural Language Words (NLW) and are enhanced with specialized entities found in issue descriptions (e.g., emails, code snippets, technical names). These entities are defined with specially developed regular expressions. The pre-processed texts are submitted to original text mining algorithms (machine learning). This approach has been verified on commercial and open-source projects and has revealed possible development improvements.
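
The pre-processing step described in the abstract, in which specialized entities are recognized with regular expressions before text mining, might look roughly like the sketch below. The patterns, the placeholder labels (EMAIL, CODE, TECHNAME), and the function name tag_entities are illustrative assumptions; the regular expressions actually developed by the authors are not reproduced on this page.

```python
import re

# Hypothetical patterns for illustration only; the paper's own
# regular expressions are not given here.
ENTITY_PATTERNS = [
    # Email addresses such as john.doe@example.com
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Inline code snippets wrapped in backticks
    ("CODE", re.compile(r"`[^`]+`")),
    # Technical names: CamelCase identifiers or snake_case identifiers
    ("TECHNAME", re.compile(
        r"\b[A-Za-z]+(?:[A-Z][a-z0-9]+)+\b|\b[a-z0-9]+(?:_[a-z0-9]+)+\b"
    )),
]


def tag_entities(text):
    """Replace each recognized entity with a <LABEL> placeholder.

    Returns the rewritten text and a dict with the number of
    replacements made per label.
    """
    counts = {}
    for label, pattern in ENTITY_PATTERNS:
        text, n = pattern.subn(f"<{label}>", text)
        counts[label] = n
    return text, counts


issue = "Contact john.doe@example.com: `foo()` throws NullPointerException in user_session"
tagged, counts = tag_entities(issue)
print(tagged)
print(counts)
```

Replacing entities with placeholders keeps the remaining tokens close to plain natural language words, so a later dictionary analysis is not inflated by one-off identifiers. The order of the patterns matters in this sketch: emails and code snippets are replaced first so that the technical-name pattern does not match fragments inside them.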

Paper Citation


in Harvard Style

Dobrzyński B. and Sosnowski J. (2023). Text Mining Studies of Software Repository Contents. In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-647-7, SciTePress, pages 562-569. DOI: 10.5220/0011970100003464


in Bibtex Style

@conference{enase23,
author={Bartosz Dobrzyński and Janusz Sosnowski},
title={Text Mining Studies of Software Repository Contents},
booktitle={Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2023},
pages={562-569},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011970100003464},
isbn={978-989-758-647-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - Text Mining Studies of Software Repository Contents
SN - 978-989-758-647-7
AU - Dobrzyński B.
AU - Sosnowski J.
PY - 2023
SP - 562
EP - 569
DO - 10.5220/0011970100003464
PB - SciTePress