Text Mining Studies of Software Repository Contents
Bartosz Dobrzyński, Janusz Sosnowski
2023
Abstract
Issue tracking systems contain data that are useful for evaluating or improving software development processes. Extracting and interpreting this information is a challenging problem that requires appropriate algorithms and tools. For this purpose, we use text mining schemes adapted to the specifics of the software repository. They are based on a detailed analysis of the dictionaries in use, which comprise Natural Language Words (NLW) and are enhanced with specialized entities found in issue descriptions (e.g., emails, code snippets, technical names). These entities are identified with specially developed regular expressions. The pre-processed texts are then submitted to original text mining algorithms (machine learning). This approach has been verified on commercial and open-source projects and revealed possible development improvements.
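The pre-processing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' regular expressions, so the patterns, the placeholder labels (`CODE`, `EMAIL`, `TECHNAME`), and the function name `preprocess` below are all assumptions made for illustration only. The sketch replaces each specialized entity with a placeholder token, so that the remaining natural language words can be tokenized separately.

```python
import re

# Hypothetical patterns: the paper's actual regular expressions are not
# reproduced in the abstract. Order matters: emails are replaced before
# dotted technical names so that "example.com" is not matched twice.
ENTITY_PATTERNS = [
    # Inline code snippet delimited by backticks (assumed convention).
    ("CODE", re.compile(r"`[^`\n]+`")),
    # Simplified email address pattern.
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Dotted identifiers (java.util.HashMap) or camelCase names (parseIssue).
    ("TECHNAME", re.compile(
        r"\b[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+\b"
        r"|\b[a-z]+(?:[A-Z][a-z0-9]+)+\b"
    )),
]


def preprocess(text):
    """Replace specialized entities with placeholders.

    Returns a pair (tokens, entities): tokens is the list of lowercased
    natural language words interleaved with placeholder tokens such as
    "<EMAIL>"; entities is the list of (label, matched_text) pairs,
    grouped by pattern in the order the patterns are applied.
    """
    entities = []
    for label, pattern in ENTITY_PATTERNS:
        def replace(match, label=label):
            entities.append((label, match.group(0)))
            return f" <{label}> "
        text = pattern.sub(replace, text)
    # Keep placeholders intact and split the rest into alphabetic words.
    raw_tokens = re.findall(r"<[A-Z]+>|[A-Za-z]+", text)
    tokens = [t if t.startswith("<") else t.lower() for t in raw_tokens]
    return tokens, entities


if __name__ == "__main__":
    sample = ("Crash in `parse()` reported by jan.kowalski@example.com "
              "when calling java.util.HashMap")
    tokens, entities = preprocess(sample)
    print(tokens)
    print(entities)
```

The resulting token list could then be passed to any downstream classifier or clustering step. Which machine learning algorithms the authors apply is not specified in the abstract, so that stage is left out here.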
Paper Citation
in Harvard Style
Dobrzyński B. and Sosnowski J. (2023). Text Mining Studies of Software Repository Contents. In Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE, ISBN 978-989-758-647-7, SciTePress, pages 562-569. DOI: 10.5220/0011970100003464
in Bibtex Style
@conference{enase23,
author={Bartosz Dobrzyński and Janusz Sosnowski},
title={Text Mining Studies of Software Repository Contents},
booktitle={Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2023},
pages={562-569},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011970100003464},
isbn={978-989-758-647-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 18th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - Text Mining Studies of Software Repository Contents
SN - 978-989-758-647-7
AU - Dobrzyński B.
AU - Sosnowski J.
PY - 2023
SP - 562
EP - 569
DO - 10.5220/0011970100003464
PB - SciTePress