The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects

Nils Baumgartner, Elke Pulvermüller

2024

Abstract

This study explores the characteristics of data clumps, a specific type of code smell, in software projects. Code smells are characteristics of source code that indicate a deeper problem. Data clumps are identical groups of variables that appear in different parts of the code. The lack of datasets for data clumps makes it difficult to identify and manage them in software projects. We developed a tool that parses source code projects into an abstract syntax tree, facilitating detailed analysis of data clumps. Our findings reveal that data clumps frequently form clusters, which complicates manual refactoring. In this paper, we propose a unified reporting format for data clump detection and provide a granular dataset of data clumps. Additionally, we outline a detection methodology that can be applied across different programming languages and frameworks. We also provide a first look into the lifecycle and evolution of data clumps, showing that data clumps either remain in projects or accumulate over time. This work provides a foundation for further research aimed at enhancing software quality through identifying and refactoring data clumps, offering a starting point for discussions and improvements in this domain.
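
To make the notion of a data clump concrete, the following is a minimal sketch and not the authors' tool: it flags pairs of methods whose parameter lists share a group of identical (type, name) variables. The method signatures, the function name find_parameter_clumps, and the threshold of three shared parameters are all assumptions chosen for illustration; the abstract does not state the detection rules used in the paper.

```python
from itertools import combinations

# Hypothetical input: method name -> list of (type, name) parameters.
# A real detector would extract these from an abstract syntax tree.
METHODS = {
    "draw_line": [("int", "x"), ("int", "y"), ("int", "z"), ("str", "color")],
    "move_to": [("int", "x"), ("int", "y"), ("int", "z")],
    "log": [("str", "message")],
}

# Assumed threshold for illustration; not taken from the paper.
MIN_SHARED = 3


def find_parameter_clumps(methods, min_shared=MIN_SHARED):
    """Return (method_a, method_b, shared_params) for every pair of methods
    sharing at least `min_shared` identical (type, name) parameters."""
    clumps = []
    # Sort by method name so the output order is deterministic.
    for (name_a, params_a), (name_b, params_b) in combinations(
        sorted(methods.items()), 2
    ):
        shared = sorted(set(params_a) & set(params_b))
        if len(shared) >= min_shared:
            clumps.append((name_a, name_b, shared))
    return clumps


if __name__ == "__main__":
    for name_a, name_b, shared in find_parameter_clumps(METHODS):
        print(f"{name_a} <-> {name_b}: {shared}")
```

In this toy input, draw_line and move_to both take the same x, y, z group, so they are reported as one clump, while log shares nothing with either. When many methods share the same group, such pairwise matches overlap, which gives an intuition for the clustering the abstract describes as complicating manual refactoring.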

Paper Citation


in Harvard Style

Baumgartner N. and Pulvermüller E. (2024). The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects. In Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering - Volume 1: MODELSWARD; ISBN 978-989-758-682-8, SciTePress, pages 15-26. DOI: 10.5220/0012313900003645


in Bibtex Style

@conference{modelsward24,
author={Nils Baumgartner and Elke Pulvermüller},
title={The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects},
booktitle={Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering - Volume 1: MODELSWARD},
year={2024},
pages={15-26},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012313900003645},
isbn={978-989-758-682-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Model-Based Software and Systems Engineering - Volume 1: MODELSWARD
TI - The Lifecycle of Data Clumps: A Longitudinal Case Study in Open-Source Projects
SN - 978-989-758-682-8
AU - Baumgartner N.
AU - Pulvermüller E.
PY - 2024
SP - 15
EP - 26
DO - 10.5220/0012313900003645
PB - SciTePress