Authors:
Nils Baumgartner and Elke Pulvermüller
Affiliation:
Research Group Software Engineering, Institute of Computer Science, Department of Mathematics and Computer Science, University of Osnabrück, Osnabrück, Germany
Keyword(s):
Design Smell, Code Smell Dataset, Class Diagram, Data Clumps, Code Analysis, Reporting Format.
Abstract:
This study explores the characteristics of data clumps, a specific type of code smell, in software projects. Code smells are characteristics of source code that indicate a deeper problem. Data clumps are identical groups of variables that appear in different parts of the code. The lack of datasets for data clumps makes it difficult to identify and manage them in software projects. We developed a tool that parses source code projects into an abstract syntax tree, facilitating detailed analysis of data clumps. Our findings reveal a notable presence of data clumps that form clusters, which complicates manual refactoring. In this paper, we propose a unified reporting format for data clump detection and provide a granular dataset of data clumps. Additionally, we outline a detection methodology that can be applied across different programming languages and frameworks. We also provide a first look into the lifecycle and evolution of data clumps, showing that data clumps either remain in projects or accumulate over time. This work provides a foundation for further research aimed at enhancing software quality through identifying and refactoring data clumps, offering a starting point for discussions and improvements in this domain.
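To make the definition of a data clump concrete, the following is a minimal sketch of how identical parameter groups shared between methods could be found. It is not the authors' tool: the input dictionary, the function name `find_parameter_clumps`, and the threshold of three shared parameters are all assumptions chosen for illustration, and the sketch works on hand-written parameter lists rather than on a parsed abstract syntax tree.

```python
from itertools import combinations

# Hypothetical input: method name -> list of (type, name) parameters.
# A real detector would extract these from an abstract syntax tree.
METHODS = {
    "create_order": [("str", "street"), ("str", "city"),
                     ("str", "zip_code"), ("int", "quantity")],
    "ship_parcel": [("str", "street"), ("str", "city"), ("str", "zip_code")],
    "log_event": [("str", "message")],
}


def find_parameter_clumps(methods, min_size=3):
    """Return (method_a, method_b, shared_parameters) for every pair of
    methods sharing at least `min_size` identical (type, name) parameters.

    The default threshold of 3 is an assumption for this sketch.
    """
    clumps = []
    # Sort by method name so the result order is deterministic.
    for (name_a, params_a), (name_b, params_b) in combinations(
        sorted(methods.items()), 2
    ):
        shared = set(params_a) & set(params_b)
        if len(shared) >= min_size:
            clumps.append((name_a, name_b, sorted(shared)))
    return clumps


if __name__ == "__main__":
    for method_a, method_b, shared in find_parameter_clumps(METHODS):
        print(method_a, method_b, shared)
```

In this example, `create_order` and `ship_parcel` share the group `street`, `city`, `zip_code`, which is the kind of repeated variable group that would typically be refactored into a single class such as an address object.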