Authors:
Nils Baumgartner and Elke Pulvermüller
Affiliation:
Software Engineering Research Group, School of Mathematics/Computer Science/Physics, Osnabrück University, 49090, Osnabrück, Germany
Keyword(s):
Design Smell, Code Smell Dataset, Class Diagram, Data Clumps, Code Analysis, Reporting Format.
Abstract:
This study investigated the characteristics of data clumps in UML class diagrams. Data clumps are groups of variables that appear together in multiple locations. We compared the characteristics of data clumps in UML class diagrams with those in source code projects. Analysis of the extensive Lindholmen and GenMyModel datasets, which are known for their real-world applicability and diversity and together contain more than 100,000 class diagrams, revealed significant differences in the distribution and nature of data clumps. Approximately 19% of the analyzed class diagrams contained data clumps. Field–field data clumps predominated in UML class diagrams, particularly in the GenMyModel dataset, while parameter–parameter data clumps were less frequent. Moreover, in contrast to the distribution in source code projects, data clumps in UML class diagrams were typically distributed across multiple classes or interfaces, forming larger chains. Parameter–parameter data clumps were predominant in source code projects, indicating a more detailed implementation of methods in these projects. These findings reflect different modeling approaches and paradigms among the respective user groups. The study provides important insights for the development of UML modeling tools, teaching methods, and design practices in software development.