An Extensive Analysis of Data Clumps in UML Class Diagrams

Nils Baumgartner, Elke Pulvermüller

2024

Abstract

This study investigated the characteristics of data clumps in UML class diagrams. Data clumps are groups of variables that appear together in multiple locations. We compared the characteristics of data clumps in UML class diagrams with those in source code projects. An analysis of the extensive Lindholmen and GenMyModel datasets, which are known for their real-world applicability and diversity and together contain more than 100,000 class diagrams, revealed significant differences in the distribution and nature of data clumps. Approximately 19% of the analyzed class diagrams contained data clumps. Field–field data clumps predominated in UML class diagrams, particularly in the GenMyModel dataset, while parameter–parameter data clumps were less frequent. Moreover, in contrast to the distribution in source code projects, data clumps in UML class diagrams were typically distributed across multiple classes or interfaces, forming larger chains. Parameter–parameter data clumps were predominant in source code projects, indicating that methods are implemented in more detail in these projects. These findings reflect different modeling approaches and paradigms among the respective user groups. The study provides insights relevant to the development of UML modeling tools, teaching methods, and design practices in software development.
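To make the notion of a field–field data clump concrete, the following is a minimal sketch and not the detection method used in the study. It assumes that two classes form a field–field data clump when they share at least three fields matched on name and type; the abstract states no threshold, so `MIN_SHARED`, the function `field_field_clumps`, and the toy `diagram` are all hypothetical.

```python
from itertools import combinations

# Assumed threshold: the abstract does not state one; three shared
# fields is used here purely for illustration.
MIN_SHARED = 3


def field_field_clumps(classes, min_shared=MIN_SHARED):
    """Return (class_a, class_b, shared_fields) for every pair of classes
    that shares at least `min_shared` fields (matched on name and type)."""
    clumps = []
    # Sort by class name so the output order is deterministic.
    for (name_a, fields_a), (name_b, fields_b) in combinations(sorted(classes.items()), 2):
        shared = set(fields_a) & set(fields_b)
        if len(shared) >= min_shared:
            clumps.append((name_a, name_b, sorted(shared)))
    return clumps


# Toy class diagram: class name -> list of (field name, field type).
diagram = {
    "Customer": [("street", "String"), ("city", "String"), ("zip", "String"), ("name", "String")],
    "Supplier": [("street", "String"), ("city", "String"), ("zip", "String"), ("vatId", "String")],
    "Invoice": [("number", "int"), ("city", "String")],
}

clumps = field_field_clumps(diagram)
for class_a, class_b, shared in clumps:
    print(class_a, class_b, [name for name, _ in shared])
```

In this toy diagram only `Customer` and `Supplier` share enough fields (`city`, `street`, `zip`) to count as a clump under the assumed threshold. A parameter–parameter variant would compare method parameter lists in the same way.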

Paper Citation


in Harvard Style

Baumgartner N. and Pulvermüller E. (2024). An Extensive Analysis of Data Clumps in UML Class Diagrams. In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE; ISBN 978-989-758-696-5, SciTePress, pages 15-26. DOI: 10.5220/0012550500003687


in Bibtex Style

@conference{enase24,
author={Nils Baumgartner and Elke Pulvermüller},
title={An Extensive Analysis of Data Clumps in UML Class Diagrams},
booktitle={Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2024},
pages={15-26},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012550500003687},
isbn={978-989-758-696-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - An Extensive Analysis of Data Clumps in UML Class Diagrams
SN - 978-989-758-696-5
AU - Baumgartner N.
AU - Pulvermüller E.
PY - 2024
SP - 15
EP - 26
DO - 10.5220/0012550500003687
PB - SciTePress