Data Smells Are Sneaky
Nicolas Hahn, Afonso Sales
2025
Abstract
Data is the primary source for developing AI-based systems, and poor-quality data can lead to technical debt and negatively impact performance. Inspired by the concept of code smells in software engineering, data smells have been introduced as indicators of potential data quality issues and can be used to evaluate data quality. This paper presents a simulation aimed at identifying specific data smells that are introduced in an unstructured format and detected in tabular form. By introducing and analyzing specific data smells, the research examines the challenges in detecting them. The results underscore the need for robust detection mechanisms to address data smells across different stages of a data pipeline. This work expands the understanding of data smells and their implications, providing new foundations for future improvements in data quality assurance for AI-driven systems.
Paper Citation
in Harvard Style
Hahn N. and Sales A. (2025). Data Smells Are Sneaky. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8, SciTePress, pages 479-488. DOI: 10.5220/0013285500003929
in Bibtex Style
@conference{iceis25,
author={Nicolas Hahn and Afonso Sales},
title={Data Smells Are Sneaky},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={479-488},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013285500003929},
isbn={978-989-758-749-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Data Smells Are Sneaky
SN - 978-989-758-749-8
AU - Hahn N.
AU - Sales A.
PY - 2025
SP - 479
EP - 488
DO - 10.5220/0013285500003929
PB - SciTePress