A Well-founded Ontology to Support the Preparation of Training and Test Datasets
Lucimar Moura, Marcus da Silva, Kelli Cordeiro, Maria Cavalcanti
2021
Abstract
In the knowledge discovery process, a set of activities guides the data preprocessing phase, one of which is the transformation of raw data into training and test data. This complex and multidisciplinary phase involves concepts and knowledge that are structured in distinct and particular ways across the literature and specialized tools, demanding data scientists with suitable expertise. In this work, we present PPO-O, a reference ontology of data preprocessing operators, to identify and represent the semantics of the concepts related to the data preprocessing phase. Moreover, the ontology highlights the data preprocessing operators used in the preparation of training and test datasets. Based on PPO-O, the Assistant-PP tool was developed, which is able to capture retrospective data provenance during the execution of data preprocessing operators, facilitating the reproducibility and explainability of the datasets created. This approach may be helpful to users who are not experts in data preprocessing.
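To make the idea of retrospective provenance concrete, the following is a minimal Python sketch of a single preprocessing operator (a training/test split) that records what it consumed and produced while it runs. It is an illustration only and is not taken from Assistant-PP or PPO-O: the function names, the record fields, and the hashing scheme are all assumptions, and the "used" and "generated" keys only loosely echo the W3C PROV notions of usage and generation.

```python
import hashlib
import json
import random
from datetime import datetime, timezone


def fingerprint(rows):
    """Return a short, stable hash identifying a dataset's contents (hypothetical helper)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def train_test_split_with_provenance(rows, test_ratio, seed, log):
    """Split rows into training and test sets, appending one provenance record to log."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split can be reproduced
    n_test = round(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Assumed record layout: which operator ran, with which parameters,
    # on which input, producing which outputs, and when it finished.
    log.append({
        "operator": "train_test_split",
        "parameters": {"test_ratio": test_ratio, "seed": seed},
        "used": fingerprint(rows),
        "generated": {"train": fingerprint(train), "test": fingerprint(test)},
        "ended_at": datetime.now(timezone.utc).isoformat(),
    })
    return train, test


provenance_log = []
raw = [{"id": i, "label": i % 2} for i in range(10)]
train, test = train_test_split_with_provenance(raw, 0.3, seed=42, log=provenance_log)
```

Because the seed and parameters are stored with the input and output fingerprints, rerunning the operator with the logged values should reproduce the same datasets, which is the kind of reproducibility and explainability the abstract refers to.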
Paper Citation
in Harvard Style
Moura L., da Silva M., Cordeiro K. and Cavalcanti M. (2021). A Well-founded Ontology to Support the Preparation of Training and Test Datasets. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-509-8, pages 99-110. DOI: 10.5220/0010460000990110
in Bibtex Style
@conference{iceis21,
author={Lucimar Moura and Marcus da Silva and Kelli Cordeiro and Maria Cavalcanti},
title={A Well-founded Ontology to Support the Preparation of Training and Test Datasets},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2021},
pages={99-110},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010460000990110},
isbn={978-989-758-509-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A Well-founded Ontology to Support the Preparation of Training and Test Datasets
SN - 978-989-758-509-8
AU - Moura L.
AU - da Silva M.
AU - Cordeiro K.
AU - Cavalcanti M.
PY - 2021
SP - 99
EP - 110
DO - 10.5220/0010460000990110