Data Cleansing with PDI for Improving Data Quality
Siti Aulia Noor, Tien Fabrianti Kusumasari, Muhammad Azani Hasibuan
2019
Abstract
Rapid technological development produces diverse data and information that can improve the decision-making process. Organizations therefore need high-quality data that can serve as a trustworthy basis for decisions. Data quality is an important supporting factor in processing data into valid information that benefits the company. This paper discusses data cleansing to improve data quality using an open source tool, Pentaho Data Integration (PDI). The data cleansing method in this paper comprises data profiling, determining the processing algorithm for data cleansing, mapping the data cleansing algorithm to components in PDI, and evaluation. Evaluation is done by comparing the results of this research with existing data cleansing tools (OpenRefine and Talend). The results of the data cleansing implementation show the character pattern of the data formed for Drug Circular Permit numbers, with an accuracy of 0.0614. An advantage of this study is that the data sources used can consist of databases with various considerations.
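The profiling and cleansing steps named in the abstract can be illustrated with a minimal sketch. It is plain Python, not PDI, and it is not the authors' algorithm: it assumes that "character pattern" means mapping each letter to `A` and each digit to `9`, and that values deviating from the most frequent pattern are candidates for cleansing. The function names and the sample values are hypothetical and invented for illustration only; they are not real permit numbers.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Map each character to a class symbol: letter -> 'A', digit -> '9'.

    Any other character (hyphen, space, ...) is kept unchanged.
    """
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_patterns(values):
    """Profiling step: count how often each character pattern occurs."""
    return Counter(char_pattern(v.strip()) for v in values)


def split_by_dominant_pattern(values):
    """Return (dominant pattern, conforming values, values to review)."""
    counts = profile_patterns(values)
    dominant, _ = counts.most_common(1)[0]
    ok = [v for v in values if char_pattern(v.strip()) == dominant]
    review = [v for v in values if char_pattern(v.strip()) != dominant]
    return dominant, ok, review


# Hypothetical sample values, invented for illustration only.
sample = ["AB123456", "CD654321", "ef-12345", " GH111222 ", "12345678"]
dominant, ok, review = split_by_dominant_pattern(sample)
print(dominant)
print(review)
```

In a PDI transformation, the same idea would be expressed with transformation steps rather than functions; the mapping the authors actually used is described in the full paper, not in this abstract.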
Paper Citation
in Harvard Style
Noor S., Kusumasari T. and Hasibuan M. (2019). Data Cleansing with PDI for Improving Data Quality. In Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM, ISBN 978-989-758-451-0, pages 256-261. DOI: 10.5220/0009868102560261
in Bibtex Style
@conference{iccetim19,
author={Siti Aulia Noor and Tien Fabrianti Kusumasari and Muhammad Azani Hasibuan},
title={Data Cleansing with PDI for Improving Data Quality},
booktitle={Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM},
year={2019},
pages={256-261},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009868102560261},
isbn={978-989-758-451-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM
TI - Data Cleansing with PDI for Improving Data Quality
SN - 978-989-758-451-0
AU - Noor S.
AU - Kusumasari T.
AU - Hasibuan M.
PY - 2019
SP - 256
EP - 261
DO - 10.5220/0009868102560261