Data Cleansing with PDI for Improving Data Quality

Siti Aulia Noor, Tien Fabrianti Kusumasari, Muhammad Azani Hasibuan

2019

Abstract

Technological developments that rapidly produce diverse data and information can improve the decision-making process. As a result, organizations need high-quality data that can serve as a truly trustworthy basis for decisions. Data quality is an important factor in processing data into valid information that benefits the company. This paper therefore discusses data cleansing to improve data quality using an open source tool, Pentaho Data Integration (PDI). The data cleansing method consists of data profiling, determining the processing algorithm for data cleansing, mapping the cleansing algorithms to components in PDI, and, finally, evaluation. The evaluation compares the results with existing data cleaning tools (OpenRefine and Talend). The results of applying data cleansing show the character patterns formed for Drug Circular Permit numbers, with an accuracy of 0.0614. An advantage of this study is that the data sources can be databases of various kinds.
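The abstract does not spell out the cleansing algorithm, so the following is only a rough illustration of two of the steps it lists: profiling values by their character pattern and cleansing them against a reference format. It is a minimal sketch in plain Python rather than a PDI transformation. The 'A'/'9' pattern notation, the normalization rules, the choice of the most frequent pattern as the reference, the function names (`char_pattern`, `profile`, `normalize`, `cleanse`) and the sample values are all assumptions made for illustration. They are not the authors' method and are not real Drug Circular Permit numbers.

```python
from collections import Counter


def char_pattern(value: str) -> str:
    """Map ASCII letters to 'A' and ASCII digits to '9'; keep every other character."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalpha():
            out.append("A")
        elif ch.isascii() and ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Profiling step (assumed): count each character pattern, most frequent first."""
    return Counter(char_pattern(v) for v in values).most_common()


def normalize(value: str) -> str:
    """Hypothetical cleansing rule: trim, upper-case, and drop spaces, dots and hyphens."""
    cleaned = value.strip().upper()
    for junk in (" ", ".", "-"):
        cleaned = cleaned.replace(junk, "")
    return cleaned


def cleanse(values):
    """Normalize all values, take the dominant pattern as the reference format,
    and split the results into values that match it and values that need review."""
    normalized = [normalize(v) for v in values]
    reference, _count = profile(normalized)[0]
    valid = [v for v in normalized if char_pattern(v) == reference]
    needs_review = [v for v in normalized if char_pattern(v) != reference]
    return reference, valid, needs_review


# Hypothetical sample values, not real Drug Circular Permit numbers.
raw = ["ABC1234567", " abc 1234568", "ABC-1234569", "AB12345", "XYZ7654321"]
print(profile(raw)[0])  # ('AAA9999999', 2)
print(cleanse(raw))
# ('AAA9999999', ['ABC1234567', 'ABC1234568', 'ABC1234569', 'XYZ7654321'], ['AB12345'])
```

Using the most frequent pattern as the reference format is only one simple choice. When dirty values outnumber clean ones, a rule taken from the domain, such as a known fixed format, would be more reliable. In PDI the same operations would be set up as transformation steps instead of Python functions.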



Paper Citation


in Harvard Style

Noor S., Kusumasari T. and Hasibuan M. (2019). Data Cleansing with PDI for Improving Data Quality. In Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM, ISBN 978-989-758-451-0, pages 256-261. DOI: 10.5220/0009868102560261


in Bibtex Style

@conference{iccetim19,
author={Siti Aulia Noor and Tien Fabrianti Kusumasari and Muhammad Azani Hasibuan},
title={Data Cleansing with PDI for Improving Data Quality},
booktitle={Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM},
year={2019},
pages={256-261},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009868102560261},
isbn={978-989-758-451-0},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Creative Economics, Tourism and Information Management - Volume 1: ICCETIM
TI  - Data Cleansing with PDI for Improving Data Quality
SN  - 978-989-758-451-0
AU  - Noor S.
AU  - Kusumasari T.
AU  - Hasibuan M.
PY  - 2019
SP  - 256
EP  - 261
DO  - 10.5220/0009868102560261
ER  - 