A First Approach on Big Data Missing Values Imputation

Besay Montesdeoca, Julián Luengo, Jesús Maillo, Diego García-Gil, Salvador García, Francisco Herrera

2019

Abstract

Although most techniques and algorithms assume that the data is accurate, measurements in our analog world are far from perfect. Since our capabilities for storing and processing data are growing every day, these imperfections will accumulate, leading to poorer decisions and hindering any knowledge extraction process carried out over the raw data. One of the most disturbing imperfections is the presence of missing values. Many inductive algorithms assume that the data is complete, so when they face missing data they either do not work properly or extract lower-quality knowledge. At this point, no sophisticated missing values treatment is implemented in any major Big Data framework. In this contribution, we present two novel clustering-based imputation methods that achieve better results than simply removing the faulty examples or filling in the missing values with the mean, and that can be easily ported to Spark’s MLlib.


Paper Citation


in Harvard Style

Montesdeoca B., Luengo J., Maillo J., García-Gil D., García S. and Herrera F. (2019). A First Approach on Big Data Missing Values Imputation. In Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-369-8, pages 315-323. DOI: 10.5220/0007738403150323


in Bibtex Style

@conference{iotbds19,
author={Besay Montesdeoca and Julián Luengo and Jesús Maillo and Diego García-Gil and Salvador García and Francisco Herrera},
title={A First Approach on Big Data Missing Values Imputation},
booktitle={Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2019},
pages={315--323},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007738403150323},
isbn={978-989-758-369-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - A First Approach on Big Data Missing Values Imputation
SN - 978-989-758-369-8
AU - Montesdeoca B.
AU - Luengo J.
AU - Maillo J.
AU - García-Gil D.
AU - García S.
AU - Herrera F.
PY - 2019
SP - 315
EP - 323
DO - 10.5220/0007738403150323