Preserving Prediction Accuracy on Incomplete Data Streams

Olivier Parisot, Yoanne Didry, Thomas Tamisier, Benoît Otjacques

Abstract

Model trees are a useful and convenient method for predictive analytics on data streams, combining the interpretability of decision trees with the efficiency of multiple linear regression. However, missing values in data streams are a critical issue in many real-world applications. This issue is often addressed by pre-processing techniques applied before the model's training phase. In this article, we propose a new method that estimates and adjusts missing values before the model tree is built. A prototype has been developed, and experimental results on several benchmarks show that the method improves the accuracy of the resulting model tree.


Paper Citation


in Harvard Style

Parisot O., Didry Y., Tamisier T. and Otjacques B. (2015). Preserving Prediction Accuracy on Incomplete Data Streams. In Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-103-8, pages 91-96. DOI: 10.5220/0005553500910096


in Bibtex Style

@conference{data15,
author={Olivier Parisot and Yoanne Didry and Thomas Tamisier and Benoît Otjacques},
title={Preserving Prediction Accuracy on Incomplete Data Streams},
booktitle={Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2015},
pages={91-96},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005553500910096},
isbn={978-989-758-103-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Preserving Prediction Accuracy on Incomplete Data Streams
SN - 978-989-758-103-8
AU - Parisot O.
AU - Didry Y.
AU - Tamisier T.
AU - Otjacques B.
PY - 2015
SP - 91
EP - 96
DO - 10.5220/0005553500910096