A Statistical Decision Tree Algorithm for Data Stream Classification

Mirela Teixeira Cazzolato, Marcela Xavier Ribeiro, Cristiane Yaguinuma, Marilde Terezinha Prado Santos

2013

Abstract

A large amount of data is generated daily. Credit card transactions, monitoring networks, sensors and telecommunications are some of the many applications that generate large volumes of data automatically. Storage and knowledge extraction techniques for data streams differ from those used on traditional data. Many incremental techniques have been proposed for data stream classification. In this paper we present an incremental decision tree algorithm called StARMiner Tree (ST), which is based on the Very Fast Decision Tree (VFDT) system. ST deals with numerical data and uses a statistics-based heuristic both to decide when to split a node and to choose the best attribute for the test at that node. We applied ST to four datasets, two synthetic and two real-world, and compared its performance to that of VFDT. In all experiments ST achieved better accuracy, dealing well with noisy data and describing the data well from the earliest examples. However, in three of the four experiments ST created a bigger tree. The results indicate that ST is a good classifier on both large and smaller datasets, maintaining good accuracy and execution time.
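For context on the "when to split a node" decision, the sketch below shows the Hoeffding-bound test commonly used by the VFDT baseline. It is not the statistical heuristic of ST, which the abstract does not specify. The function names, the information-gain range of log2(number of classes), and the tie threshold of 0.05 are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    of a variable with the given range lies within epsilon of the mean
    observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test (hypothetical helper, not the ST criterion).

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied.
    """
    # Assumption: the split measure is information gain, whose range is
    # log2 of the number of classes.
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # A clear winner after 1000 examples, then a near tie at the same point.
    print(should_split(0.30, 0.10, n_classes=2, delta=1e-7, n=1000))
    print(should_split(0.30, 0.28, n_classes=2, delta=1e-7, n=1000))
```

Because epsilon shrinks as more examples reach a leaf, a test of this kind postpones a split until the stream has provided enough evidence. The tie threshold keeps a leaf from waiting indefinitely when two attributes are nearly equivalent.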



Paper Citation


in Harvard Style

Teixeira Cazzolato M., Xavier Ribeiro M., Yaguinuma C. and Terezinha Prado Santos M. (2013). A Statistical Decision Tree Algorithm for Data Stream Classification. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 217-223. DOI: 10.5220/0004447202170223


in Bibtex Style

@conference{iceis13,
author={Mirela Teixeira Cazzolato and Marcela Xavier Ribeiro and Cristiane Yaguinuma and Marilde Terezinha Prado Santos},
title={A Statistical Decision Tree Algorithm for Data Stream Classification},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={217-223},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004447202170223},
isbn={978-989-8565-59-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Statistical Decision Tree Algorithm for Data Stream Classification
SN - 978-989-8565-59-4
AU - Teixeira Cazzolato M.
AU - Xavier Ribeiro M.
AU - Yaguinuma C.
AU - Terezinha Prado Santos M.
PY - 2013
SP - 217
EP - 223
DO - 10.5220/0004447202170223