USING SELF-SIMILARITY TO ADAPT EVOLUTIONARY ENSEMBLES FOR THE DISTRIBUTED CLASSIFICATION OF DATA STREAMS

Clara Pizzuti, Giandomenico Spezzano

Abstract

Distributed stream-based classification methods have many important applications, such as sensor data analysis, network security, and business intelligence. A key challenge is concept drift in the data stream environment, which traditional learning techniques do not handle easily. This paper presents a Genetic Programming (GP) based boosting ensemble method for classifying distributed streaming data that adapts in the presence of concept drift. The approach handles flows of data coming from multiple locations by building a global model that aggregates the local models coming from each node. The algorithm uses a fractal dimension-based change detection strategy, based on the self-similarity of the ensemble behavior, that captures time-evolving trends and patterns in the stream and reveals changes in evolving data streams. Experimental results on a real-life data set show the validity of the approach in maintaining an accurate and up-to-date GP ensemble.
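The abstract does not spell out how the fractal dimension is computed or which signal it is computed on, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a box-counting estimator applied to a window of some ensemble performance measure (for example, per-block fitness), and flags drift when the estimated dimension of a recent window departs from that of a reference window. The function names, the grid scales, and the threshold `tau` are all hypothetical.

```python
import math


def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of 2-D points in the unit square.

    For each grid resolution k (k x k cells), count the occupied cells N(k),
    then return the least-squares slope of log N(k) against log k.
    """
    xs, ys = [], []
    for k in scales:
        # Map every point to the index of the cell it falls in; clamp the
        # boundary value 1.0 into the last cell.
        occupied = {
            (min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points
        }
        xs.append(math.log(k))
        ys.append(math.log(len(occupied)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


def normalise(series):
    """Turn a sequence of values into (t, v) points scaled to the unit square."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0          # avoid division by zero on flat series
    last = max(len(series) - 1, 1)
    return [(i / last, (v - lo) / span) for i, v in enumerate(series)]


def drift_detected(reference, recent, tau=0.1):
    """Hypothetical rule: drift if the two estimated dimensions differ by > tau."""
    d_ref = box_counting_dimension(normalise(reference))
    d_new = box_counting_dimension(normalise(recent))
    return abs(d_ref - d_new) > tau
```

In this sketch a smooth, trend-like series yields a dimension close to 1, while an erratic series fills more of the plane and yields a larger value; the threshold comparison is an illustrative choice and would need tuning on real streams.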



Paper Citation


in Harvard Style

Pizzuti C. and Spezzano G. (2010). USING SELF-SIMILARITY TO ADAPT EVOLUTIONARY ENSEMBLES FOR THE DISTRIBUTED CLASSIFICATION OF DATA STREAMS. In Proceedings of the International Conference on Evolutionary Computation - Volume 1: ICEC, (IJCCI 2010) ISBN 978-989-8425-31-7, pages 176-181. DOI: 10.5220/0003074901760181


in BibTeX Style

@conference{icec10,
author={Clara Pizzuti and Giandomenico Spezzano},
title={USING SELF-SIMILARITY TO ADAPT EVOLUTIONARY ENSEMBLES FOR THE DISTRIBUTED CLASSIFICATION OF DATA STREAMS},
booktitle={Proceedings of the International Conference on Evolutionary Computation - Volume 1: ICEC, (IJCCI 2010)},
year={2010},
pages={176-181},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003074901760181},
isbn={978-989-8425-31-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Evolutionary Computation - Volume 1: ICEC, (IJCCI 2010)
TI - USING SELF-SIMILARITY TO ADAPT EVOLUTIONARY ENSEMBLES FOR THE DISTRIBUTED CLASSIFICATION OF DATA STREAMS
SN - 978-989-8425-31-7
AU - Pizzuti C.
AU - Spezzano G.
PY - 2010
SP - 176
EP - 181
DO - 10.5220/0003074901760181