AN ADAPTIVE SELECTIVE ENSEMBLE FOR DATA STREAMS CLASSIFICATION

Valerio Grossi, Franco Turini

Abstract

The wide diffusion of technologies related to web applications, sensor networks and ubiquitous computing has introduced important new challenges for the data mining community. The growing need to analyse data streams imposes several requirements and constraints on a mining system. This paper analyses a set of requirements related to the data streams environment and proposes a new adaptive method for data streams classification. The system employs data aggregation techniques that, coupled with a selective ensemble approach, perform the classification task. The approach adapts the behaviour of the selective ensemble by dynamically updating the threshold for enabling the classifiers. The system is explicitly conceived to satisfy these requirements even in the presence of concept drift.
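
The abstract does not say how the enabling threshold is computed, so the following is only a minimal sketch of the general idea of a selective ensemble with a dynamically updated threshold, not the authors' method. The names `Member` and `SelectiveEnsemble`, the parameters `ratio` and `decay`, and the rule "threshold = fraction of the best current weight" are all assumptions introduced for illustration; the data aggregation step is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence


@dataclass
class Member:
    """One ensemble member with a running quality score in [0, 1]."""
    predict: Callable[[Sequence[float]], Hashable]
    weight: float = 1.0


class SelectiveEnsemble:
    """Majority vote over the members whose weight reaches the threshold."""

    def __init__(self, members: List[Member], ratio: float = 0.8,
                 decay: float = 0.9):
        self.members = members
        self.ratio = ratio  # assumption: threshold as a fraction of the best weight
        self.decay = decay  # assumption: smoothing factor for the weights

    def threshold(self) -> float:
        # Recomputed on every call, so it follows the members' current weights.
        return self.ratio * max(m.weight for m in self.members)

    def enabled(self) -> List[Member]:
        # With ratio <= 1 the best member always passes, so this is never empty.
        t = self.threshold()
        return [m for m in self.members if m.weight >= t]

    def predict(self, x: Sequence[float]) -> Hashable:
        votes = Counter(m.predict(x) for m in self.enabled())
        return votes.most_common(1)[0][0]

    def update(self, x: Sequence[float], y: Hashable) -> None:
        # Exponentially weighted accuracy: recent examples count more, so a
        # member that stops matching the stream is gradually disabled.
        for m in self.members:
            hit = 1.0 if m.predict(x) == y else 0.0
            m.weight = self.decay * m.weight + (1.0 - self.decay) * hit
```

In this sketch every member keeps being scored on each labelled example, even while disabled, so a member that becomes accurate again after a drift can cross the threshold and rejoin the vote.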

Paper Citation


in Harvard Style

Grossi V. and Turini F. (2011). AN ADAPTIVE SELECTIVE ENSEMBLE FOR DATA STREAMS CLASSIFICATION. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 136-145. DOI: 10.5220/0003183501360145


in Bibtex Style

@conference{icaart11,
author={Valerio Grossi and Franco Turini},
title={AN ADAPTIVE SELECTIVE ENSEMBLE FOR DATA STREAMS CLASSIFICATION},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={136-145},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003183501360145},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - AN ADAPTIVE SELECTIVE ENSEMBLE FOR DATA STREAMS CLASSIFICATION
SN - 978-989-8425-40-9
AU - Grossi V.
AU - Turini F.
PY - 2011
SP - 136
EP - 145
DO - 10.5220/0003183501360145