An Active Learning Approach for Ensemble-based Data Stream Mining

Rabaa Alabdulrahman, Herna Viktor, Eric Paquet

Abstract

Data streams, in which each instance is seen only once and only a limited amount of data can be buffered for later processing, are omnipresent in today’s real-world applications. In this context, adaptive online ensembles that learn incrementally have been developed. However, the handling of data that arrives asynchronously has not received enough attention. Often, the true class label arrives with a time lag, which is problematic for existing adaptive learning techniques. It is not realistic to require that all class labels be available at training time. The issue is further complicated by the presence of late-arriving, slowly changing dimensions (i.e., late-arriving descriptive attributes). The aim of active learning is to construct accurate models when few labels are available, and it has therefore been proposed as a way to obtain such missing labels in a data stream classification setting. To this end, this paper introduces an active online ensemble (AOE) algorithm that extends online ensembles with an active learning component. Our experimental results demonstrate that the AOE algorithm builds accurate models with much smaller ensemble sizes than traditional ensemble learning algorithms. Further, our models are constructed from small, incremental data sets, thus reducing the number of examples required to build accurate ensembles.
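
The abstract does not spell out the internals of AOE, so the following is only a minimal sketch of the general idea of pairing an online ensemble with an active learning component. It assumes online bagging with Poisson(1) instance weights as the ensemble and member disagreement as the query criterion. The base learner, the warm-up period, and every name in the code (CentroidLearner, ActiveOnlineBagging, poisson_one, run_demo) are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random
from collections import Counter


class CentroidLearner:
    """Hypothetical incremental base learner: one running mean per class."""

    def __init__(self):
        self.sums = {}            # class label -> weighted feature sums
        self.weights = Counter()  # class label -> total weight seen

    def learn(self, x, y, weight):
        if weight == 0:
            return
        sums = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            sums[i] += weight * value
        self.weights[y] += weight

    def predict(self, x):
        if not self.weights:
            return None  # nothing learned yet

        def squared_distance(label):
            n = self.weights[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.weights, key=squared_distance)


def poisson_one(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    limit, k, product = math.exp(-1.0), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class ActiveOnlineBagging:
    """Online bagging that asks for a label only when members disagree."""

    def __init__(self, n_members=5, warmup=10, seed=0):
        self.rng = random.Random(seed)
        self.members = [CentroidLearner() for _ in range(n_members)]
        self.warmup = warmup  # assumption: first instances are always labelled
        self.seen = 0
        self.queries = 0

    def votes(self, x):
        predictions = (m.predict(x) for m in self.members)
        return Counter(p for p in predictions if p is not None)

    def predict(self, x):
        votes = self.votes(x)
        return votes.most_common(1)[0][0] if votes else None

    def wants_label(self, x):
        if self.seen < self.warmup:
            return True
        votes = self.votes(x)
        # Query unless every member casts the same vote.
        return not votes or votes.most_common(1)[0][1] < len(self.members)

    def process(self, x, oracle):
        """Handle one stream instance; oracle(x) returns the true label."""
        query = self.wants_label(x)
        self.seen += 1
        if query:
            y = oracle(x)
            self.queries += 1
            # Online bagging: each member sees the instance k ~ Poisson(1) times.
            for member in self.members:
                member.learn(x, y, poisson_one(self.rng))
        return query


def run_demo(n=200, seed=1):
    """Stream two synthetic Gaussian clusters with alternating labels."""
    rng = random.Random(seed)
    model = ActiveOnlineBagging()
    for i in range(n):
        label = i % 2
        x = (rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0))
        model.process(x, oracle=lambda _x, y=label: y)
    return model
```

Calling run_demo() and comparing model.queries with the stream length gives a rough sense of how many labels such a scheme requests on an easy synthetic stream. Querying on disagreement alone can miss changes that all members get wrong together, which is why this sketch adds a labelled warm-up period.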



Paper Citation


in Harvard Style

Alabdulrahman R., Viktor H. and Paquet E. (2016). An Active Learning Approach for Ensemble-based Data Stream Mining. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016), ISBN 978-989-758-203-5, pages 275-282. DOI: 10.5220/0006047402750282


in Bibtex Style

@conference{kdir16,
author={Rabaa Alabdulrahman and Herna Viktor and Eric Paquet},
title={An Active Learning Approach for Ensemble-based Data Stream Mining},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={275-282},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006047402750282},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - An Active Learning Approach for Ensemble-based Data Stream Mining
SN - 978-989-758-203-5
AU - Alabdulrahman R.
AU - Viktor H.
AU - Paquet E.
PY - 2016
SP - 275
EP - 282
DO - 10.5220/0006047402750282