A SEMI-SUPERVISED ENSEMBLE ALGORITHM WITH PROBABILISTIC WEIGHTS

Heidy-Marisol Marin-Castro, Miguel Morales-Sandoval, L. Enrique Sucar, Eduardo F. Morales

Abstract

This paper introduces a semi-supervised ensemble of classifiers called WSA (Weighted Semi-supervised AdaBoost). The ensemble can significantly improve data classification by exploiting both labeled and unlabeled data. WSA is based on AdaBoost, a supervised ensemble algorithm, but it also incorporates the unlabeled data into the training process. WSA works with a set of Naive Bayes base classifiers that are combined in a cascade, as in AdaBoost. At each stage of WSA, the current classifier of the ensemble is trained using the classification results that the classifier at the previous stage produced on the labeled and unlabeled data. Classification is then performed and the results are used to train the next classifier of the ensemble. Unlike other semi-supervised approaches, the unlabeled instances are weighted by a probabilistic measure of the labels predicted by the current classifier. This reduces the strong bias that dubious classifications of unlabeled data may introduce into semi-supervised learning algorithms. Experimental results on several benchmark data sets show that this technique significantly improves the performance of semi-supervised learning.
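The training loop the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a weighted Gaussian Naive Bayes serves as the base learner, each pseudo-labeled instance is weighted by the maximum posterior probability assigned by the previous classifier, and the AdaBoost-style error and weight update are computed on the labeled data only (the paper may differ in these details).

```python
import math

class WeightedGaussianNB:
    """Gaussian Naive Bayes supporting per-instance weights (illustrative base learner)."""

    def fit(self, X, y, w):
        self.classes = sorted(set(y))
        self.priors, self.stats = {}, {}
        total = sum(w)
        for c in self.classes:
            idx = [i for i, yi in enumerate(y) if yi == c]
            wc = sum(w[i] for i in idx)
            self.priors[c] = wc / total
            # weighted mean/variance per feature; small constant avoids zero variance
            self.stats[c] = []
            for j in range(len(X[0])):
                mu = sum(w[i] * X[i][j] for i in idx) / wc
                var = sum(w[i] * (X[i][j] - mu) ** 2 for i in idx) / wc + 1e-6
                self.stats[c].append((mu, var))
        return self

    def predict_proba(self, x):
        logp = {}
        for c in self.classes:
            lp = math.log(self.priors[c])
            for (mu, var), xj in zip(self.stats[c], x):
                lp += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
            logp[c] = lp
        m = max(logp.values())
        exp = {c: math.exp(lp - m) for c, lp in logp.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, x):
        p = self.predict_proba(x)
        return max(p, key=p.get)

def wsa_fit(XL, yL, XU, rounds=5):
    """WSA-style loop: labeled data XL/yL plus unlabeled data XU (hypothetical sketch)."""
    wL = [1.0 / len(XL)] * len(XL)  # uniform initial weights, as in AdaBoost
    ensemble = []
    for _ in range(rounds):
        clf = WeightedGaussianNB()
        if ensemble:
            # pseudo-label the unlabeled data with the previous classifier and
            # weight each instance by the probability of its predicted label
            prev = ensemble[-1][0]
            yU = [prev.predict(x) for x in XU]
            wU = [max(prev.predict_proba(x).values()) for x in XU]
            clf.fit(XL + XU, yL + yU, wL + wU)
        else:
            clf.fit(XL, yL, wL)
        # weighted error and classifier weight alpha, computed on labeled data only
        err = sum(wi for xi, yi, wi in zip(XL, yL, wL) if clf.predict(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((clf, alpha))
        # AdaBoost reweighting of the labeled instances, then renormalize
        wL = [wi * math.exp(-alpha if clf.predict(xi) == yi else alpha)
              for xi, yi, wi in zip(XL, yL, wL)]
        z = sum(wL)
        wL = [wi / z for wi in wL]
    return ensemble

def wsa_predict(ensemble, x):
    # alpha-weighted majority vote over the cascade, as in AdaBoost
    votes = {}
    for clf, alpha in ensemble:
        c = clf.predict(x)
        votes[c] = votes.get(c, 0.0) + alpha
    return max(votes, key=votes.get)
```

On a toy two-cluster problem, `wsa_fit` trains the cascade from a handful of labeled points plus unlabeled ones, and `wsa_predict` classifies new instances by weighted vote.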



Paper Citation


in Harvard Style

Marin-Castro H., Morales-Sandoval M., Enrique Sucar L. and F. Morales E. (2009). A SEMI-SUPERVISED ENSEMBLE ALGORITHM WITH PROBABILISTIC WEIGHTS. In KDIR (IC3K 2009). SciTePress.


in Bibtex Style

@conference{kdir09,
author={Heidy-Marisol Marin-Castro and Miguel Morales-Sandoval and L. Enrique Sucar and Eduardo F. Morales},
title={A SEMI-SUPERVISED ENSEMBLE ALGORITHM WITH PROBABILISTIC WEIGHTS},
booktitle={KDIR (IC3K 2009)},
year={2009},
pages={},
publisher={SciTePress},
organization={INSTICC},
doi={},
isbn={},
}


in EndNote Style

TY - CONF
JO - KDIR (IC3K 2009)
TI - A SEMI-SUPERVISED ENSEMBLE ALGORITHM WITH PROBABILISTIC WEIGHTS
SN -
AU - Marin-Castro H.
AU - Morales-Sandoval M.
AU - Enrique Sucar L.
AU - F. Morales E.
PY - 2009
SP - 0
EP - 0
DO -