On Selecting Useful Unlabeled Data Using Multi-view Learning Techniques

Thanh-Binh Le, Sang-Woon Kim

Abstract

In semi-supervised learning, a selection strategy first picks strongly discriminative examples from the unlabeled data, which are then used, together with the labeled data, to train a (supervised) classifier. This paper investigates a new selection strategy for data composed of multiple views: first, the views of the data are derived independently; second, each view is used to measure a confidence with which candidate examples are evaluated; third, the confidence levels measured from all views are combined as a weighted average into a target confidence. This selecting-and-training procedure is repeated for a predefined number of iterations. Experimental results obtained on synthetic and real-life benchmark data demonstrate that the proposed mechanism can compensate for the shortcomings of traditional strategies. In particular, when the data are appropriately decomposed into multiple views, the strategy achieves further improvements in classification accuracy.
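The selecting-and-training loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the view construction, the nearest-centroid margin used as the per-view confidence measure, and the uniform weights are all assumptions made for the sketch.

```python
import numpy as np

def view_confidence(X_lab, y_lab, X_unl):
    """Per-view confidence for a two-class problem: the margin between the
    distances to the two class centroids (a stand-in for whatever confidence
    measure is actually used)."""
    c0 = X_lab[y_lab == 0].mean(axis=0)
    c1 = X_lab[y_lab == 1].mean(axis=0)
    d0 = np.linalg.norm(X_unl - c0, axis=1)
    d1 = np.linalg.norm(X_unl - c1, axis=1)
    labels = (d1 < d0).astype(int)      # predicted label per unlabeled example
    conf = np.abs(d0 - d1)              # larger margin -> higher confidence
    return labels, conf

def select_useful(views_lab, y_lab, views_unl, weights, k):
    """Combine the per-view confidences as a weighted average and return the
    indices and predicted labels of the k most confident unlabeled examples."""
    per_view = [view_confidence(Xl, y_lab, Xu)
                for Xl, Xu in zip(views_lab, views_unl)]
    # weighted average of the confidence levels measured from all views
    conf = sum(w * c for w, (_, c) in zip(weights, per_view))
    # weighted vote over the per-view predicted labels
    votes = sum(w * l for w, (l, _) in zip(weights, per_view))
    labels = (votes >= 0.5 * sum(weights)).astype(int)
    top = np.argsort(conf)[-k:]
    return top, labels[top]
```

In a full semi-supervised loop, the selected examples would be moved (with their predicted labels) from the unlabeled pool into the labeled set and the classifier retrained, repeating for a predefined number of iterations.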



Paper Citation


in Harvard Style

Le T. and Kim S. (2015). On Selecting Useful Unlabeled Data Using Multi-view Learning Techniques. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 157-164. DOI: 10.5220/0005171301570164


in Bibtex Style

@conference{icpram15,
author={Thanh-Binh Le and Sang-Woon Kim},
title={On Selecting Useful Unlabeled Data Using Multi-view Learning Techniques},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={157-164},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005171301570164},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - On Selecting Useful Unlabeled Data Using Multi-view Learning Techniques
SN - 978-989-758-076-5
AU - Le T.
AU - Kim S.
PY - 2015
SP - 157
EP - 164
DO - 10.5220/0005171301570164
ER -