On using Additional Unlabeled Data for Improving Dissimilarity-Based Classifications

Sang-Woon Kim

Abstract

This paper reports experimental results on using unlabeled data together with labeled data to improve the classification accuracy of dissimilarity-based classification (DBC) (Pękalska, E. and Duin, R. P. W., 2005). In DBC, classifiers are built not on the feature measurements of individual objects, but on a suitable dissimilarity measure between the objects. In image classification tasks, on the other hand, one of the intractable problems is the lack of information caused by an insufficient number of data. To address this problem in DBC, this paper studies a new way of measuring the dissimilarity between two object images using the well-known one-shot similarity measure (OSS) (Wolf, L. et al., 2009). In DBC using OSS, the dissimilarity is measured against unlabeled (background) data that do not belong to the classes being learned and consequently require no labeling; in this sense, the classification is performed in a semi-supervised learning (SSL) framework. Our experimental results, obtained on well-known benchmarks, demonstrate that when the cardinalities of the unlabeled data set and the prototype set are chosen appropriately, using additional unlabeled data for the OSS measure in SSL improves the classification accuracy of DBC.
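To make the role of the unlabeled background set concrete, the following is a minimal sketch of a one-shot similarity computation in the free-scale LDA form described by Wolf et al. (2009): a linear model separating one sample from the background set A is scored on the other sample, the roles are swapped, and the two directed scores are averaged. The function name, the ridge term `reg`, and the exact symmetrization are illustrative choices here, not the authors' implementation.

```python
import numpy as np

def one_shot_similarity(xi, xj, background, reg=1e-3):
    """One-Shot Similarity (OSS) between vectors xi and xj, computed
    against an unlabeled background set A (rows of `background`).

    Sketch of the free-scale LDA form: the within-class scatter is
    estimated from A alone, so no labels for the target classes are
    needed.
    """
    A = np.asarray(background, dtype=float)
    mu = A.mean(axis=0)                   # background mean
    Ac = A - mu
    Sw = Ac.T @ Ac / len(A)               # within-class scatter of A
    Sw += reg * np.eye(Sw.shape[0])       # ridge regularization (assumed)

    def directed_score(pos, probe):
        # LDA direction for "pos vs. background": Sw^{-1} (pos - mu)
        w = np.linalg.solve(Sw, pos - mu)
        b = w @ (pos + mu) / 2.0          # threshold midway between the classes
        return w @ probe - b

    # Symmetrize by averaging the two directed scores.
    return 0.5 * (directed_score(xi, xj) + directed_score(xj, xi))
```

A dissimilarity distance for DBC can then be derived from this similarity score, e.g. by negation or a monotone transform; by construction the score is symmetric in `xi` and `xj`.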

References

  1. Ben-David, S., Lu, T., and Pal, D. (2008). Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In Proc. of the 21st Annual Conf. on Learning Theory (COLT08), pages 33-44, Helsinki, Finland.
  2. Bicego, M., Murino, V., and Figueiredo, M. A. T. (2004). Similarity-based classification of sequences using hidden Markov models. Pattern Recognition, 37:2281-2291.
  3. Blum, A. and Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In Proc. of the 11th Annual Conf. on Computational Learning Theory (COLT 98), pages 92-100, Madison, WI.
  4. Chapelle, O., Schölkopf, B., and Zien, A. (2006). Semi-Supervised Learning. The MIT Press, MA.
  5. Duda, R. O., Hart, P. E., and Stork, D. G. (2001). Pattern Classification Second Edition. John Wiley & Sons.
  6. Duin, R. P. W. (2011). Non-Euclidean problems in pattern recognition related to human expert knowledge. In Proc. of ICEIS2010. Springer-Verlag.
  7. Duin, R. P. W., Juszczak, P., de Ridder, D., Paclík, P., Pękalska, E., and Tax, D. M. J. (2004). PRTools 4: A Matlab Toolbox for Pattern Recognition. Delft University of Technology, Delft, The Netherlands.
  8. Frank, A. and Asuncion, A. (2010). UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA.
  9. Kim, S.-W. and Oommen, B. J. (2007). On using prototype reduction schemes to optimize dissimilarity-based classification. Pattern Recognition, 40:2946-2957.
  10. Mallapragada, P. K., Jin, R., Jain, A. K., and Liu, Y. (2009). Semiboost: Boosting for semi-supervised learning. IEEE Trans. Pattern Anal. and Machine Intell., 31(11):2000-2014.
  11. McClosky, D., Charniak, E., and Johnson, M. (2008). When is self-training effective for parsing? In Proc. of the 22nd Int'l Conf. on Computational Linguistics (Coling 2008), pages 561-568, Manchester, UK.
  12. Millán-Giraldo, M., García, V., and Sánchez, J. S. (2012). Prototype selection in imbalanced data for dissimilarity representation - A preliminary study. In Proc. of the 1st Int'l Conf. on Pattern Recognition Applications and Methods (ICPRAM 2012), pages 242-246.
  13. Orozco-Alzate, M., Duin, R. P. W., and Castellanos-Dominguez, G. (2009). A generalization of dissimilarity representations using feature lines and feature planes. Pattern Recognition Letters, 30:242-254.
  14. Pękalska, E. and Duin, R. P. W. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific, Singapore.
  15. Pękalska, E. and Duin, R. P. W. (2008). Beyond traditional kernels: Classification in two dissimilarity-based representation spaces. IEEE Trans. Sys., Man, and Cybern. (C), 38(6):727-744.
  16. Wolf, L., Hassner, T., and Taigman, Y. (2009). The One-Shot Similarity Kernel. In T. Matsuyama, C. Cipolla, et al., editors, Proc. of the IEEE Int'l Conf. on Computer Vision, pages 897-902. IEEE Computer Society Press.
  17. Wolf, L., Hassner, T., and Taigman, Y. (2011). Effective unconstrained face recognition by combining multiple descriptors and learned background statistics. IEEE Trans. Pattern Anal. and Machine Intell., 33(10):1978-1990.


Paper Citation


in Harvard Style

Kim S. (2013). On using Additional Unlabeled Data for Improving Dissimilarity-Based Classifications. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 132-137. DOI: 10.5220/0004218901320137


in Bibtex Style

@conference{icpram13,
author={Sang-Woon Kim},
title={On using Additional Unlabeled Data for Improving Dissimilarity-Based Classifications},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={132-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004218901320137},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - On using Additional Unlabeled Data for Improving Dissimilarity-Based Classifications
SN - 978-989-8565-41-9
AU - Kim S.
PY - 2013
SP - 132
EP - 137
DO - 10.5220/0004218901320137
ER -