PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION

Tingting Mu, Sophia Ananiadou

Abstract

In many real-world applications of text mining, information retrieval, and natural language processing, large-scale feature sets are common, often rendering the employed machine learning algorithms intractable: the well-known "curse of dimensionality". Aiming not only to remove redundant information from the original features but also to improve their discriminating ability, we present a novel approach to the supervised generation of low-dimensional, proximity-based graph embeddings that facilitate multi-label classification. The optimal embeddings are computed from a supervised adjacency graph, called the multi-label graph, which simultaneously preserves proximity structures between samples constructed from both feature and multi-label class information. We propose different ways to obtain this multi-label graph, working either in a binary label space or in a projected real-valued label space. To reduce the training cost of the dimensionality reduction procedure incurred by large-scale features, a smaller set of relation features between each sample and a set of representative prototypes is employed. The effectiveness of the proposed method is demonstrated on two document collections for text categorization based on the "bag of words" model.
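To make the abstract's pipeline concrete, the following is a minimal sketch of the general idea, not the authors' exact formulation: build an adjacency graph whose edge weights blend feature-space k-nearest-neighbour proximity with label-overlap similarity (a simple stand-in for the paper's multi-label graph), then compute a low-dimensional spectral embedding from its graph Laplacian. The function names, the cosine/Jaccard similarity choices, and the blending parameter `alpha` are all illustrative assumptions.

```python
import numpy as np

def multilabel_graph(X, Y, k=3, alpha=0.5):
    """Adjacency matrix mixing feature proximity and label agreement.

    X : (n, d) feature matrix; Y : (n, c) binary label matrix.
    alpha blends feature-based and label-based similarity (assumed form).
    """
    n = X.shape[0]
    # Cosine similarity in feature space.
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    # Keep each sample's k most similar neighbours, then symmetrise.
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(-S[i])
        idx = idx[idx != i][:k]
        W[i, idx] = S[i, idx]
    W = np.maximum(W, W.T)
    # Label-overlap similarity (Jaccard index over shared labels).
    inter = Y @ Y.T
    union = Y.sum(1)[:, None] + Y.sum(1)[None, :] - inter
    label_sim = inter / np.maximum(union, 1)
    # Supervised graph: reweight existing feature edges by label agreement.
    return alpha * W + (1 - alpha) * label_sim * (W > 0)

def embed(W, dim=2):
    """Laplacian-eigenmap-style embedding from adjacency W."""
    d = W.sum(1)
    L = np.diag(d) - W  # unnormalised graph Laplacian
    # Symmetrically normalise so a plain symmetric eigensolver applies.
    Dm = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    vals, vecs = np.linalg.eigh(Dm @ L @ Dm)
    # Skip the trivial constant eigenvector; keep the next `dim`.
    return Dm @ vecs[:, 1:dim + 1]

# Toy data: 6 samples, 4 features, 2 labels.
rng = np.random.default_rng(0)
X = rng.random((6, 4))
Y = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [1, 1]])
Z = embed(multilabel_graph(X, Y), dim=2)
print(Z.shape)  # (6, 2): one 2-D embedding per sample
```

The paper's cost-reduction step of representing each sample by relation features to a small prototype set would, in this sketch, amount to replacing `X` with a much narrower matrix of similarities to selected prototypes before building the graph.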



Paper Citation


in Harvard Style

Mu T. and Ananiadou S. (2010). PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 74-84. DOI: 10.5220/0003092200740084


in Bibtex Style

@conference{kdir10,
author={Tingting Mu and Sophia Ananiadou},
title={PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={74-84},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003092200740084},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION
SN - 978-989-8425-28-7
AU - Mu T.
AU - Ananiadou S.
PY - 2010
SP - 74
EP - 84
DO - 10.5220/0003092200740084