Bengio, Y., Paiement, J.-F., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2003). Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Proc. of Neural Information Processing Systems, NIPS.
Bennett, P. N. and Nguyen, N. (2009). Refined experts:
improving classification in large taxonomies. In Proc.
of the 32nd Int’l ACM SIGIR conference on Research
and development in information retrieval.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.
Cai, D., He, X., and Han, J. (2007a). Spectral regression:
A unified subspace learning framework for content-
based image retrieval. In Proc. of the ACM Conference
on Multimedia.
Cai, D., He, X., and Han, J. (2007b). Spectral regression
for efficient regularized subspace learning. In Proc. of
the International Conf. on Data Mining, ICDM.
Chan, P. K., Schlag, M. D. F., and Zien, J. Y. (1994). Spec-
tral k-way ratio-cut partitioning and clustering. IEEE
Trans. on Computer-Aided Design of Integrated Cir-
cuits and Systems, 13(9):1088–1096.
Coppersmith, D. and Winograd, S. (1990). Matrix multi-
plication via arithmetic progressions. Journal of Sym-
bolic Computation, 9:251–280.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer,
T. K., and Harshman, R. (1990). Indexing by latent
semantic analysis. Journal of the American Society
for Information Science, 41:391–407.
Dhillon, I. S. (2001). Co-clustering documents and words
using bipartite spectral graph partitioning. In Proc. of
the 7th ACM SIGKDD International Conf. on Knowl-
edge discovery and data mining, pages 269–274, San
Francisco, California, US.
Dhillon, I. S., Mallela, S., and Kumar, R. (2003). A divisive information-theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research, 3:1265–1287.
Fisher, R. A. (1936). The use of multiple measurements in
taxonomic problems. Annals of Eugenics, 7(2):179–
188.
Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306.
Hardoon, D. R., Szedmak, S., and Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664.
He, X. (2004). Incremental semi-supervised subspace
learning for image retrieval. In Proc. of the ACM Con-
ference on Multimedia.
He, X. and Niyogi, P. (2003). Locality preserving projec-
tions. In Proc. of Neural Information Processing Sys-
tems 16, NIPS.
He, X., Yan, S., Hu, Y., Niyogi, P., and Zhang, H. (2005). Face recognition using Laplacianfaces. IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(3):328–340.
Hild II, K. E., Erdogmus, D., Torkkola, K., and Principe, J. C. (2006). Feature extraction using information-theoretic learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 28(9):1385–1392.
Huang, Y., Chiang, C., Shieh, J., and Grimson, W. (2002). Prototype optimization for nearest-neighbor classification. Pattern Recognition, 35(6):1237–1245.
Jolliffe, I. T. (1986). Principal Component Analysis.
Springer-Verlag, New York, NY.
Kim, H., Howland, P., and Park, H. (2005). Dimension reduction in text classification with support vector machines. Journal of Machine Learning Research, 6:37–53.
Kokiopoulou, E. and Saad, Y. (2007). Orthogonal
neighborhood preserving projections: A projection-
based dimensionality reduction technique. IEEE
Trans. on Pattern Analysis and Machine Intelligence,
29(12):2143–2156.
Kokiopoulou, E. and Saad, Y. (2009). Enhanced graph-based dimensionality reduction with repulsion Laplaceans. Pattern Recognition, 42:2392–2402.
Lewis, D. D. (1992). Feature selection and feature extrac-
tion for text categorization. In Proc. of the work-
shop on Speech and Natural Language, pages 212–
217, Harriman, New York.
Li, S., Xia, R., Zong, C., and Huang, C.-R. (2009). A frame-
work of feature selection methods for text categoriza-
tion. In Proc. of the Joint Conf. of the 47th Annual
Meeting of the ACL and the 4th Int’l Joint Conf. on
Natural Language Processing of the AFNLP, pages
692–700, Suntec, Singapore. Association for Compu-
tational Linguistics.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.
Mollineda, R. A., Ferri, F. J., and Vidal, E. (2002). An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering. Pattern Recognition, 35(12):2771–2782.
Pekalska, E. and Duin, R. (2002). Dissimilarity representations allow for building good classifiers. Pattern Recognition Letters, 23(8):943–956.
Pekalska, E., Duin, R., and Paclik, P. (2006). Prototype selection for dissimilarity-based classifiers. Pattern Recognition, 39(2):189–208.
Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimension-
ality reduction by locally linear embedding. Science,
290(5500):2323–2326.
Shi, J. and Malik, J. (2000). Normalized cuts and image
segmentation. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 22(8):888–905.
Steinwart, I. (2001). On the influence of the kernel on the
consistency of support vector machines. Journal of
Machine Learning Research, 2:67–93.
Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. Journal of Machine Learning Research, 8:1027–1061.
PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION