UNSUPERVISED DISCRIMINANT EMBEDDING IN CLUSTER SPACES

Eniko Szekely, Eric Bruno, Stephane Marchand-Maillet

Abstract

This paper proposes a new representation space, called the cluster space, for data points that originate in high-dimensional spaces. Whereas existing dedicated methods concentrate on revealing manifolds within the data, we consider the context of clustered data and derive the dimension reduction process from cluster information. Points are represented in the cluster space by their a posteriori probability values, estimated using Gaussian Mixture Models. The resulting cluster space is the optimal space for discrimination in the sense of Quadratic Discriminant Analysis (QDA). Moreover, it is shown to alleviate the negative impact of the curse of dimensionality on the quality of cluster discrimination, and it is a useful preprocessing tool for other dimension reduction methods. Various experiments illustrate the effectiveness of the cluster space on both synthetic and real data.
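
To make the representation concrete, the following is a minimal sketch of how a point could be mapped to posterior-probability coordinates under a Gaussian mixture. It is not the authors' implementation: the function names (`log_gaussian_diag`, `to_cluster_space`), the restriction to diagonal covariances, and the hand-picked mixture parameters are all assumptions made for illustration. In practice the parameters would be estimated from the data, for example with EM.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density at x of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def to_cluster_space(x, weights, means, variances):
    """Return the posterior probabilities p(k | x), one coordinate per component."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating (log-sum-exp trick) so the
    # normalization stays numerically stable in high dimensions.
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-component mixture in 10 dimensions (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0] * 10, [3.0] * 10]
variances = [[1.0] * 10, [1.0] * 10]

point = [0.1] * 10                      # a 10-dimensional input point
embedding = to_cluster_space(point, weights, means, variances)
midpoint = to_cluster_space([1.5] * 10, weights, means, variances)
```

Here a 10-dimensional point becomes a 2-dimensional vector whose entries sum to one, with one coordinate per mixture component. Each term `log(w) + log_gaussian_diag(...)` equals the QDA discriminant score of that component up to a constant shared by all components, which is one way to read the connection to QDA stated in the abstract.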



Paper Citation


in Harvard Style

Szekely E., Bruno E. and Marchand-Maillet S. (2009). UNSUPERVISED DISCRIMINANT EMBEDDING IN CLUSTER SPACES. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 70-76. DOI: 10.5220/0002306500700076


in Bibtex Style

@conference{kdir09,
author={Eniko Szekely and Eric Bruno and Stephane Marchand-Maillet},
title={UNSUPERVISED DISCRIMINANT EMBEDDING IN CLUSTER SPACES},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={70-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002306500700076},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - UNSUPERVISED DISCRIMINANT EMBEDDING IN CLUSTER SPACES
SN - 978-989-674-011-5
AU - Szekely E.
AU - Bruno E.
AU - Marchand-Maillet S.
PY - 2009
SP - 70
EP - 76
DO - 10.5220/0002306500700076