# SIMPLEX DECOMPOSITIONS USING SVD AND PLSA

### Madhusudana Shashanka, Michael Giering

#### Abstract

Probabilistic Latent Semantic Analysis (PLSA) is a popular technique for analyzing non-negative data, in which the multinomial distribution underlying every data vector is expressed as a linear combination of a set of basis distributions. These learned basis distributions, which characterize the dataset, lie on the standard simplex and themselves form the corners of a simplex within which all data approximations lie. In this paper, we describe a novel method that extends the PLSA decomposition so that the bases are not constrained to lie on the standard simplex and are thus better able to characterize the data. The locations of the PLSA basis distributions on the standard simplex depend on how the dataset is aligned with respect to the standard simplex. If the directions of maximum variance of the dataset are orthogonal to the standard simplex, the PLSA bases will give a poor representation of the dataset. Our approach overcomes this drawback by using Singular Value Decomposition (SVD) to identify the directions of maximum variance and transforming the dataset so that these directions are parallel to the standard simplex before performing PLSA. The learned PLSA features are then transformed back into the data space. The effectiveness of the proposed approach is demonstrated with experiments on synthetic data.
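
The pipeline outlined in the abstract (SVD, alignment with the standard simplex, PLSA, back-transformation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the exact transformation, so the helper names `to_simplex` and `from_simplex`, the centering step, the choice of hyperplane basis, the `0.99` scaling factor, and the synthetic data are all assumptions made for illustration. PLSA is fitted here with the standard EM updates.

```python
import numpy as np

def plsa(V, n_bases, n_iter=200, seed=0):
    """EM for PLSA on a non-negative matrix V (features x samples).
    Returns W (columns are basis distributions) and H (mixture weights)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_bases))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_bases, V.shape[1]))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = V / (W @ H + 1e-12)            # ratio of data to current model
        W_new = W * (R @ H.T)              # E and M steps folded together
        H_new = H * (W.T @ R)
        W = W_new / W_new.sum(axis=0, keepdims=True)
        H = H_new / H_new.sum(axis=0, keepdims=True)
    return W, H

def to_simplex(X, k):
    """Hypothetical helper: map the columns of X (d x n) onto the
    standard simplex in R^(k+1) along the top-k SVD directions."""
    mu = X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(X - mu, full_matrices=False)
    Uk = U[:, :k]                          # directions of maximum variance
    # Orthonormal basis of the hyperplane {v : sum(v) = 0}, which is
    # parallel to the standard simplex.
    B = np.linalg.svd(np.eye(k + 1) - 1.0 / (k + 1))[0][:, :k]
    BY = B @ (Uk.T @ (X - mu))
    # Assumed scaling: shrink so every point stays strictly inside the simplex.
    scale = 0.99 / ((k + 1) * np.abs(BY).max())
    Z = 1.0 / (k + 1) + scale * BY         # columns are non-negative and sum to 1
    return Z, (mu, Uk, B, scale)

def from_simplex(Z, params):
    """Hypothetical helper: inverse of to_simplex for points whose entries sum to 1."""
    mu, Uk, B, scale = params
    k = B.shape[1]
    return mu + Uk @ (B.T @ (Z - 1.0 / (k + 1))) / scale

# Synthetic real-valued data: convex combinations of three corners in R^3.
rng = np.random.default_rng(0)
corners = np.array([[5.0, -2.0, 1.0],
                    [0.0, 3.0, -4.0],
                    [-1.0, 2.0, 6.0]])     # each column is one corner
weights = rng.dirichlet(np.ones(3), size=200).T
X = corners @ weights

Z, params = to_simplex(X, k=2)             # align with the standard simplex
W, H = plsa(Z, n_bases=3)                  # ordinary PLSA on the aligned data
bases = from_simplex(W, params)            # learned bases back in data space
X_hat = bases @ H                          # approximation of X
```

Because the columns of `H` sum to one, mixing the back-transformed bases gives the same result as back-transforming the mixture `W @ H`, which is why the learned features can be moved into the data space after PLSA has been fitted.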

#### Paper Citation

#### in Harvard Style

Shashanka M. and Giering M. (2012). **SIMPLEX DECOMPOSITIONS USING SVD AND PLSA**. In *Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM*, ISBN 978-989-8425-98-0, pages 248-252. DOI: 10.5220/0003797802480252

#### in Bibtex Style

@conference{icpram12,

author={Madhusudana Shashanka and Michael Giering},

title={SIMPLEX DECOMPOSITIONS USING SVD AND PLSA},

booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

year={2012},

pages={248-252},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003797802480252},

isbn={978-989-8425-98-0},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM

TI - SIMPLEX DECOMPOSITIONS USING SVD AND PLSA

SN - 978-989-8425-98-0

AU - Shashanka M.

AU - Giering M.

PY - 2012

SP - 248

EP - 252

DO - 10.5220/0003797802480252