In this paper, we have focused on data with nonnegative entries, but the proposed approach is also applicable to datasets with real-valued entries: the algorithm described earlier can be applied to arbitrary real-valued datasets without modification. This is an alternative to the approach proposed by (Shashanka, 2009), where data are transformed into the next higher dimension so that PLSA can be applied; here, we instead use SVD to align the dataset along the dimensions of the standard simplex. It will be instructive to compare the two approaches in this context, and we leave that for future work.
4 CONCLUSIONS
In this paper, we presented a novel approach to perform Simplex Decompositions on datasets. Specifically, the approach learns a set of basis vectors such that each data vector can be expressed as a linear combination of the learned bases, where the corresponding mixture weights are nonnegative and sum to 1.
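In symbols (a restatement in our own notation, not notation taken from earlier sections of the paper): each data vector $\vec{v}$ is modeled as

$$\vec{v} \approx \sum_{k=1}^{K} w_k \vec{b}_k, \qquad w_k \geq 0, \qquad \sum_{k=1}^{K} w_k = 1,$$

where $\vec{b}_k$ are the learned basis vectors and the weights $w_k$ lie on the standard simplex.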
PLSA performs a similar decomposition, but it characterizes the normalized data points rather than the original dataset itself. We demonstrated, with the help of a synthetic dataset, the spurious effects such a normalization can have. We described our approach and demonstrated that it provides a way to overcome this drawback. This work has several potential applications in tasks such as clustering, feature extraction, and classification. We would like to continue this work by applying the technique to real-world problems and demonstrating its usefulness. We also intend to extend this work to other related latent variable methods such as Probabilistic Latent Component Analysis.
REFERENCES
Blei, D. and Lafferty, J. (2006). Correlated Topic Models. In NIPS.
Blei, D., Ng, A., and Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3.
Gaussier, E. and Goutte, C. (2005). Relation between PLSA and NMF and Implications. In Proc. ACM SIGIR Conf. on Research and Dev. in Information Retrieval, pages 601–602.
Hofmann, T. (2001). Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42.
Lee, D. and Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401.
Lee, D. and Seung, H. (2001). Algorithms for Non-negative Matrix Factorization. In NIPS.
Shashanka, M. (2009). Simplex Decompositions for Real-Valued Datasets. In Proc. Intl. Workshop on Machine Learning for Signal Processing.
Shashanka, M., Raj, B., and Smaragdis, P. (2008). Probabilistic latent variable models as non-negative factorizations. Computational Intelligence and Neuroscience.
APPENDIX
In this appendix, we briefly describe how to choose the transformation matrix T that transforms M-dimensional data V such that the first (M − 1) principal components lie parallel to the standard (M − 1)-simplex. We need to identify a set of (M − 1) M-dimensional orthonormal vectors that span the standard (M − 1)-simplex.
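For example (a worked illustration of ours, for M = 3): the columns of

$$R_3 = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{2} & 1/\sqrt{6} \\ 0 & -2/\sqrt{6} \end{bmatrix}$$

are orthonormal and each sums to zero, so both lie parallel to the standard 2-simplex.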
(Shashanka, 2009) developed a procedure to find exactly such a matrix; the method is based on induction. Let $R_M$ denote an $M \times (M-1)$ matrix of $(M-1)$ orthogonal vectors. Let $\vec{1}_M$ and $\vec{0}_M$ denote $M$-vectors whose entries are all 1's and all 0's, respectively. Similarly, let $\mathbf{1}_{a \times b}$ and $\mathbf{0}_{a \times b}$ denote $a \times b$ matrices of all 1's and all 0's, respectively. They showed that the matrix $R_{M+1}$ given by

$$R_{M+1} = \begin{bmatrix} R_M & \vec{1}_M \\ \vec{0}_{M-1}^{\,T} & -M \end{bmatrix}$$

if $M$ is even, and

$$R_{M+1} = \begin{bmatrix} R_{(M+1)/2} & \mathbf{0}_{(M+1)/2 \times (M-1)/2} & \vec{1}_{(M+1)/2} \\ \mathbf{0}_{(M+1)/2 \times (M-1)/2} & R_{(M+1)/2} & -\vec{1}_{(M+1)/2} \end{bmatrix}$$

if $M$ is odd, is orthogonal. $R_{M+1}$ is then normalized to obtain an orthonormal matrix.
Given the above relation and the fact that $R_1$ is an empty matrix, one can compute $R_M$ inductively for any value of M.
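The recursion is straightforward to implement. Below is a minimal sketch in Python with NumPy (the function name simplex_basis is ours; (Shashanka, 2009) does not provide code):

import numpy as np

def simplex_basis(M):
    # Build the (unnormalized) M x (M-1) matrix R_M whose columns are
    # mutually orthogonal and parallel to the standard (M-1)-simplex.
    if M == 1:
        return np.zeros((1, 0))  # R_1 is an empty matrix
    prev = M - 1  # apply the inductive relation for R_{prev+1}
    if prev % 2 == 0:
        # prev even: R_M = [[R_prev, 1_prev], [0^T, -prev]]
        R = simplex_basis(prev)
        top = np.hstack([R, np.ones((prev, 1))])
        bottom = np.hstack([np.zeros((1, prev - 1)), np.array([[-prev]])])
        return np.vstack([top, bottom])
    else:
        # prev odd: block construction from R_{(prev+1)/2}
        h = (prev + 1) // 2
        R = simplex_basis(h)
        Z = np.zeros((h, h - 1))
        ones = np.ones((h, 1))
        return np.vstack([np.hstack([R, Z, ones]),
                          np.hstack([Z, R, -ones])])

Each column of the returned matrix sums to zero, which is exactly the condition for being parallel to the standard simplex.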
We have an additional constraint that the last principal component be orthogonal to the standard simplex, and this can be easily achieved by appending a column vector of 1's to $R_M$.
Thus, the matrix T defining our desired transformation is given by $T = [R_M \;\; \vec{1}_M]$.
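Continuing the sketch above, a hypothetical helper (the name transformation_matrix is ours) that forms T and verifies the claimed properties, again assuming NumPy and the simplex_basis function defined earlier:

def transformation_matrix(M):
    # T = [R_M | 1_M]; normalizing each column to unit length yields an
    # orthonormal matrix whose last column is orthogonal to the simplex.
    T = np.hstack([simplex_basis(M), np.ones((M, 1))])
    return T / np.linalg.norm(T, axis=0, keepdims=True)

# Sanity checks for M = 5: columns are orthonormal, and the first M - 1
# columns are parallel to the simplex (their entries sum to zero).
T = transformation_matrix(5)
assert np.allclose(T.T @ T, np.eye(5))
assert np.allclose(T[:, :-1].sum(axis=0), 0.0)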