In this paper, we have focused on data with nonnegative entries, but the proposed approach is also applicable to datasets with real-valued entries: the algorithm described earlier can be applied to arbitrary real-valued datasets without modification. This is an alternative to the approach proposed by (Shashanka, 2009), where data are transformed into the next higher dimension so that PLSA can be applied; here, we instead use SVD to align the dataset along the dimensions of the standard simplex. It will be instructive to compare the two approaches in this context, and we leave that for future work.
4 CONCLUSIONS
In this paper, we presented a novel approach to perform Simplex Decompositions on datasets. Specifically, the approach learns a set of basis vectors such that each data vector can be expressed as a linear combination of the learned bases, where the corresponding mixture weights are nonnegative and sum to 1.
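In symbols (a restatement in our own notation, not notation taken from earlier sections of the paper): each data vector $\vec{v}$ is modeled as

$$\vec{v} \approx \sum_{k=1}^{K} w_k \vec{b}_k, \qquad w_k \geq 0, \qquad \sum_{k=1}^{K} w_k = 1,$$

where $\vec{b}_k$ are the learned basis vectors and the weights $w_k$ lie on the standard simplex.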
PLSA performs a similar decomposition, but it characterizes the normalized data points rather than the original dataset itself. We demonstrated, with the help of a synthetic dataset, the spurious effects such a normalization can have. We described our approach and demonstrated that it provides a way to overcome this drawback. This work has several potential applications in tasks such as clustering, feature extraction, and classification. We would like to continue this work by applying the technique to real-world problems and demonstrating its usefulness. We also intend to extend this work to other related latent variable methods such as Probabilistic Latent Component Analysis.
REFERENCES
Blei, D. and Lafferty, J. (2006). Correlated Topic Models. In NIPS.
Blei, D., Ng, A., and Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3.
Gaussier, E. and Goutte, C. (2005). Relation between PLSA and NMF and Implications. In Proc. ACM SIGIR Conf. on Research and Dev. in Information Retrieval, pages 601–602.
Hofmann, T. (2001). Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42.
Lee, D. and Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401.
Lee, D. and Seung, H. (2001). Algorithms for Non-negative Matrix Factorization. In NIPS.
Shashanka, M. (2009). Simplex Decompositions for Real-Valued Datasets. In Proc. Intl. Workshop on Machine Learning for Signal Processing.
Shashanka, M., Raj, B., and Smaragdis, P. (2008). Probabilistic latent variable models as non-negative factorizations. Computational Intelligence and Neuroscience.
APPENDIX
In this appendix, we briefly describe how to choose the transformation matrix T that transforms M-dimensional data V such that the first (M − 1) principal components lie parallel to the standard (M − 1)-simplex. We need to identify a set of (M − 1) M-dimensional orthonormal vectors that span the standard (M − 1)-simplex.
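For example (a worked illustration of ours, for M = 3): the columns of

$$R_3 = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} \\ -1/\sqrt{2} & 1/\sqrt{6} \\ 0 & -2/\sqrt{6} \end{bmatrix}$$

are orthonormal and each sums to zero, so both lie parallel to the standard 2-simplex.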
(Shashanka, 2009) developed a procedure to find exactly such a matrix; the method is based on induction. Let $R_M$ denote an $M \times (M-1)$ matrix of $(M-1)$ orthogonal vectors. Let $\vec{1}_M$ and $\vec{0}_M$ denote $M$-vectors whose entries are all 1's and all 0's, respectively. Similarly, let $\mathbf{1}_{a \times b}$ and $\mathbf{0}_{a \times b}$ denote $a \times b$ matrices of all 1's and all 0's, respectively. They showed that the matrix $R_{M+1}$ given by

$$R_{M+1} = \begin{bmatrix} R_M & \vec{1}_M \\ \vec{0}_{M-1}^{\,T} & -M \end{bmatrix}$$

if $M$ is even, and

$$R_{M+1} = \begin{bmatrix} R_{(M+1)/2} & \mathbf{0}_{(M+1)/2 \times (M-1)/2} & \vec{1}_{(M+1)/2} \\ \mathbf{0}_{(M+1)/2 \times (M-1)/2} & R_{(M+1)/2} & -\vec{1}_{(M+1)/2} \end{bmatrix}$$

if $M$ is odd, is orthogonal. $R_{M+1}$ is then normalized to obtain an orthonormal matrix.
Given the above relation and the fact that $R_1$ is an empty matrix, one can compute $R_M$ inductively for any value of M.
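The recursion is straightforward to implement. Below is a minimal sketch in Python with NumPy (the function name simplex_basis is ours; (Shashanka, 2009) does not provide code):

import numpy as np

def simplex_basis(M):
    # Build the (unnormalized) M x (M-1) matrix R_M whose columns are
    # mutually orthogonal and parallel to the standard (M-1)-simplex.
    if M == 1:
        return np.zeros((1, 0))  # R_1 is an empty matrix
    prev = M - 1  # apply the inductive relation for R_{prev+1}
    if prev % 2 == 0:
        # prev even: R_M = [[R_prev, 1_prev], [0^T, -prev]]
        R = simplex_basis(prev)
        top = np.hstack([R, np.ones((prev, 1))])
        bottom = np.hstack([np.zeros((1, prev - 1)), np.array([[-prev]])])
        return np.vstack([top, bottom])
    else:
        # prev odd: block construction from R_{(prev+1)/2}
        h = (prev + 1) // 2
        R = simplex_basis(h)
        Z = np.zeros((h, h - 1))
        ones = np.ones((h, 1))
        return np.vstack([np.hstack([R, Z, ones]),
                          np.hstack([Z, R, -ones])])

Each column of the returned matrix sums to zero, which is exactly the condition for being parallel to the standard simplex.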
We have an additional constraint that the last principal component be orthogonal to the standard simplex, and this can be easily achieved by appending a column vector of 1's to $R_M$.
Thus, the matrix T defining our desired transformation is given by $T = [R_M \;\; \vec{1}_M]$.
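Continuing the sketch above, a hypothetical helper (the name transformation_matrix is ours) that forms T and verifies the claimed properties, again assuming NumPy and the simplex_basis function defined earlier:

def transformation_matrix(M):
    # T = [R_M | 1_M]; normalizing each column to unit length yields an
    # orthonormal matrix whose last column is orthogonal to the simplex.
    T = np.hstack([simplex_basis(M), np.ones((M, 1))])
    return T / np.linalg.norm(T, axis=0, keepdims=True)

# Sanity checks for M = 5: columns are orthonormal, and the first M - 1
# columns are parallel to the simplex (their entries sum to zero).
T = transformation_matrix(5)
assert np.allclose(T.T @ T, np.eye(5))
assert np.allclose(T[:, :-1].sum(axis=0), 0.0)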