spection of the reduced space, recovering the structures existing in the original space is important in a number of applications: medicine (recovering groups of diseases), information retrieval (recovering groups of images or texts), etc.
Therefore, we propose an embedding in the cluster space, where point coordinates are computed from the points' relative distances to each of the clusters. The algorithm starts with a clustering step. Once the cluster information is collected in the original space using a Gaussian Mixture Model (GMM), the discriminant functions provide the coordinates of the points in the cluster space. Moreover, when the estimates of the GMM parameters are optimal, the cluster space is the optimal space for discrimination.
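To make the pipeline concrete, the following is a minimal sketch, assuming scikit-learn's GaussianMixture and using log-posteriors as a stand-in for the discriminant functions (the exact discriminant functions are defined in Section 3):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_space_embedding(X, n_clusters, random_state=0):
    """Embed points into a K-dimensional cluster space.

    Sketch only: coordinate k of a point is taken here to be the
    log-posterior of cluster k under a fitted full-covariance GMM;
    the paper's actual discriminant functions appear in Section 3.
    """
    gmm = GaussianMixture(n_components=n_clusters,
                          covariance_type='full',
                          random_state=random_state).fit(X)
    # predict_proba returns p(cluster k | x); the log gives discriminant-like scores
    posteriors = gmm.predict_proba(X)
    return np.log(posteriors + 1e-12)  # shape: (n_samples, n_clusters)

# Usage: embed a 50-dimensional dataset into a 3-cluster space
X = np.random.RandomState(0).randn(200, 50)
Z = cluster_space_embedding(X, n_clusters=3)
print(Z.shape)  # (200, 3)
```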
The next section reviews related work. Section 3 formally defines the cluster space. Experiments on artificial and real data, together with comparisons with other dimension reduction methods, are described in Section 4. The paper ends with discussion and conclusions in Section 5.
2 RELATED WORK
Many different approaches have been proposed for embedding high-dimensional data into low-dimensional spaces. Among the coordinate-based methods, the linear method of Principal Component Analysis (PCA) is the most commonly used; it tries to linearly capture as much of the variance in the data as possible. Methods based on pairwise distance matrices were designed either 1) to preserve as faithfully as possible the original Euclidean interpoint distances (Multidimensional Scaling (MDS) (Borg and Groenen, 2005); Sammon Mapping (Sammon, 1969), which increases the weight given to small distances), 2) to preserve a non-linear transformation of the distances (Nonlinear MDS (Borg and Groenen, 2005)), or 3) to unfold data that lies on manifolds (Isomap (Tenenbaum et al., 2000), Curvilinear Component Analysis (CCA) (Demartines and Hérault, 1997), Curvilinear Distance Analysis (CDA) (Lee et al., 2000)).
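For illustration, a brief sketch contrasting a coordinate-based method (PCA) with a pairwise-distance method (metric MDS); the scikit-learn calls and parameter values are ours, chosen for concreteness:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

X = np.random.RandomState(0).randn(100, 10)

# PCA: linear projection that maximizes the retained variance
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)
print(pca.explained_variance_ratio_)  # variance captured per component

# Metric MDS: seeks 2-D coordinates preserving Euclidean interpoint distances
mds = MDS(n_components=2, random_state=0)
X_mds = mds.fit_transform(X)
```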
Manifolds are non-linear structures where two points, even if close with respect to the Euclidean distance, can still be located far away on the manifold. Isomap and CDA use the geodesic distance, that is, the distance over the manifold and not through it. Both CCA and CDA weight the distances in the output space rather than in the input space, as MDS, Isomap and Sammon Mapping do. Unlike Isomap, which is a global method, Locally Linear Embedding (LLE) (Roweis and Saul, 2000) is a local method that tries to preserve the local structure, namely the linear reconstruction of each point from its neighbours. Similar to LLE, Laplacian Eigenmaps (Belkin and Niyogi, 2002) builds a neighbourhood graph and embeds points using the eigenvectors of the graph Laplacian matrix. Stochastic Neighbour Embedding (Hinton and Roweis, 2002) preserves, rather than distances, the probabilities of points being neighbours of other points. None of these methods can project new test points into the reduced space, since the embedding has to be recomputed each time a new point is added.
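The geodesic idea can be sketched in a few lines: approximate distances over the manifold by shortest paths through a k-nearest-neighbour graph, as Isomap does (the arc data and the choice k = 5 are illustrative):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

# Points on a 1-D manifold (a noisy arc) embedded in 2-D
t = np.linspace(0, np.pi, 100)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * np.random.RandomState(0).randn(100, 2)

# Geodesic distances: shortest paths over the k-NN graph
knn = kneighbors_graph(X, n_neighbors=5, mode='distance')
geodesic = shortest_path(knn, directed=False)

# The arc endpoints are close in Euclidean terms (roughly the diameter, 2)
# but far along the manifold (roughly the arc length, pi)
print(np.linalg.norm(X[0] - X[-1]), geodesic[0, -1])
```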
In the introduction we discussed the importance
of preserving cluster information in reduced spaces.
Clustering is generally approached through hierarchical or partitional methods. Hierarchical clustering generates a tree (a dendrogram) in which each node is connected to its parent, and nodes at lower levels are more similar than nodes at higher levels. Partitional methods partition the data into different clusters by making a hard assignment: each point belongs to exactly one cluster. Soft clustering, on the other hand, assigns to each point different degrees of membership in the clusters. The most common example of soft clustering is the probabilistic Gaussian Mixture Model, which assumes that the data come from a mixture of Gaussians with different covariance matrices.
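The contrast between hard and soft assignment is easy to see in code, taking scikit-learn's KMeans and GaussianMixture as representative implementations:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(150, 4)

# Hard assignment: each point receives exactly one cluster label
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)

# Soft assignment: each point receives a degree of membership per cluster
gmm = GaussianMixture(n_components=3, covariance_type='full',
                      random_state=0).fit(X)
memberships = gmm.predict_proba(X)  # each row sums to 1
print(labels[:3], memberships[:3].round(2))
```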
The idea of representing points in the space of the clusters was discussed in (Gupta and Ghosh, 2001) and in (Iwata et al., 2007). In (Gupta and Ghosh, 2001), the authors propose a Cluster Space model to analyze the similarity between a customer and a cluster in the context of transactional data. The solution applies hard clustering to different datasets and then maps the results of the different clustering algorithms into a common space, the cluster space, where further analysis is performed to model the dynamics of the clients. In (Iwata et al., 2007), a Parametric Embedding is proposed that embeds the posterior probabilities of points belonging to clusters into a lower-dimensional space using the Kullback-Leibler divergence (the posterior probabilities are assumed to be given as input to the algorithm). Our approach differs from the above in that it captures the discriminant information in the embedding space.
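As a rough sketch of the Parametric Embedding objective under our reading of (Iwata et al., 2007): point coordinates Y and cluster coordinates Phi induce soft assignments through unit-variance Gaussians, and the sum of Kullback-Leibler divergences to the given posteriors P is the quantity to be minimized:

```python
import numpy as np
from scipy.special import logsumexp

def parametric_embedding_objective(Y, Phi, P):
    """Sum over points of KL(p(.|x_n) || q(.|y_n)).

    Sketch only: q(k|y_n) is taken proportional to exp(-||y_n - phi_k||^2 / 2),
    i.e. unit-variance Gaussians around cluster coordinates, following our
    reading of the Parametric Embedding model.
    """
    d2 = ((Y[:, None, :] - Phi[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    log_q = -0.5 * d2 - logsumexp(-0.5 * d2, axis=1, keepdims=True)
    return np.sum(P * (np.log(P + 1e-12) - log_q))

# Usage: 5 points, 3 clusters, 2-D embedding (random placeholder values)
rng = np.random.RandomState(0)
P = rng.dirichlet(np.ones(3), size=5)      # given posteriors (input to PE)
Y, Phi = rng.randn(5, 2), rng.randn(3, 2)  # coordinates to be optimized
print(parametric_embedding_objective(Y, Phi, P))
```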
3 CLUSTER SPACE
Let us consider that the dataset is grouped into clus-
ters and model it using a full Gaussian Mixture Model
(F-GMM). F-GMM makes the general assumption
that clusters follow Gaussian distributions and they