of the observed dataset. In this study, the resulting
prototypes $y_k$ of the GTM models are further clustered using the
K-means algorithm. In a two-stage procedure similar to the one
described in (Vesanto and Alhoniemi, 2000), which is based on SOM,
the second stage K-means in this study is first replicated 100 times
with random initializations, and the best available result is then
chosen. This is the one that minimizes the error function
$E = \sum_{c=1}^{C} \sum_{x \in G_c} \| x - \mu_c \|^2$, where $C$ is
the final number of clusters in the second stage and $\mu_c$ is the
centre of the K-means cluster $G_c$. This approach seems somewhat
wasteful, though, as the use of GTM instead of SOM can provide us
with richer a priori information to be used for fixing the K-means
initialization in the second stage.
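The random-restart baseline described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names, the use of plain Lloyd iterations, and drawing the initial centres from the points being clustered are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, init_centres, n_iter=100):
    """Lloyd's K-means from given initial centres; returns (centres, labels, E)."""
    centres = [list(c) for c in init_centres]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(len(centres)), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    # E = sum over clusters of squared distances to the cluster centre
    error = sum(sq_dist(p, centres[l]) for p, l in zip(points, labels))
    return centres, labels, error

def best_of_random_restarts(points, n_clusters, n_restarts=100, seed=0):
    """Run K-means n_restarts times from random centres and keep the lowest E."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        init = rng.sample(points, n_clusters)  # assumption: centres drawn from the points
        result = kmeans(points, init)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

In the two-stage setting, `points` would hold the GTM prototypes rather than the raw spectra. A fixed initialization replaces the whole restart loop with a single call to `kmeans`.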
Two novel fixed initialization strategies that use
the prior knowledge obtained by GTM in the first
stage are proposed. They are based on the Magni-
fication Factors (MF) and the Cumulative Responsi-
bility (CR). The MF measure the level of stretching
that the mapping undergoes from the latent to the data
spaces. Areas of low data density correspond to high
distortions of the mapping (high MF), whereas areas
of high data density correspond to low MF. The MF
is described in terms of the derivatives of the basis
functions $\phi_j(u)$ in the form:
\[ \frac{dA'}{dA} = \det{}^{1/2}\left( \psi^T W^T W \psi \right), \tag{6} \]
where $\psi$ has elements $\psi_{ji} = \partial\phi_j / \partial u_i$
(Bishop et al., 1997) and $dA'$ and $dA$ are, respectively,
infinitesimal rectangles in the manifold and latent spaces. If we choose
C to be the final number of clusters for K-means in
the second stage, the first proposed fixed initialization
strategy will consist of the selection of the class-GTM
prototypes corresponding to the C non-contiguous la-
tent points with lowest MF for K-means initialization.
That way, the second stage algorithm is meant to start
from the areas of highest data density.
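As an illustration of Eq. (6), the sketch below evaluates the MF at a single latent point. It assumes Gaussian basis functions with hypothetical centres `centres` and a common width `sigma`, a two-dimensional latent space, and a D x M matrix `W` stored as a list of rows. None of these specifics is fixed by the text above, and the function names are illustrative only.

```python
import math

def gaussian_basis_jacobian(u, centres, sigma):
    """psi[j][i] = d phi_j / d u_i for phi_j(u) = exp(-|u - m_j|^2 / (2 sigma^2))."""
    psi = []
    for m in centres:
        phi = math.exp(-sum((ui - mi) ** 2 for ui, mi in zip(u, m)) / (2 * sigma ** 2))
        psi.append([-phi * (ui - mi) / sigma ** 2 for ui, mi in zip(u, m)])
    return psi

def magnification_factor(u, W, centres, sigma):
    """det^{1/2}(psi^T W^T W psi) at latent point u (2-D latent space assumed)."""
    psi = gaussian_basis_jacobian(u, centres, sigma)
    n_basis = len(psi)
    # J = W psi is the D x 2 Jacobian of the mapping y = W phi(u)
    J = [[sum(w_row[j] * psi[j][i] for j in range(n_basis)) for i in range(2)]
         for w_row in W]
    # Entries of the symmetric 2 x 2 matrix J^T J
    a = sum(row[0] * row[0] for row in J)
    b = sum(row[0] * row[1] for row in J)
    d = sum(row[1] * row[1] for row in J)
    # Clamp tiny negative values caused by rounding before taking the root
    return math.sqrt(max(a * d - b * b, 0.0))
```

Evaluating `magnification_factor` at every latent point gives one MF value per prototype, which can then be ranked to pick the initial K-means centres.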
The CR is the sum of responsibilities over all data
points in X for each cluster k:
\[ \mathrm{CR}_k = \sum_{n=1}^{N} \hat{z}^{c}_{kn} \tag{7} \]
The second proposed fixed initialization strategy,
based on CR, is similar in spirit to that based on MF.
Again, if we choose C to be the final number of clusters
for K-means in the second stage, the fixed initialization
strategy will now consist of the selection
of the GTM prototypes corresponding to the C non-
contiguous latent points with highest CR. That is, the
second stage is meant to start from those prototypes
that are found in the first stage to be most responsible
for the generation of the observed data.
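A minimal sketch of the CR-based selection follows. The text does not define "non-contiguous" precisely, so the example assumes that two latent points are contiguous when they touch on the square grid, diagonals included, and it picks points greedily in score order. The function names and the row-major indexing of the grid are likewise assumptions.

```python
def cumulative_responsibility(R):
    """R[k][n] is the responsibility of latent point k for data point n.

    Returns the list CR_k = sum over n of R[k][n].
    """
    return [sum(row) for row in R]

def select_non_contiguous(scores, grid_side, n_select, lowest=False):
    """Greedily pick n_select grid nodes by score, skipping neighbours of chosen nodes.

    Nodes are indexed row-major on a grid_side x grid_side grid. Two nodes count
    as contiguous (an assumption) when their row and column indices each differ
    by at most one.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=not lowest)
    chosen = []
    for k in order:
        r, c = divmod(k, grid_side)
        chosen_rc = [divmod(q, grid_side) for q in chosen]
        if all(max(abs(r - rr), abs(c - cc)) > 1 for rr, cc in chosen_rc):
            chosen.append(k)
        if len(chosen) == n_select:
            break
    return chosen
```

The prototypes at the returned indices would serve as the initial K-means centres. Calling the same helper with `lowest=True` on a list of MF values gives the corresponding lowest-MF selection.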
3 EXPERIMENTS
3.1 Human Brain Tumour Data
The multi-center data used in this study consists of
217 single voxel PROBE (PROton Brain Exam sys-
tem) MR spectra acquired in vivo for six brain tumour
types: meningiomas (58 cases), glioblastomas (86),
metastases (38), astrocytomas (22), oligoastrocytomas (6),
and oligodendrogliomas (7). For the analyses, the spectra
were grouped into three types (a typology used in this
study as class information), as in (Tate et al., 2006):
high grade malignant
(metastases and glioblastomas), low grade gliomas
(astrocytomas, oligodendrogliomas and oligoastrocy-
tomas) and meningiomas. The clinically relevant re-
gions of the spectra were sampled to obtain 200 fre-
quency intensity values. The high dimensionality of
the problem was compounded by the small number of
spectra available, which is commonplace in MRS data
analysis.
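The grouping into three classes can be checked with a few lines of arithmetic. The dictionary keys below are labels chosen for this example; the counts are the ones quoted above.

```python
# Number of spectra per tumour type, as quoted in the text
tumour_counts = {
    "meningioma": 58,
    "glioblastoma": 86,
    "metastasis": 38,
    "astrocytoma": 22,
    "oligoastrocytoma": 6,
    "oligodendroglioma": 7,
}

# Three-class typology used as class information
class_of = {
    "meningioma": "meningioma",
    "glioblastoma": "high grade malignant",
    "metastasis": "high grade malignant",
    "astrocytoma": "low grade glioma",
    "oligoastrocytoma": "low grade glioma",
    "oligodendroglioma": "low grade glioma",
}

class_sizes = {}
for tumour, n in tumour_counts.items():
    label = class_of[tumour]
    class_sizes[label] = class_sizes.get(label, 0) + n

total = sum(tumour_counts.values())
```

This yields 124 high grade malignant spectra, 35 low grade gliomas and 58 meningiomas, 217 in total, so the classes are markedly unbalanced.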
3.2 Experimental Design and Settings
The GTM, t-GTM and their class-enriched counterparts
were implemented in MATLAB®. For the experiments
reported next, the adaptive matrix W was initialized
following a PCA-based procedure described in
(Bishop et al., 1998). This ensures the replicability of
the results. The grid of latent points $u_k$ was fixed to a
square 20×20 layout for the MRS dataset. The corresponding
grid of basis functions $\phi$ was likewise fixed to a 5×5
square layout.
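The grid layout can be sketched as below. The 20×20 and 5×5 sizes come from the text; the latent range [-1, 1] in each dimension, the Gaussian form of the basis functions, and the width `sigma` are assumptions made for the example. The PCA-based initialization of W is not reproduced here.

```python
import math

def square_grid(side, lo=-1.0, hi=1.0):
    """Row-major list of side*side points evenly spaced on [lo, hi] x [lo, hi]."""
    step = (hi - lo) / (side - 1)
    return [(lo + c * step, lo + r * step) for r in range(side) for c in range(side)]

def basis_matrix(points, centres, sigma):
    """Matrix with entry [k][j] = exp(-|u_k - m_j|^2 / (2 sigma^2))."""
    return [[math.exp(-((u[0] - m[0]) ** 2 + (u[1] - m[1]) ** 2) / (2 * sigma ** 2))
             for m in centres]
            for u in points]

latent_points = square_grid(20)  # 400 latent points u_k
basis_centres = square_grid(5)   # 25 basis function centres
sigma = 0.5                      # assumption: equal to the spacing between basis centres

Phi = basis_matrix(latent_points, basis_centres, sigma)  # 400 x 25
```

With these sizes each prototype is a combination of 25 basis function activations, far fewer than the 400 latent points, which is what keeps the mapping smooth.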
The goals of these experiments are fourfold. First,
we aim to assess whether the inclusion of class infor-
mation in the first stage of the procedure results in any
improvement in terms of cluster-wise class separabil-
ity (and under what circumstances) compared to the
procedure using standard GTM. Second, we aim to
assess whether the two-stage procedure improves, in
the same terms, on the use of direct clustering of the
data using K-means. Third, we aim to test whether the
second stage initialization procedures based on MF
and the CR of the class-GTM, described in section
2.2, retain the cluster-wise class separability capabil-
ities of the two-stage clustering procedure in which
K-means is randomly initialized. Fourth, we
aim to explore the properties of the structure of the
dataset concerning atypical data. For that, we use the
t-GTM (Vellido, 2006), as described in section 2.1.
The clustering results of all models will be first
compared visually, which should help to illustrate the
visualization capabilities of the models. Beyond the
visual exploration, the second stage clustering results