Multi-level Visualisation using Gaussian Process Latent Variable Models
Shahzad Mumtaz¹, Darren R. Flower² and Ian T. Nabney¹
¹Non-Linearity and Complexity Research Group, Aston University, Birmingham B4 7ET, U.K.
²School of Life and Health Sciences, Aston University, Birmingham B4 7ET, U.K.
Keywords: Multi-level Gaussian Process Latent Variable Model, k-means, Gaussian Mixture Model, Trustworthiness, Continuity, Negative Log-likelihood, Visualisation Distance Distortion, Mean Relative Rank Errors, Major Histocompatibility Complex.
Abstract: Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein electrostatic potentials for the ‘Major Histocompatibility Complex (MHC) class I’ of humans. In both cases, visual observation and the quantitative quality measures have shown better visualisation at the lower levels.
1 INTRODUCTION
Recent advances in sciences as diverse as astron-
omy, biology, weather forecasting and economics
have led to the generation and storage of large
high-dimensional datasets. Such datasets have not
only presented new challenges for researchers but
also created new openings for theoretical develop-
ments (Donoho, 2000).
In the machine-learning domain, projection of
large and high-dimensional datasets onto a single
two-dimensional plot is a useful and popular way
of extracting intrinsic structures. A single two-
dimensional visualisation plot (using either linear or
non-linear visualisation methods) seldom captures all
the intrinsic structure in complex datasets. Con-
sider, for example, situations where a single two-
dimensional visualisation plot of a complex and large
high-dimensional dataset can only show major clus-
ters whereas a hierarchical or multi-level visualisation
approach could show more interesting detailed struc-
tures. Such a tree-like visualisation model uses a root-
level visualisation plot to give a high-level overview
of a dataset and child visualisation plots for more de-
tailed views.
In the last two decades, unsupervised hierarchical visualisation and clustering models have been introduced; these are reviewed in (Vicente and Vellido, 2004; Murtagh and Contreras, 2011). Two well-
known visualisation approaches are: probabilistic hi-
erarchical models and multi-level models. Both cat-
egories are based on a top-down divisive approach to
build a tree-like visualisation structure. The term ‘hi-
erarchical’ is used here to indicate a probabilistic way
of assigning data points (also known as soft member-
ships) to child visualisations whereas the term ‘multi-
level’ indicates that we partition the dataset into sub-
sets (using hard clustering) for building the child vi-
sualisations.
Probabilistic hierarchical models use density esti-
mation to build a complete and consistent hierarchical
model of the data. For example, a hierarchical mix-
ture of latent variables model (HMLVM) uses a single
linear latent variable model to obtain a top-level visu-
alisation plot and uses a probabilistic mixture of la-
tent variable models to represent the lower-level mod-
els (Bishop and Tipping, 1998). This process can
be continued recursively where child models at level
N +1 represent a mixture decomposition of the parent
model at level N. A non-linear extension of the HM-
LVM was proposed in (Tino and Nabney, 2002). This
non-linear variant uses the Generative Topographic
Mapping (GTM) as a building block to represent visu-
alisation of high-dimensional datasets in a tree struc-
ture of multiple two-dimensional plots and is known
as hierarchical GTM (HGTM). A multiple manifold
learning framework based on HGTM was proposed
in (Wang et al., 2008): this uses an approximation
method to initialize a hierarchical model to represent
each sub-model as a single manifold. These proba-
bilistic hierarchical visualisation methods are based
on soft clustering.
Multi-level approaches based on the Self Orga-
nizing Map (SOM) use hard clustering where each
of the low-dimensional code vectors represents a
single cluster and the corresponding points in the
high-dimensional space are projected onto a sepa-
rate lower-level two-dimensional plot to show sub-
clusters (Miikkulainen, 1990; Versino and Gam-
bardella, 1996; Lampinen and Oja, 1992).
In this paper we propose a multi-level visualisa-
tion with Gaussian Process Latent Variable Models
(MLGPLVM). In MLGPLVM we visualise a com-
plete dataset at the root level which gives a high-level
view. We apply clustering on this root-level visual-
isation plot in order to create a hard partition into
subsets. Each subset is then used to build a child-
level visualisation model. These child-level or subset-
level models may help us to visualise detailed local
structures or clusters in the dataset. We take this ap-
proach since there is no simple way of modifying the
GPLVM to take account of ‘soft’ cluster membership,
as would be needed for a probabilistic hierarchy. In
our experiments we used K-means, Gaussian mix-
ture models and interactive clustering for partitioning
the data. An interactive clustering approach permits
the user to draw polygons on the visualisation plot to
identify clusters. The benefit of clustering in the visu-
alisation space is that the user can see the nature of the
segmentation, and correct it manually if necessary. To
measure the effectiveness of our proposed multi-level
visualisation approach we apply it to two datasets and
compute quantitative visualisation quality measures.
The outline of the paper is as follows. In Section 2,
we briefly discuss the theory of GPLVM with sparse
and back-constrained extensions. In Section 3 we ex-
plain our proposed MLGPLVM approach with brief
details about the clustering approaches we used in our
analysis. Section 4 defines the quantitative quality metrics used in our analysis to show the effectiveness of our approach. In Section 5 we briefly describe the ‘Oil Flow’ and ‘MHC class-I’ datasets. In Section 6 we explain the visualisation experiments and
discuss the results for both datasets. Section 7 con-
cludes our paper with potential advantages and dis-
advantages of this new method and proposes future
work.
2 GAUSSIAN PROCESS LATENT VARIABLE MODEL (GPLVM)

2.1 Probabilistic Dimensionality Reduction Process

A latent-variable model is used to represent a dataset $Y \in \mathbb{R}^{N \times D}$, with $N$ data points in $D$ dimensions, by mapping from the low-dimensional $X \in \mathbb{R}^{N \times Q}$, with $N$ data points in $Q$ dimensions (usually $Q = 2$). The mapping between a low-dimensional data point $x_n$ and a high-dimensional data point $y_n$ is defined by

$$ y_{ni} = f_i(x_n) + \eta_{ni}, \qquad (1) $$

where $\eta_{ni}$ represents the noise for the $i$th feature of the $n$th data point. The noise model we assume is Gaussian with zero mean and inverse variance $\beta$. So the conditional distribution of a data point $y_n$ given a data point $x_n$ is

$$ p(y_n \mid x_n) = \prod_{i=1}^{D} \mathcal{N}(y_{ni} \mid f_i(x_n), \beta^{-1}). \qquad (2) $$

If the mapping is assumed to be linear, $f_i(x_n) = w_i^T x_n$, and the latent variable $x$ is drawn from a Gaussian prior (with zero mean and unit variance), then the maximum likelihood solution of the model represents the principal subspace of the data (Tipping and Bishop, 1999): this is Probabilistic Principal Component Analysis (PPCA). Integrating out the latent variable gives the marginal likelihood

$$ p(y_n) = \int p(y_n \mid x_n)\, p(x_n)\, dx_n. \qquad (3) $$

So the marginal distribution for the complete dataset is given as

$$ p(Y) = \prod_{n=1}^{N} p(y_n). \qquad (4) $$

In a standard latent-variable model we use maximum likelihood to optimize the weights and marginalize out the latent variables.
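For the linear mapping the integral in (3) can be evaluated in closed form; the following short derivation (standard for PPCA, writing $W$ for the $D \times Q$ matrix whose rows are the $w_i^T$) makes the link to (Tipping and Bishop, 1999) explicit:

$$ p(y_n) = \int \mathcal{N}(y_n \mid W x_n, \beta^{-1} I)\, \mathcal{N}(x_n \mid 0, I)\, dx_n = \mathcal{N}\!\left(y_n \mid 0,\; W W^{T} + \beta^{-1} I\right), $$

so maximizing (4) with respect to $W$ recovers the principal subspace of the data.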
2.2 Standard GPLVM
A non-linear extension of PPCA is the Gaussian pro-
cess latent variable model (GPLVM), which uses a
smooth mapping from the latent space to the data
space. In the GPLVM instead of optimizing weights
Multi-levelVisualisationusingGaussianProcessLatentVariableModels
123
they are marginalized out; and instead of marginalizing over the latent space, it is optimized (i.e. the position of each point in the latent space is optimized). A conjugate prior over the weights is chosen, taking the form of a spherical Gaussian distribution for each dimension:
$$ p(w) = \prod_{i=1}^{D} \mathcal{N}(w_i \mid 0, I). \qquad (5) $$
The likelihood after marginalizing out the weights is

$$ p(Y \mid X) = \prod_{i=1}^{D} p(y_{:,i}), \qquad (6) $$

where $p(y_{:,i}) = \mathcal{N}(y_{:,i} \mid 0,\; XX^T + \beta^{-1} I)$ represents the distribution over a single feature (column) of the data. The GPLVM optimizes the latent variables using the following log-likelihood (similar to the likelihood used in (Tipping and Bishop, 1999)):
$$ L = -\frac{DN}{2}\log(2\pi) \;-\; \frac{D}{2}\log \det K \;-\; \frac{1}{2}\,\mathrm{tr}\!\left(K^{-1} Y Y^{T}\right). \qquad (7) $$
If $K = XX^{T} + \beta^{-1} I$ is a linear kernel, then this is equivalent to PPCA. For the GPLVM a non-linear RBF kernel is used, as explained in (Lawrence and Hyvärinen, 2005; Lawrence, 2004), and the latent variables are then optimized with a non-linear optimization algorithm using the gradient of the likelihood with respect to the kernel. The kernel parameters are optimized by combining this gradient with the derivative of the kernel with respect to its parameters using the chain rule. The gradient calculation uses the inverse of the kernel matrix; it has $O(N^3)$ complexity, making it impractical for large datasets. Due to this complexity, a GPLVM is usually trained using sparse approximations, where a small subset of data points of size $k \ll N$, known as ‘inducing points’ or the ‘active set’, is used to reduce the complexity from $O(N^3)$ to $O(k^2 N)$. For sparse approximation the standard GPLVM uses the informative vector machine (IVM), where data points are chosen sequentially based on the reduction of the posterior process’s entropy (Lawrence et al., 2003). We instead used the GPLVM with an improved sparse approximation approach, proposed in (Lawrence, 2008), which was originally developed for Gaussian process regression and is based on the unifying view explained in (Quiñonero-Candela et al., 2005).
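To make the objective concrete, the following is a minimal NumPy sketch of the negative of the log-likelihood in (7), assuming an RBF kernel with fixed hyperparameters; the names rbf_kernel, gamma, variance and beta are illustrative rather than the paper’s notation, and in practice X and the kernel parameters would be optimized by gradient-based methods such as scaled conjugate gradients.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0, variance=1.0):
    # K_mn = variance * exp(-0.5 * gamma * ||x_m - x_n||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * gamma * np.maximum(d2, 0.0))

def gplvm_nll(X, Y, beta=1.0):
    """Negative of the log-likelihood L in Eq. (7)."""
    N, D = Y.shape
    K = rbf_kernel(X) + np.eye(N) / beta      # K + beta^{-1} I
    _, logdet = np.linalg.slogdet(K)          # numerically stable log det K
    Kinv_Y = np.linalg.solve(K, Y)            # avoids forming K^{-1} explicitly
    return (0.5 * D * N * np.log(2.0 * np.pi)
            + 0.5 * D * logdet
            + 0.5 * np.trace(Y.T @ Kinv_Y))   # tr(K^{-1} Y Y^T)
```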
2.3 Preserving Local Distances
The standard GPLVM uses a mapping from the latent space to the data space for the training data only, which constrains distant points in the data space to be distant in the latent space, at the expense of local distance preservation. When users visualize data, it is the local structure that is most relevant to their analysis (for example, when they identify clusters). Therefore we use the variant of the GPLVM in which a constrained smooth mapping, as in Neuroscale (Lowe and Tipping, 1996), is employed to address the problem of local distance preservation: the latent points x are no longer freely optimized, but are instead the image of the points y in the data space under a non-linear function such as a Radial Basis Function (RBF) kernel mapping or a multi-layer perceptron (MLP). This constrained mapping (also known as a back-constraint) ensures that data points which are close in the data space are also close in the visualisation space. We use an MLP as the back-constraint. Further details on preserving local distances with the GPLVM can be found in (Lawrence, 2006).
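As a concrete illustration of the back-constraint, the sketch below parameterises the latent points as the output of a one-hidden-layer MLP applied to the data; the layer sizes, tanh activation and variable names are illustrative assumptions, not the paper’s specification:

```python
import numpy as np

def mlp_back_constraint(Y, W1, b1, W2, b2):
    """Latent coordinates as a smooth function of the data: X = g(Y).

    Y:  (N, D) data matrix.
    W1: (D, H) and b1: (H,)  hidden-layer weights and biases.
    W2: (H, 2) and b2: (2,)  output layer mapping to the 2-D latent space.
    """
    A = np.tanh(Y @ W1 + b1)   # hidden activations
    return A @ W2 + b2         # (N, 2) latent positions

# Training then maximises the GPLVM likelihood of Eq. (7) with respect to
# (W1, b1, W2, b2) rather than X itself, so nearby rows of Y necessarily
# map to nearby latent points.
```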
3 MULTI-LEVEL GPLVM
We propose here a multi-level GPLVM (MLGPLVM)
visualisation method for analysing complex datasets.
The fundamental building block of our proposed visualisation model is the GPLVM with a back-constraint. The steps involved in generating an MLGPLVM visualisation are:
1. Generate a root visualisation plot using standard
GPLVM with back constraint which represents
the mapping from data space Y to latent visuali-
sation space X.
2. Cluster the data in the two-dimensional visualisa-
tion space.
3. Partition the high-dimensional dataset Y into sub-
sets based on the clustering information at step 2.
4. Build a separate visualisation model for each sub-
set to generate local visualisation sub-models.
5. Repeat steps 2 to 4 on the local visualisation sub-models if more levels are required in the multi-level visualisation (a sketch of this recursion is given below).
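The recursion promised above can be sketched compactly; this is a minimal outline only, with the GPLVM fit and the clustering step passed in as callables (fit_gplvm and cluster_2d are placeholder names, not functions from the paper):

```python
import numpy as np

def mlgplvm(Y, fit_gplvm, cluster_2d, depth=2, n_clusters=4):
    """Build a tree of visualisation models by hard-partitioning Y.

    fit_gplvm:  callable mapping a data matrix to 2-D latent coordinates
                (a back-constrained GPLVM in our case); steps 1 and 4.
    cluster_2d: callable mapping (X, n_clusters) to integer labels; step 2.
    """
    X = fit_gplvm(Y)                           # visualise this subset
    node = {"latent": X, "children": []}
    if depth > 1:
        labels = cluster_2d(X, n_clusters)     # cluster in the latent space
        for c in range(n_clusters):
            subset = Y[labels == c]            # step 3: hard partition of Y
            if len(subset) > 0:
                node["children"].append(       # step 4 (and 5, recursively)
                    mlgplvm(subset, fit_gplvm, cluster_2d,
                            depth - 1, n_clusters))
    return node
```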
The structure of the multi-level GPLVM visualisation
approach is shown in Figure 1 where the top-level vi-
sualisation shows only three major clusters whereas
second-level visualisations show more detailed local
structures (e.g. cluster-1 at level-1 shows four clear
sub-clusters at level-2). Similar high-level and detailed views are found in the real datasets analysed in Section 6. For the purpose of comparison, and to assess the effectiveness of our proposed MLGPLVM approach, we use three different clustering
IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications
124
methods to partition a dataset and generate the lower-level visualisations: K-means, Gaussian mixture models and interactive clustering by drawing polygons.

Figure 1: Structure of multi-level GPLVM visualisation.
3.1 Cluster Identification
Traditional, well-known clustering approaches such as K-means and Gaussian mixture models can be used to generate the subsets of a dataset required for the level-2 visualisations.

K-means is an unsupervised clustering method in which K cluster centres are initialised as randomly selected data points (MacQueen, 1967) and trained iteratively in a two-step process: in the first step the cluster centres are kept fixed and cluster memberships are computed; in the second step each cluster centre is updated to be the mean of its assigned data points. This continues until no cluster memberships change. The algorithm has certain limitations: there is no principled way of determining the true number of clusters, it is sensitive to outliers, and it is difficult to identify the true boundaries of clusters because the optimal number of clusters is unknown.
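The two-step iteration just described is simple to write down; here is a minimal NumPy sketch (initialisation, convergence test and parameter names are illustrative, not tied to any particular implementation):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Two-step K-means as described above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise centres as k randomly selected data points.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 1: with centres fixed, assign each point to its nearest centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no cluster membership changed: converged
        labels = new_labels
        # Step 2: update each centre to the mean of its assigned points.
        for c in range(k):
            if np.any(labels == c):
                centres[c] = X[labels == c].mean(axis=0)
    return labels, centres
```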
As argued in (Iwata et al., 2012; Lawrence and Hyvärinen, 2005), the standard GPLVM uses a single Gaussian in the latent space to represent a more complex dataset, so applying a Gaussian mixture model (GMM) may not be a useful way to identify clusters in the latent space. We observed the same by applying a variational mixture modelling approach (Corduneanu and Bishop, 2001) to the latent visualisation space in order to determine the true number of Gaussians. Nevertheless, because the GPLVM gives good clustering results in the latent visualisation space (Lawrence, 2006), we applied finite Gaussian mixture models (McLachlan and Basford, 1988), purely for comparison with our proposed interactive region-selection approach, to segment the dataset based on clustering in the latent visualisation space. Visualisation results using K-means and GMM under the MLGPLVM framework are available in a technical report (Mumtaz et al., 2013).
As K-means and GMM have limitations in determining the true number of clusters and in identifying cluster boundaries in the visualisation space, we propose involving the user: identifying clusters by drawing polygon regions interactively on the visualisation plot can give better clusters with true boundaries (Larkin and Simon, 1987). This interactive approach to defining clusters using human perception requires no mathematical or statistical modelling, and enables the user to control the drill-down directly (Shneiderman, 2002). Clusters were identified using the polygon region-selection approach proposed in (Hormann and Agathos, 2001). We compute the mapping precision for the GPLVM (shown as the grey background in Figures 2 and 4), which can help the user to identify clusters interactively.
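As an illustration of the interactive step, assigning latent points to user-drawn polygons reduces to a point-in-polygon test; the sketch below uses a simple even-odd crossing test as a stand-in for the winding-number algorithm of (Hormann and Agathos, 2001):

```python
def point_in_polygon(px, py, polygon):
    """Even-odd crossing test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges that straddle the horizontal ray from (px, py).
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def assign_clusters(latent_points, polygons):
    """Label each 2-D point with the first enclosing polygon (-1 if none)."""
    labels = []
    for px, py in latent_points:
        label = -1
        for c, poly in enumerate(polygons):
            if point_in_polygon(px, py, poly):
                label = c
                break
        labels.append(label)
    return labels
```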
4 MLGPLVM VISUALISATION
QUALITY MEASURES
Evaluating visualisation performance quantitatively is
necessary but difficult because there is no true target
output. The log likelihood is a global model fit mea-
sure. Because visual interpretation is often focused
on clusters of points, we need to use metrics that cap-
ture local neighbourhood preservation. To compare
the mapping at different levels of the hierarchy we use
local quality measures such as visualisation distance
distortion, trustworthiness, continuity and mean rela-
tive rank errors. We briefly explain them in the following sub-sections; a detailed description of each is available in a technical report (Mumtaz et al., 2013).
4.1 Visualisation Distance Distortion
The visualisation distance distortion (VDD) measure
is used to compare the distances between the points in
the data space Y and the projection space X for each
data point and its k nearest neighbours. VDD is cal-
culated as the norm of the difference vectors between
the scaled distances in the data space and the visual-
isation latent space. The scaled distances are used to
make the distance comparable between the data space
and the latent visualisation space. The idea of VDD
is similar to the projection precision score (PPS) as
discussed in (Schreck et al., 2010) where it was used
to observe projection precision quality on the visuali-
sation plot. We compute the sum of the VDD values
of all the data points in a subset to compare the subset
Multi-levelVisualisationusingGaussianProcessLatentVariableModels
125
visualisation quality in the level-1 and level-2 visual-
isation plots.
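As an illustration, the VDD computation can be sketched as below; the paper’s exact distance scaling is not reproduced here, so normalising each distance matrix by its maximum is an assumption made for this sketch:

```python
import numpy as np

def vdd(Y, X, k=10):
    """Per-point visualisation distance distortion (illustrative sketch)."""
    def scaled_dists(A):
        # Pairwise Euclidean distances, scaled into [0, 1] for comparability.
        sq = np.sum(A ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
        d = np.sqrt(d2)
        return d / d.max()

    Dy, Dx = scaled_dists(Y), scaled_dists(X)
    scores = np.empty(len(Y))
    for n in range(len(Y)):
        nbrs = np.argsort(Dy[n])[1:k + 1]   # k nearest neighbours (skip self)
        # Norm of the difference between scaled distance vectors.
        scores[n] = np.linalg.norm(Dy[n, nbrs] - Dx[n, nbrs])
    return scores  # the TVDD of a subset is scores.sum()
```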
4.2 Rank-based Neighbourhood Visualisation Quality Measures

In the information visualisation domain, two well-known visualisation quality measures based on comparing neighbourhoods in the data space Y and the projection space X are trustworthiness and continuity (Venna and Kaski, 2001). A mapping is trustworthy if the k-neighbourhood of a point in the visualisation space matches that in the data space; it maintains continuity if the k-neighbourhood in the data space matches that in the visualisation space. Two further quality measures, the mean relative rank errors (MRREs) with respect to the data and latent spaces, are also used (Lee et al., 2007). MRREs are computed from the exact rank-position differences within the k-neighbourhoods of the data space and the visualisation space.

The higher the trustworthiness and continuity (both range from 0 to 1), the better proximity is preserved, whereas for the mean relative rank errors, the lower the measure, the better proximity is preserved.
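To show how such rank-based measures are computed, here is a minimal NumPy sketch of trustworthiness following (Venna and Kaski, 2001); it assumes precomputed pairwise distance matrices and k < N/2 (needed for the normalising constant), and continuity is obtained by swapping the roles of the two spaces:

```python
import numpy as np

def trustworthiness(Dy, Dx, k):
    """Dy, Dx: pairwise distances in data and visualisation space."""
    N = len(Dy)
    order_y = np.argsort(Dy, axis=1)        # neighbours sorted in data space
    ranks_y = np.argsort(order_y, axis=1)   # ranks_y[n, j]: data-space rank
    order_x = np.argsort(Dx, axis=1)        # neighbours sorted in latent space
    total = 0.0
    for n in range(N):
        for j in order_x[n, 1:k + 1]:       # k-neighbours in the projection
            if ranks_y[n, j] > k:           # an 'intruder' point
                total += ranks_y[n, j] - k  # penalise by its rank excess
    return 1.0 - 2.0 / (N * k * (2.0 * N - 3.0 * k - 1.0)) * total
```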
5 DATASETS
For evaluating the MLGPLVM visualisation, we con-
sider two datasets: ‘oil flow’ and ‘MHC class I’.
5.1 Oil Flow Dataset
The ‘oil flow’ dataset is a twelve-dimensional dataset
collected from a simulation of a non-invasive moni-
toring system (Bishop and James, 1993) and used pre-
viously to demonstrate the Generative Topographic
Mapping (Bishop and Svensen, 1998) and hierar-
chical visualisation algorithms (Bishop and Tipping,
1998; Tino and Nabney, 2002). The dataset comprises 1,000 data points and was generated artificially from a multiphase flow configuration of three phases (oil, water and gas) arranged in three known classes: homogeneous, annular and laminar.
the generation process, the data is expected to lie on
low-dimensional manifolds.
5.2 Major Histocompatibility Complex
class-I
The second dataset is related to MHC class-I and was used previously to demonstrate variants of the generative topographic mapping (GTM) and GTM with simultaneous feature saliency (Mumtaz et al., 2012).
Here we briefly explain the process of generating Poisson-Boltzmann electrostatic potential data for the MHC class-I genes. MHC genes in humans are known as Human Leukocyte Antigen (HLA) genes. We modelled 3,944 three-dimensional protein structures of HLA class I (1,236 for HLA-A, 1,779 for HLA-B and 929 for HLA-C) using homology modelling (as in (Doytchinova et al., 2004)). We then computed a Poisson-Boltzmann electrostatic potential for each modelled protein by placing a three-dimensional grid box (with 17^3 = 2,601 grid points) around the top surface (covering the α1 and α2 regions). Since we are interested in analysing interactions with other molecules, we ignored all grid points that were inside the van der Waals surface of the target protein, leaving 2,418 grid points that were definitely outside the van der Waals surface of all the target proteins. Each grid point acts as a variable in our dataset and each protein is represented as a row. Further details of the data generation process can be found in (Mumtaz et al., 2012; Mumtaz et al., 2013).
6 EXPERIMENTS
We used full GPLVM for the ‘oil flow’ dataset and
sparse GPLVM for the ‘HLA class I’ dataset for visu-
alisation under the MLGPLVM framework. The ‘oil
flow’ dataset has fewer data points and fewer vari-
ables and therefore, applying full GPLVM, it was pos-
sible to generate results in a matter of a few hours
whereas applying full GPLVM on the MHC class-I
dataset could take months as this has thousands of
data points with more than a couple of thousand vari-
ables. Therefore, we used sparse GPLVM (as briefly explained in Section 2.2) for the ‘HLA class-I’ dataset to create visualisations under the MLGPLVM framework in a matter of a few hours. Each visualisation
model under the MLGPLVM framework is trained
(with 1500 iterations for the ‘oil flow’ dataset and
2000 iterations for ‘HLA-Class I’ dataset) using the
scaled conjugate gradient optimisation method.
To evaluate the visualisation quality of MLGPLVM, we compute the quality measures defined in Section 4 for a range of neighbourhood sizes k = 5, 10, . . . , 50 for each cluster identified at level-1 and its corresponding subset visualisation at level-2. We initially computed the mean of these measures over k (see (Mumtaz et al., 2013)) and then summarised them by taking the mean across all subsets to compare performance across levels (see Table 1).
IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications
126
6.1 Results
The root-level visualisation plots for both datasets (‘oil flow’ and ‘HLA class I’) show that the three classes in each dataset are well separated, with a number of clusters observed for each class (see Figure 2(a) for the ‘oil flow’ dataset and Figure 4(a) for the ‘HLA class I’ dataset). By contrast, a linear visualisation such as Principal Component Analysis (PCA) does not show a clear separation of the alleles of each HLA gene; instead the alleles of all three genes overlap (as shown in Figure 3). We present in this paper the second-level visualisation results (see Figures 2(b) and 4(b)) generated by applying interactive clustering to the root-level visualisation plots (see (Mumtaz et al., 2013) for second-level visualisation results generated by applying K-means and GMM clustering to the root-level plots). Experiments were performed with different numbers of clusters at the root level, but here we present only the results with 4 clusters used to generate the second-level visualisation.
Visual inspection of all the local second-level visualisation models shows that they provide a more detailed clustering/visualisation structure than the root visualisation. Table 1 shows the mean of each quality measure over clusters at level-1 and over sub-models at level-2, using the three clustering approaches under the MLGPLVM framework: K-means, GMM and interactive clustering. We compute the per-point negative log-likelihood for each cluster at level-1 and each sub-model at level-2. The mean negative log-likelihood (per point) is then computed over clusters at level-1 and sub-models at level-2, and is presented as the ratio of increase or decrease of level-2 with respect to level-1. For the ‘oil flow’ dataset, the mean quantitative quality metrics (over clusters) used to compare visualisation quality across levels are better for level-2 than for level-1 (see Table 1). For the ‘HLA class I’ dataset, trustworthiness, continuity and TVDD are better for all the second-level visualisation models under each of the clustering algorithms applied to generate the second-level visualisations (as shown in Table 1). The other measures, MRREs and negative log-likelihoods, were slightly better for level-1.
Through rigorous, state-of-the-art analysis of the projected properties, we have identified clusters corresponding to the three class I human MHC loci, and sub-groups therein. It is notable that the analysis recovers the HLA-A, HLA-B and HLA-C alleles at the root-level visualisation without prior knowledge of such a division, and that this grouping is then refined by the lower-level visualisations. This gives confidence to any assertion we might make regarding the division of the allele population into structurally and functionally similar sub-groups. The results of our analysis are fully consistent both with the choice of Poisson-Boltzmann electrostatic potential as a meaningful indicator of molecular spatial interactions and with the sophisticated methods of data reduction used to derive the final clustering. They are also consistent with the evolutionary argument: it suggests that, with the exception of a handful of genes, the three class-I loci exhibit quite distinct specificities for peptides and TCRs. Redundant specificities shared between loci would not be favourable, since they would reduce the diversity of peptides that a host could recognize and respond to, and thus the diversity of pathogens it could effectively combat. It will be interesting to extend our analysis to investigate the structural basis for this phenomenon.

Figure 2: MLGPLVM visualisation of the ‘oil flow’ dataset with K-means clustering. (a) Root visualisation plot: numbered blue circles indicate cluster centres and blue lines represent cluster boundaries; cyan dots (‘.’) mark ‘Homogeneous’, red plus signs (‘+’) ‘Annular’ and blue squares (‘□’) ‘Laminar’; the grey background shows mapping precision (lighter regions correspond to better precision in mapping). (b) Level-2 visualisation.

Figure 3: PCA visualisation (cyan dots (‘.’) for HLA-A, red plus signs (‘+’) for HLA-B and blue squares (‘□’) for HLA-C).
Multi-levelVisualisationusingGaussianProcessLatentVariableModels
127
Table 1: Visualisation quality metrics of the MLGPLVM evaluation for the two datasets (Oil Flow and HLA class 1). ‘Trust’ denotes trustworthiness, ‘Cont’ continuity, ‘MRREd’ and ‘MRREl’ the mean relative rank errors with respect to the data space and the latent space respectively, ‘TVDD’ the total visualisation distance distortion, and ‘NLL’ the negative log-likelihood. For ‘Trust’ and ‘Cont’, higher values indicate better visualisation; for ‘MRREd’, ‘MRREl’, ‘TVDD’ and ‘NLL’, lower values are better.

Clustering            Oil Flow              HLA class 1
                   Level-1  Level-2      Level-1  Level-2
K-means     Trust   0.9484   0.9705       0.7895   0.8191
            Cont    0.9409   0.9749       0.8022   0.8406
            MRREd   0.1974   0.1352       0.0440   0.0445
            MRREl   0.1956   0.1346       0.0407   0.4130
            TVDD    0.5807   0.4752       0.8121   0.7978
            NLL     1.0000   0.9911       1.0000   1.0295
GMM         Trust   0.9501   0.9729       0.7896   0.8285
            Cont    0.9428   0.9743       0.8022   0.8406
            MRREd   0.1661   0.1103       0.0440   0.0445
            MRREl   0.1658   0.1103       0.0407   0.0413
            TVDD    0.5806   0.4853       0.8121   0.7978
            NLL     1.0000   0.9931       1.0000   1.0303
Interactive Trust   0.9469   0.9788       0.7932   0.8243
            Cont    0.9353   0.9817       0.8055   0.8341
            MRREd   0.1682   0.1037       0.0439   0.0433
            MRREl   0.1700   0.1043       0.0405   0.0408
            TVDD    0.5990   0.4360       0.8112   0.7917
            NLL     1.0000   0.9918       1.0000   1.1661
7 CONCLUSIONS AND FUTURE
WORK
In this paper we propose a multi-level visualisation using Gaussian process latent variable models, in which the root-level visualisation gives an overview of the complete dataset and the second-level views give refined visualisations of the clustered data. We experimented with generating second-level visualisations using three different clustering algorithms: K-means, GMM and interactive clustering. Both datasets used to demonstrate MLGPLVM show promising improvements over the root-level visualisation through refined lower-level visualisations. For the ‘MHC class-I’ dataset we conclude that the present approach, which combines the established protocol of chemical landscape profiling with calculated properties and state-of-the-art data visualization and clustering, is promising. We will seek to extend this approach and apply it to the classification of MHC alleles in terms of peptide specificity, TCR specificity and antibody interaction, and to use it to investigate practical problems in epitope prediction, solid organ and bone marrow transplantation, mate-choice,
and MHC-mediated adverse drug reactions.

Figure 4: MLGPLVM visualisation of the ‘MHC class-I’ dataset with interactive clustering. (a) Root visualisation plot: numbered blue circles indicate cluster centres and blue lines represent cluster boundaries; cyan dots (‘.’) mark HLA-A, red plus signs (‘+’) HLA-B and blue squares (‘□’) HLA-C; the grey background shows mapping precision (lighter regions show better precision in mapping). (b) Level-2 visualisation.
We have also incorporated the code of MLGPLVM into our recently developed visualisation tool, the Data Visualisation and Modelling System (DVMS). This tool is freely accessible from our website (http://www.aston.ac.uk/ncrg). We plan to extend this work with a probabilistic hierarchical visualisation framework (based on soft assignments to child models).
REFERENCES
Bishop, C. and James, G. (1993). Analysis of Multiphase
Flows Using Dual-Energy Gamma Densitometry and
Neural Networks. Nuclear Instruments and Methods
in Physics Research, 327(2-3):580–593.
Bishop, C. M. and Svensen, M. (1998). GTM: The generative topographic mapping. Neural Computation, 10(1):215–234.
Bishop, C. M. and Tipping, M. E. (1998). A hierarchical la-
tent variable model for data visualization. IEEE Trans.
Pattern Anal. Mach. Intell., 20(3):281–293.
Corduneanu, A. and Bishop, C. M. (2001). Variational
bayesian model selection for mixture distributions.
In Artificial intelligence and Statistics, volume 2001,
pages 27–34. Morgan Kaufmann Waltham, MA.
Donoho, D. L. (2000). High-dimensional data analysis: the
curses and blessings of dimensionality. In American
Mathematical Society Conf. Math Challenges of the
21st Century.
Doytchinova, I. A., Guan, P., and Flower, D. R. (2004). Identifying human MHC supertypes using bioinformatics methods. The Journal of Immunology, 172:4314–4323.
Hormann, K. and Agathos, A. (2001). The point in poly-
gon problem for arbitrary polygons. Comput. Geom.
Theory Appl., 20(3):131–144.
Iwata, T., Duvenaud, D., and Ghahramani, Z. (2012).
Warped mixtures for nonparametric cluster shapes.
arXiv preprint arXiv:1206.1846.
Lampinen, J. and Oja, E. (1992). Clustering properties of
hierarchical self-organizing maps. Journal of Mathe-
matical Imaging and Vision, 2:261–272.
Larkin, J. H. and Simon, H. A. (1987). Why a diagram
is (sometimes) worth ten thousand words. Cognitive
Science, 11(1):65–100.
Lawrence, N. and Hyvärinen, A. (2005). Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816.
Lawrence, N. D. (2004). Gaussian process latent variable models for visualisation of high dimensional data. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, pages 329–336, Cambridge, MA. MIT Press.
Lawrence, N. D. (2006). Local distance preservation in the
GPLVM through back constraints. In In ICML, pages
513–520. ACM Press.
Lawrence, N. D. (2008). Large scale learning with the Gaussian process latent variable model. Technical report, University of Sheffield, United Kingdom.
Lawrence, N. D., Seeger, M., and Herbrich, R. (2003).
Fast sparse gaussian process methods: The informa-
tive vector machine. In Advances in Neural Infor-
mation Processing Systems 15, pages 609–616. MIT
Press.
Lee, D., Redfern, O., and Orengo, C. (2007). Predicting
protein function from sequence and structure. Nature
Reviews Molecular Cell Biology, 8:995–1005.
Lowe, D. and Tipping, M. E. (1996). Neuroscale: Novel
topographic feature extraction using RBF networks.
In NIPS, pages 543–549.
MacQueen, J. (1967). Some methods for classification and
analysis of multivariate observations. In Le Cam,
L. M. and Neyman, J., editors, Proceedings of the 5th
Berkeley Symposium on Mathematical Statistics and
Probability - Vol. 1, pages 281–297. University of Cal-
ifornia Press, Berkeley, CA, USA.
McLachlan, G. and Basford, K. (1988). Mixture Mod-
els: Inference and Applications to Clustering. Marcel
Dekker, New York.
Miikkulainen, R. (1990). Script recognition with hierarchi-
cal feature maps. Connection Science, 2:83–101.
Mumtaz, S., Flower, D. R., and Nabney, I. T.
(2013). Multi-level visualisation. Technical re-
port, NCRG, Aston University, Birmingham, UK.
http://eprints.aston.ac.uk/.
Mumtaz, S., Nabney, I. T., and Flower, D. (2012). Novel
visualization methods for protein data. In Compu-
tational Intelligence in Bioinformatics and Computa-
tional Biology (CIBCB), 2012 IEEE Symposium on,
pages 198 –205.
Murtagh, F. and Contreras, P. (2011). Methods of hierarchi-
cal clustering. CoRR, abs/1105.0121.
Quiñonero-Candela, J., Rasmussen, C. E., and Herbrich, R. (2005). A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959.
Schreck, T., von Landesberger, T., and Bremm, S. (2010).
Techniques for precision-based visual analysis of pro-
jected data. Information Visualization, 9(3):181–193.
Shneiderman, B. (2002). Inventing discovery tools: com-
bining information visualization with data mining. In-
formation Visualization, 1(1):5–12.
Tino, P. and Nabney, I. T. (2002). Hierarchical GTM: Constructing localized nonlinear projection manifolds in a principled way. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:639–656.
Tipping, M. E. and Bishop, C. M. (1999). Probabilistic prin-
cipal component analysis. Journal of the Royal Statis-
tical Society, Series B, 61:611–622.
Venna, J. and Kaski, S. (2001). Neighborhood preservation in nonlinear projection methods: An experimental study. In Proceedings of the International Conference on Artificial Neural Networks, ICANN ’01, pages 485–491, London, UK. Springer-Verlag.
Versino, C. and Gambardella, L. M. (1996). Learning fine motion by using the hierarchical extended Kohonen map. In Proceedings of ICANN96, International Conference on Artificial Neural Networks, pages 221–226. Springer-Verlag.
Vicente, D. and Vellido, A. (2004). Review of hierarchical models for data clustering and visualization. Tendencias de la Minería de Datos en España, Española de Minería de Datos.
Wang, X., Tiňo, P., and Fardal, M. A. (2008). Multiple manifolds learning framework based on hierarchical mixture density model. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II, ECML PKDD ’08, pages 566–581, Berlin, Heidelberg. Springer-Verlag.
Multi-levelVisualisationusingGaussianProcessLatentVariableModels
129