Authors:
Shahzad Mumtaz (1); Darren R. Flower (2) and Ian T. Nabney (1)
Affiliations:
(1) Non-Linearity and Complexity Research Group, Aston University, United Kingdom
(2) School of Life and Health Sciences, Aston University, United Kingdom
Keyword(s):
Multi-level Gaussian Process Latent Variable Model, k-means, Gaussian Mixture Model, Trustworthiness, Continuity, Negative Log-likelihood, Visualisation Distance Distortion, Mean Relative Rank Errors, Major Histocompatibility Complex.
Related Ontology Subjects/Areas/Topics:
Abstract Data Visualization; Computer Vision, Visualization and Computer Graphics; Databases and Visualization, Visual Data Mining; General Data Visualization; High-Dimensional Data and Dimensionality Reduction; Information and Scientific Visualization; Visual Data Analysis and Knowledge Discovery
Abstract:
Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures
and relationships in the dataset. However, a single two-dimensional visualisation may not display all the
intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract a more
detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model
(MLGPLVM). MLGPLVM works by segmenting data in the visualisation space (e.g. with k-means, a Gaussian mixture
model or interactive clustering) and then fitting a visualisation model to each subset. To measure
the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness,
continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood
per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein
electrostatic potentials for the human ‘Major Histocompatibility Complex (MHC) class I’. In both cases,
visual inspection and the quantitative quality measures show better visualisation at lower levels of the hierarchy.
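
Two of the quality measures named above, trustworthiness and continuity, can be illustrated with a short sketch. It assumes the standard rank-based definitions (valid for neighbourhood size k < n/2) and plain Euclidean distances in both spaces; the function names are hypothetical and this is not the authors' implementation. Trustworthiness penalises points that appear among the k nearest neighbours in the visualisation but not in the data space; continuity penalises the reverse.

```python
from math import dist


def rank_matrix(points):
    """ranks[i][j] is the rank of j among the neighbours of i (1 = nearest).

    Distance ties are broken by index; ranks[i][i] is left at 0.
    """
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (dist(points[i], points[j]), j))
        row = [0] * n
        for r, j in enumerate(order, start=1):
            row[j] = r
        ranks.append(row)
    return ranks


def _rank_score(ranks_ref, ranks_test, k):
    """One minus the normalised rank penalty.

    A pair (i, j) is penalised when j is within the k nearest neighbours
    of i under ranks_test but not under ranks_ref; the penalty is how far
    j lies outside the reference neighbourhood.
    """
    n = len(ranks_ref)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n/2")
    penalty = 0
    for i in range(n):
        for j in range(n):
            if j != i and ranks_test[i][j] <= k and ranks_ref[i][j] > k:
                penalty += ranks_ref[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def trustworthiness(data, latent, k):
    # Neighbours in the visualisation that are not neighbours in the data.
    return _rank_score(rank_matrix(data), rank_matrix(latent), k)


def continuity(data, latent, k):
    # Neighbours in the data that are not neighbours in the visualisation.
    return _rank_score(rank_matrix(latent), rank_matrix(data), k)
```

Both functions take sequences of coordinate tuples, one per data point, in matching order. A projection that preserves every neighbourhood scores 1.0 on both measures; lower values indicate more neighbourhood errors. In a multi-level setting, the same functions could be applied to each child model using only the points assigned to that subset.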