# Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping

### Michel F. Randrianandrasana, Shahzad Mumtaz, Ian T. Nabney

#### Abstract

Heterogeneous and incomplete datasets are common in many real-world visualisation applications. Because the Generative Topographic Mapping (GTM), originally developed for complete continuous data, is a probabilistic model, it can be extended to handle heterogeneous data (i.e. data containing both continuous and discrete values) and missing data. This paper describes the resulting model and assesses it on both synthetic and real-world heterogeneous data with missing values.
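One way to picture the kind of extension the abstract describes is sketched below. It is a minimal illustration under stated assumptions, not the authors' implementation: each mixture component (a latent grid point in GTM terms) is assumed to supply a Gaussian mean for the continuous features and a Bernoulli probability for the binary features, features are assumed conditionally independent given the component, and missing entries (encoded as NaN) are simply left out of the likelihood. All names (`log_likelihood_per_component`, `responsibilities`, `beta`) are hypothetical.

```python
import numpy as np

def log_likelihood_per_component(x, is_binary, means, probs, beta):
    """Log p(x_observed | component k) for every component k.

    x         : (D,) data vector, NaN marks a missing value
    is_binary : (D,) boolean mask, True for binary features
    means     : (K, D) Gaussian means (used for continuous features)
    probs     : (K, D) Bernoulli probabilities (used for binary features)
    beta      : inverse variance shared by all continuous features (assumption)
    """
    observed = ~np.isnan(x)
    cont = observed & ~is_binary
    disc = observed & is_binary

    # Isotropic Gaussian term over the observed continuous features only.
    sq_dist = ((x[cont] - means[:, cont]) ** 2).sum(axis=1)
    log_gauss = 0.5 * cont.sum() * np.log(beta / (2 * np.pi)) - 0.5 * beta * sq_dist

    # Bernoulli term over the observed binary features only.
    p = np.clip(probs[:, disc], 1e-12, 1 - 1e-12)
    log_bern = (x[disc] * np.log(p) + (1 - x[disc]) * np.log(1 - p)).sum(axis=1)

    return log_gauss + log_bern

def responsibilities(x, is_binary, means, probs, beta):
    """Posterior over components, assuming equal prior weights."""
    log_p = log_likelihood_per_component(x, is_binary, means, probs, beta)
    log_p = log_p - log_p.max()  # subtract the max for numerical stability
    r = np.exp(log_p)
    return r / r.sum()

# Toy point: one continuous value, one missing value, one binary value.
x = np.array([0.1, np.nan, 1.0])
is_binary = np.array([False, False, True])
means = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
probs = np.array([[0.5, 0.5, 0.9], [0.5, 0.5, 0.1]])
r = responsibilities(x, is_binary, means, probs, beta=1.0)
```

In this toy setting the first component should receive nearly all of the responsibility, and the missing second feature has no effect on the result because it never enters either likelihood term.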

#### Paper Citation

#### in Harvard Style

Randrianandrasana M. F., Mumtaz S. and Nabney I. T. (2015). **Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping**. In *Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)*, ISBN 978-989-758-088-8, pages 233-238. DOI: 10.5220/0005305002330238

#### in Bibtex Style

@conference{ivapp15,

author={Michel F. Randrianandrasana and Shahzad Mumtaz and Ian T. Nabney},

title={Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping},

booktitle={Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)},

year={2015},

pages={233-238},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005305002330238},

isbn={978-989-758-088-8},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 6th International Conference on Information Visualization Theory and Applications - Volume 1: IVAPP, (VISIGRAPP 2015)

TI - Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping

SN - 978-989-758-088-8

AU - Randrianandrasana, M. F.

AU - Mumtaz, S.

AU - Nabney, I. T.

PY - 2015

SP - 233

EP - 238

DO - 10.5220/0005305002330238