continuous and discrete data with Gaussian and
Bernoulli/multinomial distributions respectively.
These extensions have been suggested in (Bishop
et al., 1998) but this is the first time the mathematical
details have been worked out and an implementation
written and evaluated.
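
As an illustration of how the three noise models can be combined, the
following is a minimal sketch, not the implementation evaluated here. It
assumes that features are conditionally independent given the latent point,
that the Gaussian noise is isotropic with inverse variance beta (as in the
standard GTM), that categorical features are one-hot encoded, and that
missing continuous or binary values are marked with NaN and simply dropped
from the sum. The function name and all parameter names are hypothetical.

```python
import numpy as np

def mixed_log_likelihood(x_cont, x_bin, x_cat, mean, beta, p_bin, p_cat):
    """Log-likelihood of one observation under one mixture component.

    Hypothetical helper: x_cont/x_bin may contain NaN for missing values,
    x_cat is a one-hot vector for a single categorical feature.
    """
    eps = 1e-12  # guards against log(0)
    ll = 0.0

    # Gaussian part: isotropic noise with inverse variance beta.
    obs = ~np.isnan(x_cont)
    d = obs.sum()
    sq_err = ((x_cont[obs] - mean[obs]) ** 2).sum()
    ll += 0.5 * d * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * sq_err

    # Bernoulli part: one success probability per binary feature.
    obs = ~np.isnan(x_bin)
    xb = x_bin[obs]
    pb = np.clip(p_bin[obs], eps, 1.0 - eps)
    ll += (xb * np.log(pb) + (1.0 - xb) * np.log(1.0 - pb)).sum()

    # Multinomial part: only the observed category contributes.
    ll += (x_cat * np.log(np.clip(p_cat, eps, 1.0))).sum()
    return ll
```

Under the independence assumption, dropping a missing feature from the sum
is equivalent to marginalising it out, which is why no imputation step
appears in the sketch.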
Visualisation results for synthetic data using the
GGTM have shown more compact clusters for each
class compared to the standard GTM whereas for the
real dataset no significant difference was observed.
For synthetic datasets with missing values, GGTM vi-
sualisations have greater compactness for each class.
In terms of visualisation quality evaluation metrics,
we observed that for a mix of continuous and binary
data, the trustworthiness and MRRE_x were slightly
better for the standard GTM than for the GGTM, whereas
the continuity and MRRE_z were better for the GGTM.
However, for a mix of continuous, binary and
multi-category features, all the quality evaluation
measures were better for the GGTM than for the standard
GTM. Missing values caused only limited deterioration
in results compared to the complete data case.
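
For readers who want to reproduce this kind of comparison, the sketch below
computes trustworthiness and continuity in the rank-based form of Venna and
Kaski (2001). It is an illustrative sketch under stated assumptions rather
than the evaluation code used for the results above: it assumes Euclidean
distances in both spaces, a neighbourhood size k smaller than N/2, and no
tied distances. The function names are hypothetical, and MRRE is not
covered.

```python
import numpy as np

def _ranks(points):
    """ranks[i, j] = position of j when points are sorted by distance from i.

    Each point gets rank 0 with respect to itself, so true neighbours
    have ranks 1 .. N-1.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, -1.0)  # force each point to sort first for itself
    order = np.argsort(dist, axis=1, kind="stable")
    n = len(points)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks

def _rank_score(ranks_ref, ranks_other, k):
    """1 minus the normalised rank penalty for points that are k-neighbours
    in the 'other' space but not in the reference space."""
    n = ranks_ref.shape[0]
    intruders = (ranks_other >= 1) & (ranks_other <= k) & (ranks_ref > k)
    total = ((ranks_ref - k) * intruders).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * total

def trustworthiness(data, latent, k):
    # Penalises latent-space neighbours that are not data-space neighbours.
    return _rank_score(_ranks(data), _ranks(latent), k)

def continuity(data, latent, k):
    # Penalises data-space neighbours that are not latent-space neighbours.
    return _rank_score(_ranks(latent), _ranks(data), k)
```

Both scores equal 1 when the k-neighbourhoods agree exactly in the two
spaces and decrease as neighbours are gained or lost in the projection.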
REFERENCES
Bache, K. and Lichman, M. (2013). UCI machine learning
repository.
Bishop, C. M. (1995). Neural networks for pattern recog-
nition. Oxford University Press.
Bishop, C. M. and Svensen, M. (1998). GTM: The gen-
erative topographic mapping. Neural Computation,
10(1):215–234.
Bishop, C. M., Svensen, M., and Williams, C. K. I. (1998).
Developments of the generative topographic mapping.
Neurocomputing, 21(1):203–224.
de Leon, A. R. and Chough, K. C. (2013). Analysis of
Mixed Data: Methods & Applications. Taylor &
Francis Group. Chapman and Hall/CRC.
Dunson, D. B. (2000). Bayesian latent variable models
for clustered mixed outcomes. Journal of the Royal
Statistical Society. Series B (Statistical Methodology),
62(2):355–366.
Ghahramani, Z. and Jordan, M. I. (1994). Learning from
incomplete data. Technical Report AIM-1509.
Kabán, A. and Girolami, M. (2001). A combined latent
class and trait model for the analysis and visualization
of discrete data. Pattern Analysis and Machine Intel-
ligence, IEEE Transactions on, 23(8):859–872.
Krzanowski, W. J. (1983). Distance between popula-
tions using mixed continuous and categorical vari-
ables. Biometrika, 70(1):235–243.
Lee, J. A. and Verleysen, M. (2008). Rank-based quality
assessment of nonlinear dimensionality reduction. In
ESANN, pages 49–54.
McLachlan, G. and Krishnan, T. (1997). The EM algorithm
and extensions. Wiley, New York.
Moustaki, I. (1996). A latent trait and a latent class model
for mixed observed variables. British Journal of Math-
ematical and Statistical Psychology, 49(2):313–334.
Sammel, M. D., Ryan, L. M., and Legler, J. M. (1997). La-
tent variable models for mixed discrete and continu-
ous outcomes. Journal of the Royal Statistical Society.
Series B (Methodological), 59(3):667–678.
Sun, Y., Tino, P., and Nabney, I. (2002). Visualisa-
tion of incomplete data using class information con-
straints. In Winkler, J. and Niranjan, M., editors, Un-
certainty in Geometric Computations, volume 704 of
The Springer International Series in Engineering and
Computer Science, pages 165–173. Springer US.
Teixeira-Pinto, A. and Normand, S. T. (2009). Correlated
bivariate continuous and binary outcomes: issues and
applications. Statistics in Medicine, 28(13):1753–
1773.
Tipping, M. E. (1999). Probabilistic visualisation of high-
dimensional binary data. In Proceedings of the 1998
Conference on Advances in Neural Information Pro-
cessing Systems II, pages 592–598, Cambridge, MA,
USA. MIT Press.
Venna, J. and Kaski, S. (2001). Neighborhood preserva-
tion in nonlinear projection methods: an experimen-
tal study. In Proceedings of the International Con-
ference on Artificial Neural Networks, ICANN ’01,
pages 485–491, London, UK. Springer-Verlag.
Yu, K. and Tresp, V. (2004). Heterogenous data fusion via a
probabilistic latent-variable model. In Müller-Schloer,
C., Ungerer, T., and Bauer, B., editors, ARCS, volume
2981 of Lecture Notes in Computer Science, pages
20–30. Springer.
IVAPP 2015 - International Conference on Information Visualization Theory and Applications