projection can reach similar, though not fully identical, separation levels. Two open questions arise here: (1) How can we adapt the NN approach to learn such a strong separation? Our experiments show that hyperparameter settings, including regularization, data augmentation, optimizer, loss function, and network architecture, cannot fully eliminate diffusion, although using MAE as the loss function did improve the quality metrics. The main remaining direction to explore is the use of non-standard loss functions beyond those evaluated here. (2) Can we design quality metrics that better capture diffusion? If so, such metrics could next be used to design loss functions that minimize this undesired effect. Both issues are open to future research.
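As a concrete illustration of the loss-function dimension discussed above, the sketch below trains a small fully-connected regressor to mimic a t-SNE projection using MAE as the loss. The layer sizes, training parameters, and placeholder data are illustrative assumptions, not the exact configuration evaluated in our experiments.

```python
# Minimal sketch, assuming a Keras/TensorFlow setup with illustrative layer
# sizes (not the exact architecture studied here): a fully-connected regressor
# learns to map high-dimensional samples X to 2D t-SNE coordinates, trained
# with MAE, the loss that improved the quality metrics in our experiments.
import numpy as np
from sklearn.manifold import TSNE
from tensorflow import keras

X = np.random.rand(5000, 784).astype("float32")      # placeholder data (e.g., MNIST-like)
Y = TSNE(n_components=2).fit_transform(X)             # ground-truth t-SNE projection
Y = (Y - Y.min(0)) / (Y.max(0) - Y.min(0))            # rescale to [0, 1] for the sigmoid output

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(2, activation="sigmoid"),       # 2D projection coordinates
])
model.compile(optimizer="adam", loss="mean_absolute_error")  # MAE instead of the usual MSE
model.fit(X, Y, epochs=50, batch_size=64, validation_split=0.1)

Y_pred = model.predict(X)                              # deep-learned projection
```

Replacing the `mean_absolute_error` string with a custom loss function would be the natural entry point for experimenting with the non-standard losses mentioned above.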
6 CONCLUSION
We presented an in-depth study aimed at assessing the quality and stability of dimensionality reduction (DR) using supervised deep learning. For this, we explored the design space of a recent deep learning method in this class (Espadoto et al., 2019) in six orthogonal directions: training-set size, network architecture, regularization, optimization, data augmentation, and loss functions. These are the main design degrees-of-freedom present when creating any deep learning architecture. We sampled each direction using several settings (method types and parameter values) and compared the resulting projections with the t-SNE ground truth quantitatively, using four quality metrics, and qualitatively, by visual inspection.
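As one example of the kind of quality metric used in such a comparison, the sketch below computes trustworthiness, as implemented in scikit-learn, for a ground-truth and a learned projection; the metric choice and placeholder data are assumptions for illustration only.

```python
# Hedged illustration: trustworthiness is one widely used projection quality
# metric, shown here only as an example of comparing a learned projection
# against its t-SNE ground truth with respect to the original data X.
import numpy as np
from sklearn.manifold import trustworthiness

X = np.random.rand(1000, 50)      # high-dimensional data (placeholder)
P_gt = np.random.rand(1000, 2)    # ground-truth t-SNE projection (placeholder)
P_nn = np.random.rand(1000, 2)    # deep-learned projection (placeholder)

print("trustworthiness t-SNE:", trustworthiness(X, P_gt, n_neighbors=7))
print("trustworthiness NN   :", trustworthiness(X, P_nn, n_neighbors=7))
```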
Our results deliver an optimal hyperparameter setting for which the deep-learned projections closely approach the quality of the t-SNE ground truth. Separately, we showed that the deep learning projection method is stable with respect to all parameter settings, training-set size, and noise added to the input data. These results complement recent evaluations (Espadoto et al., 2019) and argue strongly that supervised deep learning is a practical, robust, simple-to-set-up, and high-quality alternative to t-SNE for dimensionality reduction in data visualization. More broadly, this study is, to our knowledge, the only work that shows in detail how the hyperparameter space of a projection method can be explored to find optimal settings and to provide evidence for the method's stability. We believe that our methodology can be directly applied to reach the same goals (optimal settings and evidence of stability) for any projection technique under study, whether based on deep learning or not.
We plan to extend these results in several directions. First, we aim to explore non-standard loss functions to reduce the small, but visible, amount of diffusion present in the deep-learned projections. Second, we aim to extend our approach to project time-dependent data in a stable and out-of-sample manner, which is a long-standing, but not yet achieved, goal in high-dimensional data visualization.
ACKNOWLEDGMENTS
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
REFERENCES
Espadoto, M., Hirata, N., and Telea, A. (2019). Deep learning multidimensional projections. arXiv:1902.07958 [cs.CG].
Feurer, M. and Hutter, F. (2019). Hyperparameter optimization. In Automated Machine Learning. Springer.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Ilievski, I., Akhtar, T., Feng, J., and Shoemaker, C. (2017). Efficient hyperparameter optimization of deep learning algorithms using deterministic RBF surrogates. In Proc. AAAI.
Joia, P., Coimbra, D., Cuminato, J. A., Paulovich, F. V., and Nonato, L. G. (2011). Local affine multidimensional projection. IEEE TVCG, 17(12):2563–2571.
Jolliffe, I. T. (1986). Principal component analysis and factor analysis. In Principal Component Analysis, pages 115–128. Springer.
Kehrer, J. and Hauser, H. (2013). Visualization and visual analysis of multifaceted scientific data: A survey. IEEE TVCG, 19(3):495–513.
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
Krogh, A. and Hertz, J. A. (1992). A simple weight decay can improve generalization. In NIPS, pages 950–957.
LeCun, Y., Cortes, C., and Burges, C. (2010). MNIST handwritten digit database. AT&T Labs, http://yann.lecun.com/exdb/mnist.
Liu, S., Maljovec, D., Wang, B., Bremer, P.-T., and Pascucci, V. (2015). Visualizing high-dimensional data: Advances in the past decade. IEEE TVCG, 23(3):1249–1268.
Martins, R., Minghim, R., and Telea, A. C. (2015). Explaining neighborhood preservation for multidimensional projections. In Proc. CGVC, pages 121–128. Eurographics.
McInnes, L. and Healy, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426.