show that the learned dimensionality reductions are very useful for information visualization and visual data mining. Here, ReNDA has proven to be more robust against the simulated data fluctuations than GerDA. To the best of our knowledge, this paper is the first to present such extensive experiments on the robustness of DNNs, which is why we had to run all of the experiments ourselves. Our experiments with the DBN-DNN provide a first glance at the capability of another approach to dimensionality reduction for data visualization (cf. Section 3.2.5). Of course, there are other DNN approaches capable of learning informative, visualizable features, but one must keep in mind that extensive experiments are essential to the process of finding suitable DNNs.
In this context, an important task is to figure out what we can learn from other suitable DNNs, e.g. the DBN-DNN (Tanaka and Okutomi, 2014). One of our future tasks will be to include the recently proposed dropout regularization (Srivastava et al., 2014). In addition to the investigation and integration of other promising approaches, there are of course some open questions within the ReNDA approach itself: The most important one is whether there is a better value for λ than 0.5 and whether there is a way to determine an optimal λ-value automatically. The next question points to an opportunity that comes with ReNDA: Can we exploit the unsupervised learning that the DAE part of ReNDA performs, so that semi-supervised learning tasks can be handled?
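As an illustration of how an optimal λ-value might be determined automatically, the following Python sketch performs a simple validation-based grid search over candidate values. It is not part of the ReNDA implementation described in this paper; the training routine train_renda and its validation score are hypothetical placeholders.

import numpy as np

def select_lambda(train_renda, lambdas=np.linspace(0.1, 0.9, 9)):
    """Select the weighting factor lambda via a simple grid search.

    train_renda(lam) is assumed to train one ReNDA model with the
    combined criterion weighted by lam and to return a pair
    (model, score), where score is a validation measure of the
    learned low-dimensional features (e.g. class consistency on
    held-out data). Both the routine and the score are hypothetical.
    """
    best_lam, best_score, best_model = None, -np.inf, None
    for lam in lambdas:
        model, score = train_renda(lam)  # train one model per candidate lambda
        if score > best_score:
            best_lam, best_score, best_model = lam, score, model
    return best_lam, best_model

In practice, each candidate λ would require a full training run, so a coarse grid or a small number of candidates is advisable.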
In summary, ReNDA has been shown to provide
a good way of learning dimensionality reductions for
data visualization. Moreover, questions like the two
above clearly show that the ReNDA approach can be
further advanced and adapted to suit a wide range of
real-world applications. Enabling the use of ReNDA for semi-supervised learning will be an essential advancement in this context.
ACKNOWLEDGEMENTS
We would like to thank the OVGU Magdeburg, the Faculty of Media (HS Düsseldorf) and the Faculty of Mechanical and Process Engineering (HS Düsseldorf) for providing us with the computational power for our extensive experiments.
REFERENCES
Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern
Classification. John Wiley & Sons, Inc.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A.,
and Vincent, P. (2010). Why does unsupervised pre-
training help deep learning? Journal of Machine
Learning Research, 11:625–660.
Fisher, R. A. (1936). The use of multiple measurements in
taxonomic problems. Annals of Eugenics, 7:179–188.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313:504–507.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
Sips, M., Neubert, B., Lewis, J. P., and Hanrahan, P. (2009).
Selecting good views of high-dimensional data us-
ing class consistency. Computer Graphics Forum,
28(3):831–838.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: A simple way
to prevent neural networks from overfitting. Journal
of Machine Learning Research, 15:1929–1958.
Stuhlsatz, A., Lippel, J., and Zielke, T. (2012). Feature extraction with deep neural networks by a generalized discriminant analysis. IEEE Transactions on Neural Networks and Learning Systems, 23(4):596–608.
Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., and Schuller, B. (2011). Deep neural networks for acoustic emotion recognition: Raising the benchmarks. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Tanaka, M. (2016). Deep neural network. MATLAB Central
File Exchange (# 42853). Retrieved Dec 2016.
Tanaka, M. and Okutomi, M. (2014). A novel inference of a restricted Boltzmann machine. In 22nd International Conference on Pattern Recognition, ICPR 2014, Stockholm, Sweden, August 24-28, 2014, pages 1526–1531.
Zhai, Y., Ong, Y. S., and Tsang, I. W. (2014). The emerg-
ing ”big dimensionality”. IEEE Computational Intel-
ligence Magazine, 9(3):14–26.
APPENDIX
A. On the Normalized GerDA Criterion
In Section 2.4.1, we stated that the GerDA criterion (21) is normalized, i.e. that $J_{\mathrm{GerDA}} \in (0,1)$. As this is not straightforward to see, we give a proof in this appendix section.
Let $\lambda_k$ for $k \in \{1,\dots,d_Z\}$ denote the eigenvalues of $(S_T^{\delta})^{-1} S_B^{\delta}$. Then $\operatorname{trace}\bigl((S_T^{\delta})^{-1} S_B^{\delta}\bigr) = \sum_{k=1}^{d_Z} \lambda_k$ and we need to show that $0 < \lambda_k < 1$ for all $k$.

Therefore, let $\mu_k$ for $k \in \{1,\dots,d_Z\}$ denote the eigenvalues of $S_W^{-1} S_B^{\delta}$ and let $\boldsymbol{x}_k \in \mathbb{R}^{d_Z}$ denote an eigenvector to the eigenvalue $\mu_k$. Then
$$
S_W^{-1} S_B^{\delta}\,\boldsymbol{x}_k = \mu_k\,\boldsymbol{x}_k
\quad\Leftrightarrow\quad
\boldsymbol{x}_k^{\mathrm{tr}} S_B^{\delta}\,\boldsymbol{x}_k
  = \mu_k \cdot \boldsymbol{x}_k^{\mathrm{tr}} S_W\,\boldsymbol{x}_k.
\eqno{(26)}
$$
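To indicate how (26) is commonly turned into the desired bounds, the following LaTeX sketch assumes the usual scatter decomposition $S_T^{\delta} = S_B^{\delta} + S_W$ and that both quadratic forms in (26) are strictly positive for every eigenvector; neither assumption is restated in this excerpt, so these lines are an illustrative sketch rather than the paper's own proof.

% Sketch only: assumes S_T^delta = S_B^delta + S_W and that both
% quadratic forms below are strictly positive for every eigenvector x_k.
\begin{align*}
\mu_k &= \frac{\boldsymbol{x}_k^{\mathrm{tr}} S_B^{\delta}\,\boldsymbol{x}_k}
             {\boldsymbol{x}_k^{\mathrm{tr}} S_W\,\boldsymbol{x}_k} \;>\; 0
  && \text{by (26)},\\
S_B^{\delta}\,\boldsymbol{x}_k
  &= \frac{\mu_k}{1+\mu_k}\,\bigl(S_B^{\delta} + S_W\bigr)\,\boldsymbol{x}_k
   = \frac{\mu_k}{1+\mu_k}\, S_T^{\delta}\,\boldsymbol{x}_k
  && \text{so } \lambda_k = \frac{\mu_k}{1+\mu_k} \in (0,1).
\end{align*}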