An example application of this procedure for the USPS data set is based on the kernel t-SNE projections as specified in the last paragraph. An SVM with Gaussian kernel is trained on a subset of the data that is not used to train the subsequent kernel t-SNE or Fisher kernel t-SNE mapping. This classifier reaches an accuracy of 99% on the training set and 97% on the test set. We use two different kernel t-SNE mappings to obtain a training set for the inverse mapping $p^{-1}$: kernel t-SNE and Fisher kernel t-SNE, respectively. The weights of the cost function have been chosen as $\lambda_1 = 0.1$ and $\lambda_2 = 10000$. The resulting visualization of the SVM classification is displayed in Fig. 5 (top) if the procedure is based on kernel t-SNE and in Fig. 5 (bottom) if it is based on Fisher kernel t-SNE.
Obviously, the visualization based on Fisher kernel t-SNE displays much clearer class boundaries than a visualization which does not take the class labeling into account. This visual impression is mirrored by a quantitative comparison of the projections. For the kernel t-SNE mapping, the classification induced in 2D, as displayed in the map, coincides with the original classification with only 85% accuracy. If Fisher kernel t-SNE is used, the coincidence increases to 92%.
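Such a coincidence figure can be computed by comparing, for each projected point, the SVM label of its $p^{-1}$ preimage with the SVM label of the original point. Continuing the sketch above (same hypothetical names):

```python
# Coincidence of the map-induced classification with the original one:
# classify each point's 2D preimage and compare with its direct SVM label.
induced = svm.predict(p_inv.predict(Z))
direct = svm.predict(X_map)
coincidence = np.mean(induced == direct)
print(f"map/classifier coincidence: {coincidence:.1%}")
```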
5 CONCLUSIONS
We have reviewed discriminative dimensionality reduction, its link to the Fisher information matrix, and its difference from direct classification. Based on Fisher kernel t-SNE, two applications have been proposed: a speed-up of dimensionality reduction on the one hand and the visualization of a classifier such as an SVM on the other. So far, the applications have been demonstrated using one benchmark only, results for alternative benchmarks being similar. Note that the proposed techniques are not restricted to t-SNE; rather, similar techniques could be built on top of popular alternatives such as LLE or Isomap.
ACKNOWLEDGEMENTS
Funding by the DFG under grant numbers HA 2719/7-1 and HA 2719/4-1 and by the CITEC centre of excellence is gratefully acknowledged. We would like to thank
the anonymous reviewers for helpful comments and
suggestions.