row of Figure 3(E) ($s_{\{\mathrm{left},\mathrm{right}\}|\{\mathrm{no},\mathrm{both}\},\,\mathrm{cluster\,2}}$,
where cluster 2 corresponds to class no) is similar to the first
row of Figure 2(B) ($s_{\mathrm{left}|\mathrm{both}}$); and the third row of
Figure 3(E) ($s_{\{\mathrm{left},\mathrm{right}\}|\{\mathrm{no},\mathrm{both}\},\,\mathrm{cluster\,1}}$,
where cluster 1 corresponds to class both) is similar to the third row
of Figure 2(C) ($s_{\mathrm{left}|\mathrm{both}}$). The significance of these
observations is that Procedure IV, applied to Classification Task II
(the 2-class xor-problem), has succeeded in providing information
similar to that given by Procedure III applied to Classification
Task I (the 4-class classification), even though the available class
labels in Task II are less informative and the two classes
{no, both} and {left, right} are heterogeneous. This suggests that
Procedure IV can provide useful information about the nature of
nonlinear classifiers when applied to complex, heterogeneous classes.
5 CONCLUSIONS
The established probabilistic sensitivity map procedure provides a
global summary map of the relative importance of voxels to a trained
classifier (Kjems et al., 2002). However, no sign information is
present in such a map. In the present work we have proposed a
procedure that generates summary maps with sign information.
Furthermore, we have proposed a clustering procedure that is
applicable when there is relatively large heterogeneity between
observations, which may otherwise degrade the model visualization
through cancellation effects.
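The cancellation effect described above can be illustrated with a small numerical sketch. This is not the procedure proposed in this paper: the toy score function, the use of the mean squared gradient as an unsigned map, the use of the mean gradient as a signed map, and the helper `two_means` are all assumptions chosen purely for illustration.

```python
import numpy as np

def toy_gradient(x):
    # Gradient of a hypothetical nonlinear score f(x) = x0 * x1
    # (an xor-like function), taken with respect to each input feature.
    return np.stack([x[:, 1], x[:, 0]], axis=1)

def unsigned_map(grads):
    # Assumed unsigned summary: mean squared gradient (no sign information).
    return np.mean(grads ** 2, axis=0)

def signed_map(grads):
    # Assumed signed summary: mean gradient (opposite signs can cancel).
    return np.mean(grads, axis=0)

def two_means(points, n_iter=20):
    # Minimal 2-means clustering with a deterministic initialisation
    # (hypothetical helper, not the clustering used in the paper).
    sums = points.sum(axis=1)
    centers = np.stack([points[np.argmin(sums)], points[np.argmax(sums)]])
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):  # keep the old center if a cluster is empty
                centers[k] = points[labels == k].mean(axis=0)
    return labels

# One heterogeneous "class" made of two sub-populations with the same score sign.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(loc=[1.0, 1.0], scale=0.1, size=(100, 2)),
    rng.normal(loc=[-1.0, -1.0], scale=0.1, size=(100, 2)),
])
grads = toy_gradient(x)

pooled_unsigned = unsigned_map(grads)   # large, but carries no sign
pooled_signed = signed_map(grads)       # near zero because of cancellation
labels = two_means(grads)               # cluster observations by their gradients
cluster_maps = [signed_map(grads[labels == k]) for k in range(2)]

print("pooled unsigned map:", pooled_unsigned)
print("pooled signed map:  ", pooled_signed)
print("per-cluster signed maps:", cluster_maps)
```

In this toy setting the pooled signed map is close to zero although both features matter, whereas the per-cluster signed maps recover gradients of opposite sign for the two sub-populations.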
As a proof of concept, we have illustrated the approach on a data
set from a simple fMRI experiment, with classes deliberately defined
to be heterogeneous. Our procedure successfully recovered known
structure in the classes. We also found that the maps produced for
this data set are robust, in the sense that they are reproducible as
judged by the NPAIRS resampling framework. We showed that
reproducibility is improved by the new clustering procedure.
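As a rough illustration of what a split-half reproducibility score looks like, the sketch below repeatedly splits a data set into two halves, computes a map on each half, and averages the correlation between the two maps. It is a simplification in the spirit of split-half resampling, not the NPAIRS pipeline used in this work: the function names, the placeholder map (a per-feature mean), and the synthetic data are all assumptions.

```python
import numpy as np

def mean_map(data):
    # Placeholder "map": per-feature mean. In practice this would be a
    # visualization computed from a classifier trained on this half.
    return data.mean(axis=0)

def split_half_reproducibility(data, map_fn, n_splits=50, seed=0):
    # Average Pearson correlation between maps computed on disjoint halves.
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half_a, half_b = perm[: n // 2], perm[n // 2:]
        map_a = map_fn(data[half_a])
        map_b = map_fn(data[half_b])
        scores.append(np.corrcoef(map_a, map_b)[0, 1])
    return float(np.mean(scores))

# Synthetic data: a fixed spatial pattern plus noise, versus pure noise.
rng = np.random.default_rng(1)
pattern = np.linspace(-1.0, 1.0, 20)
signal_data = pattern + rng.normal(scale=1.0, size=(200, 20))
noise_data = rng.normal(scale=1.0, size=(200, 20))

signal_repro = split_half_reproducibility(signal_data, mean_map)
noise_repro = split_half_reproducibility(noise_data, mean_map)
print("reproducibility with a true pattern:", signal_repro)
print("reproducibility with pure noise:    ", noise_repro)
```

A map that reflects stable structure in the data yields a score near 1, while a map driven by noise yields a much lower score.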
Our results suggest that our new method of model visualization may
be useful in visualizing nonlinear classifiers trained on
heterogeneous classes. Further work is needed to compare variations
of the method, in particular different possible choices of the
visualization function (see Section 2.2.2), and to validate the
method on a larger variety of real or synthetic data.
ACKNOWLEDGEMENTS
This work is partly supported by the Danish Lundbeck Foundation
through the program www.cimbi.org. The Simon Spies Foundation is
acknowledged for donation of the Siemens Trio scanner. Kristoffer H.
Madsen was supported by the Danish Medical Research Council (grant
no. 09-072163) and the Lundbeck Foundation (grant no. R48-A4846).
REFERENCES
Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M.,
Hansen, K., and Müller, K.-R. (2010). How to explain
individual classification decisions. Journal of
Machine Learning Research, 11:1803–1831.
Cox, D. D. and Savoy, R. L. (2003). Functional magnetic
resonance imaging (fMRI) “brain reading”: detecting
and classifying distributed patterns of fMRI activity in
human visual cortex. NeuroImage, 19:261–270.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D., Acharyya,
M., Loughead, J., Gur, R., and Langleben, D. (2005).
Classifying spatial patterns of brain activity with
machine learning methods: application to lie detection.
NeuroImage, 28(3):663–668.
Friedman, J. H. (1989). Regularized discriminant analysis.
J. Am. Statistical Assoc., 84:165–175.
Golland, P., Grimson, W. E. L., Shenton, M. E., and Kikinis,
R. (2005). Detection and analysis of statistical
differences in anatomical shape. Medical Image Analysis,
9:69–86.
Haynes, J. D. and Rees, G. (2006). Decoding mental states
from brain activity in humans. Nature Reviews
Neuroscience, 7(7):523–534.
Kjems, U., Hansen, L. K., Anderson, J., Frutiger, S.,
Muley, S., Sidtis, J., Rottenberg, D., and Strother, S. C.
(2002). The quantitative evaluation of functional
neuroimaging experiments: mutual information learning
curves. NeuroImage, 15(4):772–786.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., and
Hu, X. (2005). Support vector machines for temporal
classification of block design fMRI data. NeuroImage,
26:317–329.
Lautrup, B., Hansen, L., Law, I., Mørch, N., Svarer, C.,
and Strother, S. (1994). Massive weight sharing: A
cure for extremely ill-posed problems. Proceedings of
the Workshop on Supercomputing in Brain Research:
From Tomography to Neural Networks. World Scientific,
Jülich, Germany, pages 137–148.
Mika, S., Rätsch, G., Schölkopf, B., Smola, A., Weston, J.,
and Müller, K.-R. (1999). Invariant feature extraction
and classification in kernel spaces. Advances in Neural
Information Processing Systems, 12:526–532.
Misaki, M., Kim, Y., Bandettini, P., and Kriegeskorte, N.
(2010). Comparison of multivariate classifiers and
response normalizations for pattern-information fMRI.
NeuroImage, 53(1):103–118.
Mørch, N., Hansen, L. K., Strother, S. C., Svarer, C.,
Rottenberg, D. A., Lautrup, B., Savoy, R., and Paulson,
O. B. (1997). Nonlinear versus Linear Models in
Functional Neuroimaging: Learning Curves and
Generalization Crossover. IPMI ’97: Proceedings of the
BIOSIGNALS 2012 - International Conference on Bio-inspired Systems and Signal Processing