Following the arguments outlined in the Introduction,
the role of the multiple layers can be understood as
a form of higher-order decorrelation of the input,
which leads to more efficient representations with
increased sparseness and more representative
key-points.
ACKNOWLEDGEMENTS
This research is supported by the Graduate School for
Computing in Medicine and Life Sciences funded by
Germany’s Excellence Initiative [DFG GSC 235/1].
We thank the reviewers for their constructive com-
ments.