In related research, Ogata et al. also extract multi-modal dynamic features of objects while a humanoid robot interacts with them (Ogata et al., 2005). However, there are distinct differences. Despite using fewer objects in total, the problem posed in our experiments is considerably harder: our toy bricks have approximately the same circumference and identical color. Furthermore, they fall into two weight classes with identical in-class weight, so the objects can only be discriminated via multi-modal sensory information. We provide classification results, compare them to other methods (MLP and SVC), and evaluate the noise tolerance of the architecture. In addition, we use only prototype time series for training (in contrast to using all single-trial time series), resulting in reduced training time; a sketch of one possible prototype construction is given below. Further, we demonstrate that, once the network has learned the sensorimotor laws of certain objects, it can generalize and provide fairly accurate sensory predictions for unseen ones (Fig. 5, right).
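To illustrate the prototype idea, the following is a minimal sketch that forms one prototype time series per class by pointwise averaging of aligned single-trial recordings. The averaging scheme and the name build_prototype are assumptions for illustration, not necessarily the construction used in our experiments.

```python
import numpy as np

def build_prototype(trials):
    """Pointwise average of aligned single-trial time series.

    trials -- list of arrays, each of shape (T, D): T time steps and
              D sensory dimensions; the trials are assumed to be
              time-aligned and truncated to a common length already.
    """
    stacked = np.stack([np.asarray(t) for t in trials])  # (N, T, D)
    return stacked.mean(axis=0)                          # (T, D)
```

Training on one such prototype per class, instead of on every single-trial sequence, shrinks the training set and hence the training time.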
In conclusion, we present a promising framework for object classification through active perception on a humanoid robot, rooted in neuroscientific and philosophical hypotheses.
5.1 Future Work
There are several potential applications of the presented model. As shown in Figs. 8 and 9, the network tolerates noise very well. This property can be exploited for sensor de-noising: despite receiving a noisy sensory signal, the robot can still determine the PB values of the class representative based on the Euclidean distance. In turn, these values can be used to operate the RNNPB in retrieval mode (section 2.3), regenerating the previously stored noise-free sensory signal, which can then be processed further.
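The following is a minimal sketch of this de-noising pipeline, assuming a trained RNNPB that exposes a recognition step and a retrieval step; recognize_pb, generate_sequence, and prototype_pbs are hypothetical placeholders for those interfaces, not our actual implementation.

```python
import numpy as np

def denoise(noisy_seq, prototype_pbs, recognize_pb, generate_sequence):
    """Sketch: map a noisy sensory sequence to its stored prototype.

    prototype_pbs     -- dict: class label -> PB vector (array) of the
                         class representative learned during training
    recognize_pb      -- infers PB values for a sequence with the
                         network weights held fixed (recognition mode)
    generate_sequence -- runs the RNNPB in retrieval mode (section 2.3)
    """
    # 1. Infer the PB values of the noisy input.
    pb = np.asarray(recognize_pb(noisy_seq))

    # 2. Choose the class representative closest in PB space
    #    (Euclidean distance).
    label = min(prototype_pbs,
                key=lambda c: np.linalg.norm(pb - prototype_pbs[c]))

    # 3. Clamp the prototype's PB values and regenerate the stored,
    #    noise-free sensory signal for further processing.
    return label, generate_sequence(prototype_pbs[label])
```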
It is also conceivable that the network could be used for sensory (or sensorimotor) imagery. Due to its powerful generalization capabilities, not only can the trained sensory perceptions be recalled, but interpolated ‘feelings’ can be generated as well (Fig. 5, right).
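Such imagery could, for instance, amount to interpolating linearly between the PB vectors of two learned objects and letting the network generate the corresponding sensory sequence; as above, generate_sequence is a hypothetical stand-in for the RNNPB's generation mode.

```python
import numpy as np

def imagine(pb_a, pb_b, generate_sequence, alpha=0.5):
    """Generate an interpolated 'feeling' between two trained objects.

    alpha -- interpolation weight in [0, 1]: 0 reproduces the first
             object, 1 the second; intermediate values yield unseen
             sensory predictions (cf. Fig. 5, right).
    """
    pb = (1.0 - alpha) * np.asarray(pb_a) + alpha * np.asarray(pb_b)
    return generate_sequence(pb)
```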
ACKNOWLEDGEMENTS
This work was supported by the Sino-German Research Training Group CINACS, DFG GRK 1247/1 and 1247/2, and by the EU project KSERA under grant 2010-248085. We thank R. Cuijpers and C. Weber for inspiring and very helpful discussions, and S. Heinrich, D. Jessen, and N. Navarro for assistance with the robot.
REFERENCES
Aloimonos, J., Weiss, I., and Bandyopadhyay, A. (1988). Active vision. International Journal of Computer Vision, 1:333–356.
Bajcsy, R. (1988). Active perception. Proceedings of the IEEE, 76(8):966–1005.
Ballard, D. H. (1991). Animate vision. Artificial Intelligence, 48(1):57–86.
Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.
Bridgeman, B. and Tseng, P. (2011). Embodied cognition and the perception-action link. Physics of Life Reviews, 8(1):73–85.
Burt, P. (1988). Smart sensing within a pyramid vision machine. Proceedings of the IEEE, 76(8):1006–1015.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library
for support vector machines. ACM Transactions on
Intelligent Systems and Technology, 2:27:1–27:27.
Cuijpers, R. H., Stuijt, F., and Sprinkhuizen-Kuyper, I. G. (2009). Generalisation of action sequences in RNNPB networks with mirror properties. In Proceedings of the 17th European Symposium on Artificial Neural Networks (ESANN), pages 251–256.
Dewey, J. (1896). The reflex arc concept in psychology.
Psychological Review, 3:357–370.
Fitzpatrick, P. and Metta, G. (2003). Grounding vision
through experimental manipulation. Philosophical
Transactions of the Royal Society of London. Series
A: Mathematical, Physical and Engineering Sciences,
361(1811):2165–2185.
Gibson, J. J. (1977). The theory of affordances. In Shaw,
R. and Bransford, J., editors, Perceiving, acting, and
knowing: Toward an ecological psychology, pages
67–82. Hillsdale, NJ: Erlbaum.
Held, R., Ostrovsky, Y., Degelder, B., Gandhi, T., Ganesh, S., Mathur, U., and Sinha, P. (2011). The newly sighted fail to match seen with felt. Nature Neuroscience, 14(5):551–553.
Hu, M.-K. (1962). Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 8(2):179–187.
Kolen, J. F. and Kremer, S. C. (2001). A field guide to dynamical recurrent networks. IEEE Press, New York.
LeCun, Y., Bottou, L., Orr, G., and Müller, K. (1998). Efficient backprop. Lecture Notes in Computer Science, 1524:5–50.
Martín H., J. A., Santos, M., and de Lope, J. (2010). Orthogonal variant moments features in image analysis. Information Sciences, 180:846–860.
Merleau-Ponty, M. (1963). The structure of behavior. Beacon Press, Boston.
Ogata, T., Ohba, H., Tani, J., Komatani, K., and Okuno, H. G. (2005). Extracting multi-modal dynamics of objects using RNNPB. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, pages 160–165.
Olsson, L. A., Nehaniv, C. L., and Polani, D. (2006). From
unknown sensors and actuators to actions grounded
in sensorimotor perceptions. Connection Science,
18(2):121–144.