thermore, view planning can reduce the number of
active steps needed to produce the final categorization
result. Finally, the overall computational complexity
of object categorization is significantly reduced by the
integration of various views so that we can hope to
see applications on rather slim computing platforms
like Nao in the near future. There certainly is an ex-
tremely high potential for this kind of hand-held, ac-
tive inspection of objects by a humanoid robot, in-
cluding human-robot interaction, home and service
robotics, edutainment, active inspection, and active
surveillance.
In terms of future basic research in active object
categorization, this paper merely touches on a number
of interesting research issues. From a practical point
of view, more expensive humanoid robots may pro-
vide better grasping functionality so that the objects
might be grasped autonomously, avoiding the need
to hand them to the robot. This would also lead to a
wider variety of stable object poses, which in turn
requires handling many different viewpoint hypotheses;
this cannot be done with a purely Bag-of-Words-based
representation. In view planning, it might be
interesting to use mutual information instead of
entropy, as Denzler and Brown (2002) do, because this
could eliminate the need to mask out viewpoints that
have already been visited. With many object
categories and many pose hypotheses, view planning
might even become computationally infeasible, so that one
might resort to a random view selection strategy
to save overall computation time (see de Croon
et al. (2009) for a comparative evaluation).
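The two view selection strategies mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method used in this paper or in the cited works: observations are assumed to be discrete, each candidate view is assumed to come with a known observation model p(o | c, a), and the names `mutual_information`, `select_view`, and `models` are hypothetical.

```python
import math
import random


def mutual_information(prior, likelihood):
    """Return I(C; O | a) in nats for one candidate view a.

    prior[c]         = p(c), the current belief over object categories
    likelihood[c][o] = p(o | c, a), an assumed discrete observation model
    """
    n_obs = len(likelihood[0])
    # p(o | a) = sum_c p(c) * p(o | c, a)
    marginal = [
        sum(p_c * row[o] for p_c, row in zip(prior, likelihood))
        for o in range(n_obs)
    ]
    mi = 0.0
    for p_c, row in zip(prior, likelihood):
        for o, p_o_given_c in enumerate(row):
            joint = p_c * p_o_given_c
            if joint > 0.0:  # the term 0 * log(0) is taken as 0
                mi += joint * math.log(p_o_given_c / marginal[o])
    return mi


def select_view(prior, models, rng=None):
    """Return the index of the next view.

    With rng=None the view with maximal mutual information is chosen;
    otherwise a view is drawn uniformly at random, which skips the
    planning computation entirely.
    """
    if rng is not None:
        return rng.randrange(len(models))
    scores = [mutual_information(prior, m) for m in models]
    return max(range(len(models)), key=scores.__getitem__)


# Hypothetical toy setup: two categories, two views, two observation symbols.
models = [
    [[0.5, 0.5], [0.5, 0.5]],  # view 0: identical statistics for both categories
    [[1.0, 0.0], [0.0, 1.0]],  # view 1: the observation identifies the category
]
best = select_view([0.5, 0.5], models)
```

In this toy setup the score of a view depends on the current belief: once the belief has collapsed onto one category, the expected gain of every view is zero. Whether this suppresses revisits in practice, so that no explicit mask is needed, depends on the observation model and would have to be checked experimentally.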
REFERENCES
Borotschnig, H., Paletta, L., Prantl, M., and Pinz, A. (2000).
Appearance based active object recognition. Image
and Vision Computing, 18:715–727.
Borotschnig, H., Paletta, L., Prantl, M., and Pinz, A. (1998).
Active object recognition in parametric eigenspace. In
British Machine Vision Conference.
Bosch, A., Zisserman, A., and Munoz, X. (2007). Image
classification using random forests and ferns. In In-
ternational Conference on Computer Vision.
Bustos, B., Keim, D., Saupe, D., Schreck, T., and Vranic,
D. (2005). Feature-based similarity search in 3D object
databases. ACM Computing Surveys (CSUR),
37(4):345–387.
de Croon, G.-E., Sprinkhuizen-Kuyper, I., and
Postma, E. O. (2009). Comparing active vision
models. Image and Vision Computing, 27:374–384.
Deinzer, F., Denzler, J., Derichs, C., and Niemann, H.
(2006). Integrated viewpoint fusion and viewpoint se-
lection for optimal object recognition. In British Ma-
chine Vision Conference.
Deinzer, F., Denzler, J., and Niemann, H. (2003). Viewpoint
selection - planning optimal sequences of views for
object recognition. In Computer Analysis of Images
and Patterns, pages 64–73. Springer Berlin / Heidel-
berg.
Denzler, J. and Brown, C. (2002). Information theoretic
sensor data selection for active object recognition and
state estimation. IEEE Trans. on Pattern Analysis and
Machine Intelligence, 24(2):145–157.
Dickinson, S., Leonardis, A., Schiele, B., and Tarr, M., ed-
itors (2009). Object Categorization. Cambridge Uni-
versity Press.
Leibe, B., Leonardis, A., and Schiele, B. (2004). Com-
bined object categorization and segmentation with an
implicit shape model. In European Conference on
Computer Vision Workshop on Statistical Learning in
Computer Vision.
Lowe, D. (2004). Distinctive image features from scale-
invariant keypoints. International Journal of Com-
puter Vision, 60(2):91–110.
Pinz, A. (2006). Object categorization. Foundations and
Trends in Computer Graphics and Vision, 1(4):255–
353.
Roy, S., Chaudhury, S., and Banerjee, S. (2004). Active
recognition through next view planning: A survey.
Pattern Recognition, 37:429–446.
Schiele, B. and Crowley, J. L. (1998). Transinformation for
active object recognition. In International Conference
on Computer Vision.
Sivic, J. and Zisserman, A. (2003). Video Google: A text
retrieval approach to object matching in videos. In Proc.
IEEE International Conference on Computer Vision
(ICCV), pages 1470–1477.
Zhang, S., Tian, Q., Hua, G., Huang, Q., and Li, S. (2009).
Descriptive visual words and visual phrases for image
applications. In Proc. ACM Int. Conf. on Multimedia,
pages 75–84.
ACTIVE OBJECT CATEGORIZATION ON A HUMANOID ROBOT