adapt to new and changing environments. This is especially important for robots assisting humans in their daily life. To achieve this, a system needs to deal with problems T1–R2 as described in our introduction. We therefore designed a two-stage pipeline featuring fast, automatic, and robust learning of objects with minimal human intervention. In the first stage (Object detection and extraction), the robot uses 3D information from the RGB-D sensor to automatically retrieve objects from cluttered scenes. By projecting all object masks into a high-resolution camera, we provide the second stage of the recognition system (Object recognition) with accurate and detailed visual information.
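To make the hand-off between the two stages concrete, the following sketch shows one way such a mask projection can be realized under a pinhole camera model. All names (project_mask_to_highres, K_hr, R, t) are illustrative placeholders, not part of our implementation:

    import numpy as np

    def project_mask_to_highres(points_3d, K_hr, R, t, hr_shape):
        # points_3d: (N, 3) object points in the depth camera frame;
        # K_hr: (3, 3) intrinsics of the high-resolution camera;
        # R, t: rigid transform from the depth to the high-res frame;
        # hr_shape: (height, width) of the high-resolution image.
        p_cam = points_3d @ R.T + t          # change of camera frame
        p_cam = p_cam[p_cam[:, 2] > 0]       # keep points in front of the camera
        uv = p_cam @ K_hr.T                  # pinhole projection
        uv = np.round(uv[:, :2] / uv[:, 2:3]).astype(int)
        h, w = hr_shape
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & \
             (uv[:, 1] >= 0) & (uv[:, 1] < h)
        mask = np.zeros(hr_shape, dtype=bool)
        mask[uv[ok, 1], uv[ok, 0]] = True    # rasterize to a binary mask
        return mask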
We tested our recognition system in two scenarios: first, on 18 objects under varying poses, illumination, and distances in 42 scenes with partial occlusion, and second, on the SDU-dataset with 56 objects in arbitrary poses. The former dataset has been made publicly available.
Comparing the results on the SDU-Benchmark with those on our 42-Scenes Benchmark shows that our benchmark is the more challenging one, even though the SDU-dataset contains more objects. The reason is twofold: first, we did not place any constraints on object pose, distance, or illumination, and second, we evaluate on a collection of labeled and masked scenes that exhibit occlusion, which makes recognition more difficult. In both benchmarks our novel Radial orientation scheme surpassed state-of-the-art results. This is because, in contrast to the widely used local gradient orientation schemes, our orientation scheme produces signatures that incorporate shape information.
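As an illustration of this difference, the sketch below measures each gradient orientation relative to the radial direction from the patch center, so that edges running along an object contour and edges crossing it fall into different histogram bins. This is a simplified stand-in for our descriptor, with all names being placeholders:

    import numpy as np

    def radial_orientation_histogram(patch, bins=8):
        # patch: 2D grayscale array; returns a normalized histogram of
        # gradient orientations measured relative to the radial direction
        # from the patch center (illustrative sketch, not our exact scheme).
        gy, gx = np.gradient(patch.astype(float))
        h, w = patch.shape
        yy, xx = np.mgrid[0:h, 0:w]
        radial = np.arctan2(yy - (h - 1) / 2.0, xx - (w - 1) / 2.0)
        grad = np.arctan2(gy, gx)
        rel = np.mod(grad - radial, 2 * np.pi)   # orientation relative to radius
        mag = np.hypot(gx, gy)                   # gradient magnitude as weight
        hist, _ = np.histogram(rel, bins=bins,
                               range=(0, 2 * np.pi), weights=mag)
        return hist / (hist.sum() + 1e-9)

A plain gradient orientation histogram discards where an edge lies with respect to the object; measuring orientations against the radial direction retains that relation, which is why such signatures carry shape information.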
Furthermore, a simple fusion of Gray-SIFT and our three-dimensional CyColor feature not only sped up the recognition pipeline (7 s for the full training in our 42-Scenes Benchmark) but also significantly boosted classification accuracy on the SDU-dataset. This shows the value of absolute color information for object recognition, especially when few training samples are available.
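One plausible reading of such a simple fusion, shown below purely for illustration, is to L1-normalize the two feature histograms and concatenate them before classification; the names sift_hist and cycolor_hist are placeholders rather than our API:

    import numpy as np

    def fuse_features(sift_hist, cycolor_hist):
        # L1-normalize each modality so neither dominates merely because
        # of its dimensionality or scale, then concatenate into one vector.
        s = sift_hist / (np.abs(sift_hist).sum() + 1e-9)
        c = cycolor_hist / (np.abs(cycolor_hist).sum() + 1e-9)
        return np.concatenate([s, c])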
Combining our Radial orientation scheme with our CyColor features improves on the state of the art on the SDU-dataset by 37 percentage points, to a total of 78% with only a single training view, and reaches 98% with 11 training views.