Biederman, I. (1987). Recognition-by-components: a theory of human image understanding. Psychological Review, 94(2):115.
Collet Romea, A., Martinez Torres, M., and Srinivasa, S. (2011). The MOPED framework: Object recognition and pose estimation for manipulation. International Journal of Robotics Research, 30(10):1284–1306.
Collet Romea, A. and Srinivasa, S. (2010). Efficient multi-view object recognition and full pose estimation. In 2010 IEEE International Conference on Robotics and Automation (ICRA 2010).
D’Apuzzo, N. (2006). Overview of 3d surface digitization technologies in Europe. In Electronic Imaging 2006, page 605605. International Society for Optics and Photonics.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.
Fei-Fei, L. (2006). Knowledge transfer in learning to recognize visual object classes. In Proceedings of the Fifth International Conference on Development and Learning.
Fei-Fei, L., Fergus, R., and Perona, P. (2006). One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Girshick, R. B. (2015). Fast R-CNN. CoRR,
abs/1504.08083.
Hao, Q., Cai, R., Li, Z., Zhang, L., Pang, Y., Wu, F., and Rui, Y. (2013). Efficient 2d-to-3d correspondence filtering for scalable 3d object recognition. In 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 899–906.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep
residual learning for image recognition. CoRR,
abs/1512.03385.
Irschara, A., Zach, C., Frahm, J.-M., and Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2599–2606.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
Long, M. and Wang, J. (2015). Learning transferable features with deep adaptation networks. CoRR, abs/1502.02791.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.
Peng, X., Sun, B., Ali, K., and Saenko, K. (2014). Exploring invariances in deep convolutional neural networks using synthetic images. CoRR, abs/1412.7122.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh,
S., Ma, S., Huang, Z., Karpathy, A., Khosla, A.,
Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015).
ImageNet Large Scale Visual Recognition Challenge.
International Journal of Computer Vision (IJCV),
115(3):211–252.
Sarkar, K., Pagani, A., and Stricker, D. (2016). Feature-
augmented trained models for 6dof object recognition
and camera calibration. In Proceedings of the 11th
Joint Conference on Computer Vision, Imaging and
Computer Graphics Theory and Applications, pages
632–640.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.
Skrypnyk, I. and Lowe, D. (2004). Scene modelling, recognition and tracking with invariant image features. In Third IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2004), pages 110–119.
Snavely, N., Seitz, S. M., and Szeliski, R. (2006). Photo
tourism: Exploring photo collections in 3d. ACM
Trans. Graph., 25(3):835–846.
Snavely, N., Seitz, S. M., and Szeliski, R. (2008). Modeling the world from internet photo collections. International Journal of Computer Vision, 80(2):189–210.
Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. G. (2015a). Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Su, H., Qi, C. R., Li, Y., and Guibas, L. J. (2015b). Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3d model views. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Sun, B., Feng, J., and Saenko, K. (2015). Return of frustratingly easy domain adaptation. CoRR, abs/1511.05547.
Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., and Darrell,
T. (2014). Deep domain confusion: Maximizing for
domain invariance. CoRR, abs/1412.3474.
VTK. The Visualization Toolkit (VTK). http://www.vtk.org/.
Trained 3D Models for CNN based Object Recognition