VISAPP 2018 - International Conference on Computer Vision Theory and Applications