A New Approach Combining Trained Single-view Networks with Multi-view Constraints for Robust Multi-view Object Detection and
Labelling