estimation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
Bourdev, L. and Malik, J. (2009). Poselets: Body part
detectors trained using 3d human pose annotations. In
Computer Vision, 2009 IEEE 12th International
Conference on, pages 1365–1372. IEEE.
Chen, S., Bremond, F., Nguyen, H., and Thomas, H. (2016).
Exploring depth information for head detection with
depth images. In Advanced Video and Signal Based
Surveillance (AVSS), 2016 13th IEEE International
Conference on, pages 228–234. IEEE.
Chollet, F. et al. (2015). Keras.
Cortes, C. and Vapnik, V. (1995). Support-vector networks.
Machine learning, 20(3):273–297.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In Computer Vision and
Pattern Recognition, 2005. CVPR 2005. IEEE Computer
Society Conference on, volume 1, pages 886–893. IEEE.
Fanelli, G., Weise, T., Gall, J., and Van Gool, L. (2011).
Real time head pose estimation from consumer depth
cameras. In Joint Pattern Recognition Symposium,
pages 101–110. Springer.
Freund, Y. and Schapire, R. E. (1995). A decision-theoretic
generalization of on-line learning and an application
to boosting. In European conference on computational
learning theory, pages 23–37. Springer.
Frigieri, E., Borghi, G., Vezzani, R., and Cucchiara, R.
(2017). Fast and accurate facial landmark localization
in depth images for in-car applications. In Proceedings
of the 19th International Conference on Image
Analysis and Processing (ICIAP).
Ikemura, S. and Fujiyoshi, H. (2011). Real-time human
detection using relational depth similarity features.
Computer Vision–ACCV 2010, pages 25–38.
Khan, M. H., Shirahama, K., Farid, M. S., and Grzegorzek,
M. (2016). Multiple human detection in depth images.
In Multimedia Signal Processing (MMSP), 2016 IEEE
18th International Workshop on, pages 1–6. IEEE.
Kingma, D. and Ba, J. (2014). Adam: A method for
stochastic optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
Imagenet classification with deep convolutional neural
networks. In Advances in neural information processing
systems, pages 1097–1105.
Levi, K. and Weiss, Y. (2004). Learning object detection
from a small number of examples: the importance of
good features. In Computer Vision and Pattern
Recognition, 2004. CVPR 2004. Proceedings of the 2004
IEEE Computer Society Conference on, volume 2,
pages II–II. IEEE.
Lowe, D. G. (1999). Object recognition from local
scale-invariant features. In Computer vision, 1999. The
proceedings of the seventh IEEE international conference
on, volume 2, pages 1150–1157. IEEE.
Nghiem, A. T., Auvinet, E., and Meunier, J. (2012). Head
detection using kinect camera and its application to
fall detection. In Information Science, Signal Processing
and their Applications (ISSPA), 2012 11th International
Conference on, pages 164–169. IEEE.
Osuna, E., Freund, R., and Girosi, F. (1997). Training
support vector machines: an application to face
detection. In Computer vision and pattern recognition,
1997. Proceedings., 1997 IEEE computer society
conference on, pages 130–136. IEEE.
Rowley, H. A., Baluja, S., and Kanade, T. (1998). Neural
network-based face detection. IEEE Transactions on
pattern analysis and machine intelligence, 20(1):23–38.
Sarbolandi, H., Lefloch, D., and Kolb, A. (2015). Kinect
range sensing: Structured-light versus time-of-flight
kinect. Computer vision and image understanding,
139:1–20.
Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A.,
Finocchio, M., Blake, A., Cook, M., and Moore, R.
(2013). Real-time human pose recognition in parts
from single depth images. Communications of the
ACM, 56(1):116–124.
Theano Development Team (2016). Theano: A Python
framework for fast computation of mathematical
expressions. arXiv e-prints, abs/1605.02688.
Venturelli, M., Borghi, G., Vezzani, R., and Cucchiara, R.
(2016). Deep head pose estimation from depth data
for in-car automotive applications. In Proceedings
of the 2nd International Workshop on Understanding
Human Activities through 3D Sensors, ICPR workshop.
Venturelli, M., Borghi, G., Vezzani, R., and Cucchiara, R.
(2017). From depth data to head pose estimation: a
siamese approach. In Proceedings of the 12th
International Joint Conference on Computer Vision, Imaging
and Computer Graphics Theory and Applications
(VISAPP).
Viola, P. and Jones, M. J. (2004). Robust real-time face
detection. International journal of computer vision,
57(2):137–154.
Vu, T.-H., Osokin, A., and Laptev, I. (2015). Context-aware
cnns for person head detection. In Proceedings of the
IEEE International Conference on Computer Vision,
pages 2893–2901.
Wu, B. and Nevatia, R. (2005). Detection of multiple,
partially occluded humans in a single image by bayesian
combination of edgelet part detectors. In Computer
Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on, volume 1, pages 90–97. IEEE.
Wu, C., Zhang, J., Savarese, S., and Saxena, A. (2015).
Watch-n-patch: Unsupervised understanding of actions
and relations. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Xia, L., Chen, C.-C., and Aggarwal, J. K. (2011). Human
detection using depth information by kinect. In
Computer Vision and Pattern Recognition Workshops
(CVPRW), 2011 IEEE Computer Society Conference
on, pages 15–22. IEEE.
Zhu, X. and Ramanan, D. (2012). Face detection, pose
estimation, and landmark localization in the wild. In
Computer Vision and Pattern Recognition (CVPR),
2012 IEEE Conference on, pages 2879–2886. IEEE.
Head Detection with Depth Images in the Wild