Caviar data, the tracker based on the head pose ground-truth annotations gives a median CLL improvement of only 16.1%, so there is very little room at the top. However, on both datasets we achieve state-of-the-art tracking performance.
6 CONCLUSION AND FUTURE WORK
In this paper we presented a data-driven approach to low-resolution head pose estimation in the wild. We achieved state-of-the-art results on two publicly available datasets. The model fine-tuned for head pose regression was also able to achieve state-of-the-art performance on tracking with intent.
REFERENCES
Caviar dataset. http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.
Balasubramanian, V., Ye, J., and Panchanathan, S. (2007).
Biased manifold embedding: a framework for person-
independent head pose estimation. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 1–7.
Baxter, R., Leach, M., Mukherjee, S., and Robertson, N.
(2015). An adaptive motion model for person tracking
with instantaneous head-pose features. Signal Pro-
cessing Letters, IEEE, 22(5):578–582.
Baxter, R. H., Leach, M., and Robertson, N. M. (2014).
Tracking with Intent. In Sensor Signal Processing for
Defence.
BenAbdelkader, C. (2010). Robust head pose estimation
using supervised manifold learning. In Proceedings of
the 11th European Conference on Computer Vision,
pages 518–531.
Benfold, B. and Reid, I. (2008). Colour invariant head pose
classification in low resolution video. In Proceedings
of the British Machine Vision Conference.
Benfold, B. and Reid, I. (2011). Unsupervised learning of
a scene-specific coarse gaze estimator. In Computer
Vision (ICCV), 2011 IEEE International Conference
on, pages 2344–2351.
Blanz, V. and Vetter, T. (1999). A morphable model for
the synthesis of 3d faces. In Proceedings of the 26th
Annual Conference on Computer Graphics and Inter-
active Techniques, SIGGRAPH ’99, pages 187–194,
New York, NY, USA. ACM Press/Addison-Wesley
Publishing Co.
Chen, C. and Odobez, J. (2012). We are not contortionists:
Coupled adaptive learning for head and body orien-
tation estimation in surveillance video. In Computer
Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on, pages 1544–1551.
Fanelli, G., Dantone, M., Gall, J., Fossati, A., and Van Gool,
L. (2013). Random forests for real time 3d face anal-
ysis. Int. J. Comput. Vision, 101(3):437–458.
Gesierich, B., Bruzzo, A., Ottoboni, G., and Finos, L.
(2008). Human gaze behaviour during action execu-
tion and observation. Acta Psychologica, 128(2):324
– 330.
Gourier, N., Maisonnasse, J., Hall, D., and Crowley, J.
(2006). Head pose estimation on low resolution im-
ages. In Proceedings of the 1st International Evalua-
tion Conference on Classification of Events, Activities
and Relationships, pages 270–280.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving deep
into rectifiers: Surpassing human-level performance
on imagenet classification. CoRR, abs/1502.01852.
Henderson, J. M. and Hollingworth, A. (1999). High-
level scene perception. Annual Review of Psychology,
50(1):243–271. PMID: 10074679.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. CoRR, abs/1502.03167.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J.,
Girshick, R., Guadarrama, S., and Darrell, T. (2014).
Caffe: Convolutional architecture for fast feature em-
bedding. arXiv preprint arXiv:1408.5093.
Krizhevsky, A. (2014). One weird trick for parallelizing
convolutional neural networks. CoRR, abs/1404.5997.
Langton, S., Honeyman, H., and Tessler, E. (2004). The
influence of head contour and nose angle on the per-
ception of eye-gaze direction. Perception & Psy-
chophysics, 66(5):752–771.
Robertson, N. and Reid, I. (2006). Estimating gaze direc-
tion from low-resolution faces in video. In Proceedings
of the 9th European Conference on Computer Vision,
2006, volume 3952/2006, pages 402–415.
Simonyan, K. and Zisserman, A. (2014). Very deep convo-
lutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Stiefelhagen, R. (2004). Estimating head pose with neural
network-results on the pointing04 icpr workshop eval-
uation data. In Proceedings of the ICPR Workshop on
Visual Observation of Deictic Gestures.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2014). Going Deeper with Convolutions.
ArXiv e-prints.
Tosato, D., Spera, M., Cristani, M., and Murino, V.
(2013). Characterizing humans on riemannian man-
ifolds. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 35(8):1972–1984.
Watch Where You’re Going! - Pedestrian Tracking Via Head Pose