Authors:
Oliver Lorenz and Ulrike Thomas
Affiliation:
Professorship of Robotics and Human-Machine-Interaction, Chemnitz University of Technology, Reichenhainer Str. 70, Chemnitz, Germany
Keyword(s):
Eye Gaze Tracking, Human-Robot Interaction, Facial Features, Head Pose, Face Detection, Human Attention.
Related Ontology Subjects/Areas/Topics:
Applications; Applications and Services; Computer Vision, Visualization and Computer Graphics; Enterprise Information Systems; Features Extraction; Human and Computer Interaction; Human-Computer Interaction; Image and Video Analysis; Motion, Tracking and Stereo Vision; Pattern Recognition; Robotics; Software Engineering; Tracking and Visual Navigation; Visual Attention and Image Saliency
Abstract:
Understanding human attention in various interactive scenarios is an important task for human-robot collaboration. Human communication with robots includes intuitive nonverbal behaviour such as body postures and gestures. Multiple communication channels can be used to achieve understandable interaction between humans and robots. Usually, humans communicate in the direction of their eye gaze and head orientation. In this paper, a new tracking system based on two cascaded CNNs is presented for eye gaze and head orientation tracking; it enables robots to measure the willingness of humans to interact via eye contact and eye gaze orientation. Based on the two consecutively cascaded CNNs, facial features are recognised, first in the face and then in the eye regions. These features are processed by a geometrical method that delivers the orientation of the head, which is used to determine the eye gaze direction. Our method distinguishes between frontal and side faces, and with a dedicated approach for each condition, the eye gaze is detected even in extreme situations. The applied CNNs have been trained on many different datasets and annotations, which improves the reliability and accuracy of the tracking system introduced here and outperforms previous detection algorithms. Our system operates on commonly used RGB-D images and is implemented on a GPU to achieve real-time performance. The evaluation shows that our approach operates accurately in challenging dynamic environments.
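The idea of deriving head orientation geometrically from detected facial features can be illustrated with a minimal sketch. The landmark coordinates, the function name, and the yaw formula below are illustrative assumptions, not the paper's actual method: here, yaw is approximated from the horizontal offset of the nose tip relative to the midpoint between the eye corners, normalised by the inter-ocular distance.

```python
import numpy as np

def head_yaw_from_landmarks(left_eye, right_eye, nose_tip):
    """Rough yaw estimate (radians) from 2D facial landmarks.

    A frontal face has its nose tip centred between the eyes, so the
    normalised horizontal offset is ~0; a turned head shifts the nose
    tip sideways, yielding a non-zero yaw. This is a hypothetical
    simplification of a landmark-based geometric head-pose method.
    """
    left_eye, right_eye, nose_tip = map(np.asarray, (left_eye, right_eye, nose_tip))
    eye_mid = (left_eye + right_eye) / 2.0
    inter_ocular = np.linalg.norm(right_eye - left_eye)
    # Horizontal nose-tip displacement, normalised by eye distance.
    offset = (nose_tip[0] - eye_mid[0]) / inter_ocular
    return float(np.arctan2(offset, 1.0))

# Frontal face: nose tip centred between the eyes -> yaw close to 0.
frontal_yaw = head_yaw_from_landmarks((100, 120), (160, 120), (130, 150))
# Head turned to the image right: nose tip shifted -> positive yaw.
turned_yaw = head_yaw_from_landmarks((100, 120), (160, 120), (145, 150))
```

In a full pipeline, such a coarse head-pose estimate would be combined with pupil positions inside the detected eye regions to refine the final gaze direction.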