change the nod angle by 6.57° and/or the shake angle by 6.81°, leading to a horizontal error of 30 cm and/or a vertical error of 29 cm.
Thus, if the target is an 80 × 60 cm box on a 240 × 180 cm screen on the wall of a 10 × 6 m room, a 1-pixel error from 300 cm will miss the box, a 1-pixel error from 450–500 cm will miss the screen, and a 1-pixel error from 750–800 cm will miss the wall.
6 CONCLUSIONS
We have shown that a minimal gaze prediction sys-
tem using only four points can make reasonably reli-
able predictions for subjects with average noses who
sit within 2m of the camera. This system is easily im-
plemented, requiring only four or five Haar cascades
(all of which are bundled with OpenCV). It is easy to
modify, or even replace, any of the landmark locators.
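For concreteness, the detection stage of such a pipeline can be sketched with OpenCV's bundled cascades as follows. This is only an illustration, not the authors' exact code; in particular, the nose cascade file name is an assumption, since its availability varies between OpenCV releases.

```python
import cv2

# Cascade files shipped with OpenCV (located via cv2.data.haarcascades).
# The nose cascade name is assumed; some releases do not bundle it.
base = cv2.data.haarcascades
face_cascade = cv2.CascadeClassifier(base + "haarcascade_frontalface_default.xml")
left_eye_cascade = cv2.CascadeClassifier(base + "haarcascade_lefteye_2splits.xml")
right_eye_cascade = cv2.CascadeClassifier(base + "haarcascade_righteye_2splits.xml")
nose_cascade = cv2.CascadeClassifier(base + "haarcascade_mcs_nose.xml")  # assumed name

def detect_landmarks(gray):
    """Return rough landmark regions within the first detected face."""
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    return {
        "face": (x, y, w, h),
        "left_eye": left_eye_cascade.detectMultiScale(roi, 1.1, 5),
        "right_eye": right_eye_cascade.detectMultiScale(roi, 1.1, 5),
        "nose": nose_cascade.detectMultiScale(roi, 1.1, 5),
    }
```

Because each locator is an independent cascade, swapping one out for a different detector changes only the corresponding line.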
This simplicity comes at some cost. There are
places where we could use more data points, most ob-
viously where we have to make assumptions about the
anatomical proportions of the face.
What can be done for people with small or large
noses? We could add calibration to retune the sys-
tem for each new user; the cost is ease of use. Alter-
natively, the methods for locating nose tips and nose
bridges are reasonably reliable, and we could in prin-
ciple use similar methods to identify other landmarks
on the nose, giving us extra equations; the cost is
added complexity.
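As a sketch of the first option, calibration could ask a new user to fixate a handful of known on-screen points and fit a simple correction to the raw predictions. The affine fit below is our illustration, not part of the system described here; fit_correction and apply_correction are hypothetical helpers.

```python
import numpy as np

def fit_correction(predicted, actual):
    """Least-squares affine correction mapping raw gaze predictions to
    calibrated screen coordinates. predicted/actual: (N, 2) arrays,
    with N >= 3 fixation points."""
    A = np.hstack([predicted, np.ones((len(predicted), 1))])
    coeffs, *_ = np.linalg.lstsq(A, actual, rcond=None)
    return coeffs  # (3, 2): 2x2 linear part plus offset row

def apply_correction(coeffs, point):
    """Apply the fitted correction to one raw prediction."""
    x, y = point
    return np.array([x, y, 1.0]) @ coeffs
```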
We would also like to be able to weight our cal-
culations so that, when the head is turned, we give
priority to the nearer eye. This would be particularly
useful in those cases where the head is turned and the
location of the more distant eye has not been deter-
mined correctly. With only four points, there is no
redundancy, and no opportunity to give some points
higher weightings than others.
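The weighting we have in mind could look something like the sketch below, assuming a head-yaw estimate were available from the landmark geometry; the linear yaw-to-weight mapping is purely illustrative.

```python
import numpy as np

def weighted_gaze(near_eye_gaze, far_eye_gaze, yaw_deg, max_yaw=45.0):
    """Blend per-eye gaze estimates, favouring the eye nearer the
    camera as the head turns. yaw_deg = 0 means a frontal face."""
    # Weight of the nearer eye rises linearly from 0.5 (frontal)
    # to 1.0 (head turned by max_yaw or more). Illustrative only.
    w = 0.5 + 0.5 * min(abs(yaw_deg) / max_yaw, 1.0)
    return w * np.asarray(near_eye_gaze) + (1.0 - w) * np.asarray(far_eye_gaze)
```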
Although it may appear counter-intuitive, gross
outliers are not usually a serious problem. In a video-
processing system in which landmarks are tracked
from one frame to the next, outliers can be caught and
discarded.
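In a tracking setting, such gating can be as simple as rejecting any landmark that jumps implausibly far between consecutive frames; a minimal sketch (the pixel threshold is illustrative):

```python
import numpy as np

def gate_outliers(prev_pts, curr_pts, max_jump_px=20.0):
    """Replace landmarks that moved more than max_jump_px between
    frames with their previous positions. Inputs: (N, 2) arrays."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    jumps = np.linalg.norm(curr_pts - prev_pts, axis=1)
    out = curr_pts.copy()
    out[jumps > max_jump_px] = prev_pts[jumps > max_jump_px]
    return out
```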
The most serious problem is that of small errors
becoming large errors with increasing distance from
the camera, as this imposes a limit on the distance at
which gaze prediction can be useful.
On this basis, we can assess the potential applications listed in Section 1. Interactive display boards used from a distance of 1–2 m should certainly be possible. Multi-user interactive boards may be restricted in the number of users, as it will be difficult to place the users so that they are less than 2.5 m from the board but more than 2 m from one another.
Sadly, gaze-controlled smart homes may not yet be realistic: even if the screen is placed at the centre of the longer wall of a 5 × 3 m living room, the far corners of the room are about 3.9 m from the screen, so there will be locations in the room which are out of range.
At present, it seems that the best workaround is
to improve the hardware: either buy a more expen-
sive camera with higher resolution, or (better still) use
multiple cameras.
The natural progression is from still images to
video sequences. Before we make this leap, we must
ensure that our system is ready for it.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the project: “Set-
ting up of transdisciplinary research and knowledge
exchange (TRAKE) complex at the University of
Malta (ERDF.01.124)”, which is co-financed by the
European Union through the European Regional De-
velopment Fund 2014–2020.