In this work, the face detection algorithm used was developed by Dirk-Jan Kroon (Mathworks, 2011). It consists of a previously trained classifier cascade with 22 classification stages and scans each image 11 times at different scales. To meet processing-time requirements, only the first two frames of the video were processed with the original algorithm. The role of these two frames was to determine the scale of the face and its initial position in the image. That information was then used in the subsequent frames, where a lighter version of the algorithm was applied. This version scanned only a sub-window of the original image, centred on the position detected in the previous frame, and only at the scale defined in the initial frames; the classifier cascade was also simplified to its first 5 stages. As a result of this simplification the algorithm ran considerably faster, but lost some of its specificity, generating many bounding boxes (BBox) around the real face. To select the correct BBox, the image was binarized with a threshold between 0.45 and 0.6, highlighting the darker pixels. The BBox containing the most dark pixels was selected, and the corresponding sub-window was cropped and resized to 50x50 pixels for further analysis.
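For illustration, a lightweight tracking step of this kind could be sketched in Python with OpenCV as follows. This is only an analogue of the method described above, not Kroon's MATLAB implementation: OpenCV's pretrained cascade cannot be truncated to its first 5 stages, and the sub-window margin and the 0.5 binarization threshold are assumptions.

import cv2
import numpy as np

# Pretrained frontal-face cascade shipped with OpenCV (stands in for
# Kroon's 22-stage cascade; truncation to 5 stages is not exposed).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def track_face(gray, prev_box, threshold=0.5):
    """Search a sub-window centred on the previous detection, at a fixed
    scale, and keep the candidate BBox with the most dark pixels."""
    x, y, w, h = prev_box
    # Sub-window twice the size of the previous BBox (assumed margin).
    x0 = max(x - w // 2, 0)
    y0 = max(y - h // 2, 0)
    sub = gray[y0:y0 + 2 * h, x0:x0 + 2 * w]
    # Fixing minSize == maxSize confines the search to the scale
    # determined in the initial frames.
    boxes = cascade.detectMultiScale(sub, scaleFactor=1.1,
                                     minSize=(w, h), maxSize=(w, h))
    if len(boxes) == 0:
        return prev_box
    # Binarize: pixels below the threshold count as dark (the paper
    # used a threshold between 0.45 and 0.6 of the intensity range).
    dark = sub < int(threshold * 255)
    bx, by, bw, bh = max(
        boxes, key=lambda b: dark[b[1]:b[1] + b[3], b[0]:b[0] + b[2]].sum())
    return x0 + bx, y0 + by, bw, bh

# The selected face is then cropped and resized to 50x50 for eye detection:
# face = cv2.resize(gray[y:y + h, x:x + w], (50, 50))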
2.2 Eye Detection
To locate the eyes in the previously detected face, two
methods were studied: Between-the-Eyes (BTE) and
Elliptical Gabor Filter (EGF).
2.2.1 Between-the-Eyes (BTE)
This algorithm was developed with reference to the work of Peng et al. (Peng, 2005). It takes advantage of the characteristic illumination profile of a face and extracts the point that lies between the eyes. It works in two fundamental steps. The first consists in computing the vertical gradient image of the face to detect the increased shadow present in the transition between the forehead and the eyebrow region; a horizontal cumulative sum is performed on this gradient image, and the row of pixels with the highest value is taken as the row (vertical) coordinate of the BTE point. A sub-window of the original greyscale image, centred on this row, is then extracted and eroded by a circular operator 4 pixels in diameter. A vertical cumulative sum is performed, and the column with the highest value is taken as the column (horizontal) coordinate of the point.
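A minimal sketch of this two-step procedure, assuming a Python/NumPy/OpenCV setting; the 8-pixel half-height of the sub-window around the detected row is an assumed value not given above.

import cv2
import numpy as np

def bte_point(face, band=8):
    """Between-the-Eyes point on a greyscale face image."""
    face = face.astype(np.float32)
    # Step 1: the vertical gradient highlights the shadow at the
    # forehead-to-eyebrow transition; the row whose horizontal
    # cumulative sum is highest gives the row coordinate.
    grad = np.abs(np.diff(face, axis=0))
    row = int(np.argmax(grad.sum(axis=1)))
    # Step 2: erode a strip around that row with a circular operator
    # 4 pixels in diameter, then take the column whose vertical
    # cumulative sum is highest as the column coordinate.
    strip = face[max(row - band, 0):row + band, :]
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (4, 4))
    eroded = cv2.erode(strip, disk)
    col = int(np.argmax(eroded.sum(axis=0)))
    return col, row  # (x, y) of the BTE point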
2.2.2 Elliptical Gabor Filter (EGF)
The Gabor filter used to detect the eyes was based on the work done by Yanfang Zhang (Zhang, 2005). It consists in the use of an elliptical Gabor filter, a 2-D band-pass filter generated by a Gaussian function modulated by a sine function; the equation and pictures of the filter are also presented in that work. In this work, the parameters used to compute the filter were σx = 15, σy = 15, F = 64, θ = 0, and the filter size was 15x15. Here σx and σy stand for the variances (scale factors) along the x and y axes, F for the spatial central frequency and θ for the rotation angle of the filter.
real part of the filter was selected and normalized for
further use. The computed filter was then convolved with the greyscale face image to highlight the position of the eyes. The eye regions appeared in the filtered image as dark holes; to select them, the image was binarized, highlighting the darkest regions. The centroid of each of these regions was then used to mark the position of the corresponding eye.
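A sketch of this filter and of the subsequent eye selection, again in a Python/OpenCV setting. The exact Gabor equation is the one given in Zhang (2005), so the mapping of F = 64 to a per-pixel frequency and the 0.2 binarization threshold used here are assumptions.

import cv2
import numpy as np

def elliptical_gabor(size=15, sigma_x=15.0, sigma_y=15.0, F=64, theta=0.0):
    """Real part of an elliptical Gabor filter: a Gaussian envelope
    modulated by a sinusoid, normalized for further use."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float32)
    # Rotate the coordinate frame by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    carrier = np.cos(2 * np.pi * F * xr / size ** 2)  # assumed scaling of F
    g = envelope * carrier
    return g / np.abs(g).sum()

def eye_centroids(face, threshold=0.2):
    """Convolve the face with the filter and return the centroids of the
    darkest blobs as eye positions; `threshold` is an assumed value."""
    resp = cv2.filter2D(face.astype(np.float32), -1, elliptical_gabor())
    resp = (resp - resp.min()) / (resp.max() - resp.min())
    mask = (resp < threshold).astype(np.uint8)  # eyes appear as dark holes
    centroids = cv2.connectedComponentsWithStats(mask)[3]
    return centroids[1:]  # (x, y) per blob; background component dropped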
3 TEST SETUP
In order to compare the effectiveness of the developed algorithms, a video database was created. This database comprised 6 types of video of 23 different subjects, each performing a different head movement in each video. The protocol for each video is presented in Table 1. The subjects recorded for this database were all volunteers from the research centre.
Table 1: Description of the head movement performed on
each video.
Video Subject’s movement description
I Upright position at rest
II Approach the camera and move away
III Laterally tilt the head
IV Tilt the head
V Pan the head
VI Free head movement
The videos were acquired using a Logitech C210 USB webcam at 30 frames/s, each with a length of 10 s and a resolution of 120x160 pixels. The recorded videos were then processed in Simulink, where each video was run 6 times, covering all possible combinations of face and eye detectors. All videos were processed offline on an Intel Core 2 Duo E6750 CPU @ 2.66 GHz computer (2 GB RAM, 64-bit operating system).
To evaluate the results in terms of accuracy, a score scale with three steps was created: the score "0" was attributed to algorithms that failed to detect the correct face or eye position, "1" was attributed