will be adapted decrementally and the boundary will move until x_j is out (α_j ≤ 0). The matrix R is updated by deleting from the matrix Q the column j+1 and the row j+1 (corresponding to the removed x_j). When a pixel x_i is removed from D, only g_i is removed from G.
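As an illustration, here is a minimal NumPy sketch of this row/column deletion; the helper name downdate_R, the use of NumPy, and the interpretation of the offset of 1 are our assumptions, not part of the original method:

import numpy as np

def downdate_R(Q, j):
    # Illustrative sketch only: following the text, drop the column j+1
    # and the row j+1 associated with the removed x_j (the offset of 1
    # is assumed to account for a leading entry of the matrix).
    Q = np.delete(Q, j + 1, axis=0)  # delete row j+1
    Q = np.delete(Q, j + 1, axis=1)  # delete column j+1
    return Q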
Decremental Unlearning Algorithm. When removing the pixel x_r from C, the parameters {α^(s−1), ρ^(s−1)} are expressed in terms of the parameters {α^s, ρ^s}, the matrix R, and x_r as follows:
If g_r > 0 (x_r ∈ D), remove x_r from C, set G ← G − {g_r}, and terminate.
If g_r = 0, remove x_r from S (and thus from C), then:
While α_r > 0, do α_r = α_r − ∆α:
  calculate ∆ρ (Eq. 8), ρ = ρ − ∆ρ;
  for each x_j ∈ S, calculate ∆α_j (Eq. 8), α_j = α_j − ∆α_j;
  for each x_i ∈ C, calculate ∆g_i (Eq. 10), g_i = g_i − ∆g_i;
  check whether an inside vector x_i ∈ D (or several) passes outside the boundary (g_i ≤ 0); if so, interrupt the decremental unlearning, apply the incremental learning on x_i, and return to the decremental unlearning procedure.
Repeat as necessary (until α_r = 0).
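For illustration, the following Python sketch reproduces only the control flow of this procedure. The helpers delta_rho, delta_alpha, and delta_g (standing in for Eq. 8 and Eq. 10), the routine incremental_learn, and the step d_alpha are hypothetical placeholders, not the paper's actual implementation:

def decremental_unlearn(x_r, S, D, C, G, alpha, g, rho,
                        d_alpha, delta_rho, delta_alpha, delta_g,
                        incremental_learn):
    # Case 1: x_r is an inside vector (g_r > 0, x_r in D):
    # remove it from C, drop g_r from G, and terminate.
    if g[x_r] > 0:
        C.remove(x_r)
        del G[x_r]
        return rho
    # Case 2: g_r = 0, x_r is a support vector:
    # remove it from S (and thus from C), then decrement alpha_r.
    S.remove(x_r)
    C.remove(x_r)
    while alpha[x_r] > 0:
        # Decrement, clamped at zero since the loop ends at alpha_r = 0.
        alpha[x_r] = max(0.0, alpha[x_r] - d_alpha)
        rho -= delta_rho()                    # Eq. 8 (placeholder)
        for x_j in S:
            alpha[x_j] -= delta_alpha(x_j)    # Eq. 8 (placeholder)
        for x_i in C:
            g[x_i] -= delta_g(x_i)            # Eq. 10 (placeholder)
        # If inside vectors cross the boundary (g_i <= 0), interrupt,
        # re-learn them incrementally, then resume the unlearning loop.
        for x_i in [p for p in D if g[p] <= 0]:
            incremental_learn(x_i)
    return rho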
4 EXPERIMENTS
At first, we performed experiments on video sequences collected in our laboratory with a Philips SPC900NC/00 web-cam (settings: frame rate = 30 fps, image size = 160×120 pixels). Each sequence is 600 frames long (20 seconds). The camera was mounted on a laptop, and volunteers were asked to sit down in front of the laptop and perform free head motion while we varied the lighting widely, passing from a very dark to a very bright state. We first applied the tracking method using thresholds. This method works quite well under constant lighting, but fails when the lighting varies significantly. We then applied the method using the incremental classification. We fixed the kernel parameter σ = 5 and the threshold angle θ_sim = 1 rad after several preliminary experiments. The unlearning procedure is started at the 5th frame, i.e. when the system processes the N-th frame, it unlearns the pixels learned from the (N−5)-th frame. This value proved effective for reliable on-line tracking of the skin-pixel cluster. The obtained results were very encouraging, since the face was accurately tracked in all the video sequences, except in the frames where the face was in profile (because of the requirement to find the two eyes and the mouth).
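A minimal sketch of this fixed-lag scheme follows; the learn/unlearn callables and the make_tracker wrapper are hypothetical stand-ins for the incremental classification and the decremental unlearning described above:

from collections import deque

LAG = 5  # unlearn the pixels learned 5 frames earlier

def make_tracker(learn, unlearn):
    window = deque()  # pixel batches of the last LAG frames

    def process_frame(frame_pixels):
        # Learn the pixels of the current (N-th) frame ...
        learn(frame_pixels)
        window.append(frame_pixels)
        # ... and unlearn those learned at the (N-5)-th frame.
        if len(window) > LAG:
            unlearn(window.popleft())

    return process_frame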
Secondly, we performed experiments on the set of sequences collected and used by (La Cascia et al., 2000). The set consists of 27 sequences (nine sequences for each of three subjects) taken under time-varying illumination and in which the subjects perform free head motion. The time-varying illumination has a uniform component and a sinusoidal directional component. It should be noted that the illumination varies in a non-linear manner, darkening the scene and especially its right side, making the right side of the face extremely dark. In addition, the free head motion is performed such that the face is never completely in profile. All the sequences are 200 frames long (approximately seven seconds) and were taken such that the first frame is not always at the maximum of the illumination. The video signal was digitized at 30 frames per second at a resolution of 320×240, non-interleaved, using the standard SGI O2 video input hardware, and then saved as QuickTime movies (M-JPEG compressed). All of these sequences are available on-line at http://www.cs.bu.edu/groups/ivc/HeadTracking/ (they are the only sequences available among those used in the articles cited in the introduction). Figure 5 shows example images of the three subjects from the video sequences, illustrating the time-varying illumination and free head motion.
Figure 6 shows the mean values of the manually extracted skin pixels over the 200 frames of a video sequence, in the RGB colour space and in the THS colour space. We can see that while there is great variance in RGB, THS is much less sensitive to the lighting variation. As defined, the Texture and Hue components have smooth values and remain nearly constant through the lighting variation. The Saturation component, by contrast, is more affected by large lighting variations. In this case, it becomes clear that the threshold method cannot obtain good results. Thus, an adaptive classification method is needed to track the skin-pixel cluster through the time-varying illumination. We cannot objectively compare our results to those of (La Cascia et al., 2000), because their system uses a texture-mapped 3D rigid surface model of the head. In addition, the output of their system is the 3D head parameters and a 2D dynamic texture map image. We simply note that a version of their tracker that used a planar model was unable to track the whole sequence without losing track.
To evaluate our tracker, we first linked the nine