A SELF-CALIBRATING CHROMINANCE MODEL APPLIED TO
SKIN COLOR DETECTION
J. F. Lichtenauer, M. J. T. Reinders and E. A. Hendriks
Information & Communication Theory Group, Delft University of Technology, Mekelweg, Delft, The Netherlands
Keywords:
Adaptive color modelling, chrominance, chromatic color space, skin detection.
Abstract:
In the absence of a calibration procedure, or when there is a color difference between direct and
ambient light, standard chrominance models are not completely brightness invariant. Therefore, they cannot
provide the best space for robust color modeling. Instead of using a fixed chrominance model, our method
estimates the actual dependency between color appearance and brightness. This is done by fitting a linear
function to a small set of color samples. In the resulting self-calibrated chromatic space, orthogonal to this line,
the color distribution is modeled as a 2D Gaussian distribution. The method is applied to skin detection, where
the face provides the initialization samples to detect the skin of hands and arms. A comparison with fixed
chrominance models shows an overall improvement and also an increased reliability of detection performance
in different environments.
1 INTRODUCTION
Color is an important property of many objects.
Therefore, it has been of great interest to researchers
in the field of image analysis since the introduction
of digital color images. However, the lack of con-
stancy of color between different lightings, camera
equipment and settings has challenged researchers
ever since. To reduce the problem of color varia-
tion between different scenarios, color can be mod-
eled adaptively. But even in the same scenario, color
can change due to changing lighting conditions. A
significant amount of research has been conducted to
find functions of color that are invariant to illumina-
tion change. The most prominent functions in common use are the chromatic color models, e.g. normalized rgb, YUV, YCrCb, HSV or CIELAB. In
these color spaces, the brightness factor is isolated
from the chromatic representation of color. This fa-
cilitates adaptive color modeling that is invariant to
changes in illumination brightness.
However, brightness invariance is not guaranteed
in these models. All these chromatic color space
conversions are based on certain assumptions about
color appearance in RGB space. Normalized rgb,
HSV and CIELAB assume that black is represented
by [R, G,B] = [0, 0,0] and, as a result, all colors meet
each other at this point when their brightness is re-
duced. In contrast, YUV and YCrCb assume that
a brightness change results in a change of color par-
allel to the diagonal of RGB space. Furthermore,
HSV assumes that gray (unsaturated) colors satisfy
R = G = B (correct white balance) and CIELAB even
needs a completely calibrated RGB (XYZ) space.
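As a minimal illustration of this brightness dependence (our own sketch, not from the paper; the offset and color values are hypothetical), the normalized rg chromaticity of a surface is constant under intensity scaling only when black maps to [0,0,0]; with a non-zero black offset, the chromaticity drifts with brightness:

```python
import numpy as np

def normalized_rg(c):
    """Project an RGB vector onto the chromatic (r, g) plane."""
    s = c.sum()
    return c[:2] / s if s > 0 else np.zeros(2)

# Hypothetical uncalibrated black point and a unit-intensity object color.
offset = np.array([20.0, 10.0, 5.0])
color = np.array([0.6, 0.3, 0.1])

dark = offset + 50 * color     # dark appearance of the same surface
bright = offset + 200 * color  # bright appearance of the same surface

# With offset = 0 the two chromaticities coincide; with a non-zero offset
# they differ, so the "chrominance" correlates with brightness.
```

With `offset` set to zero, `normalized_rg(dark)` and `normalized_rg(bright)` are identical; with the offset above they are not, which is exactly the violation of brightness invariance described here.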
Violation of these assumptions, e.g. due to in-
correct white balance, non-ideal camera sensitivity or
settings or heterogeneous illumination, can severely
degrade performance of color analysis methods based
on these color spaces. This was shown by Mar-
tinkauppi et al. (Martinkauppi et al., 2003), who have
tested robustness of different skin detection methods
under large changes of conditions. To our knowledge, the validity of the often-used chrominance models in real-world digital imaging situations, and the consequences of violating their model assumptions, have never been studied directly.
If the color spectrum of the illumination is homo-
geneous, the assumptions can be satisfied by a cali-
bration procedure. However, in many real-world ap-
plications, such a procedure is not desirable because
[In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IFP/IA, pages 115-120. © SciTePress, 2007.]
it takes extra time and effort. It may not even be
possible, e.g. in case of diversity of unknown cam-
eras, non-expert users or processing of video previ-
ously recorded under non-ideal circumstances. Fur-
thermore, calibration may not be effective, because
of a difference between spectra of direct and ambient
light, resulting in correlation between chrominance
and brightness.
To avoid the shortcomings of chromatic models in
real-world scenarios, we don’t want to rely on con-
straints on white balance, origin offset or correlation
between brightness and chrominance. Therefore, we
take the principle of chromatic color representation
one step further, by presenting our Adaptive Chromi-
nance Space (ACS) method that adaptively finds a
linear function that minimizes correlation between
brightness and the chrominance representation of the
appearance of a specific object color.
Our application for this color model is hand ges-
ture recognition. Skin color is modeled from samples
obtained from face detection and applied to detect and
track a person’s hands. Furthermore, two models are
combined, fitted to samples from the left and right
part of the face, respectively. This increases robust-
ness to variation in illumination spectra from different
horizontal directions.
Section 2 contains a summary of related research
on skin color detection, section 3 describes our
method, for which experimental results are provided
in section 4. Our conclusions are given in section 5.
2 RELATED WORK
Many methods for skin color detection have been pro-
posed in the literature. Surveys can be found in (Yang
et al., 2002; Vezhnevets et al., 2003). Most methods
learn a general skin color distribution from a database
of images (Jones and Rehg, 2002; Phung et al., 2005;
Lee and Yoo, 2002), e.g. a selection from the internet.
This results in very general models that take into ac-
count variation of camera, camera settings and illumi-
nation. However, because the variation of skin color
appearance between different situations is so large,
these general models lack precision to distinguish be-
tween real skin and skin-colored objects. This greatly
restricts the reliability of skin segmentation, since the
false positive rate will be high when other colors in
the image are close to skin color.
To overcome this problem, some methods adapt to
the specific situation by learning the skin color distri-
bution from samples taken from the same video (Raja
et al., 1998; McKenna et al., 1999; Soriano et al.,
2000; Fritsch et al., 2002; Argyros and Lourakis,
Figure 1: RGB scatter plots (a, b) of colored balls under office lighting without calibration. The dotted line indicates the diagonal of RGB space.
2004), often combined with a prior skin color distri-
bution learned from a database. The problem with
skin model adaptation is that it is difficult to obtain a
large and representative sample set of skin color from
a video automatically. Adaptive methods do not gen-
eralize well to other skin regions in the video if they
lack a (realistic) chrominance model.
3 COLOR DETECTION METHOD
Instead of learning a general skin color distribution
that generalizes too much, or relying on a chromi-
nance model based on rigid, non-realistic assump-
tions, we propose to use a general model of skin color
variation for uncalibrated cameras with fixed set-
tings and non-changing illumination. We use an au-
tomatic procedure that fits this model to a small and
noisy sample set.
The general model of skin color variation is ex-
plained in paragraph 3.1. Paragraph 3.2 explains how
a similarity measure for skin color can be computed
using this model, and 3.3 describes how the model is
fitted to the sample set.
3.1 Skin Color Appearance Model
Figure 2: Appearance model of skin color p(C_RGB | skin) according to the multiplicative noise model, represented by an isosurface.

We define x as a point at a specific image pixel location, corresponding to a point on the surface of an object with object color ξ, and θ as the spectrum of the illumination source. Our appearance model of skin assumes that the appearance in RGB space C_RGB(x,ξ,θ) of color ξ is a linear function ℓ(x,ξ,θ) of the reflected light intensity I(x), plus an independent random zero-mean noise vector η(x) (assuming Lambertian reflection):

C_RGB(x,ξ,θ) = ℓ(x,ξ,θ) + η(x),  (1)
ℓ(x,ξ,θ) := C_RGB0 + I(x)·c(ξ,θ),  (2)

where c(ξ,θ) is a color vector with unit intensity and C_RGB0 is the (unknown) calibration point for black. I(x) depends on both the light source intensity and the surface tangent, of which the latter depends on x in a non-deterministic way. For brief notation, C_RGB(x,ξ,θ) is denoted by C_RGB in the remainder of this work, silently assuming a specific object color, illumination and image pixel coordinates.

For fixed ξ and θ, ℓ(x,ξ,θ) is a straight line in RGB space. If the camera is not calibrated, ℓ(x,ξ,θ) does not have to pass through the origin. An example image without calibration is shown in figure 1 (a). This is an image of uniformly colored balls captured by a webcam under office lighting. Although the image quality looks acceptable, the color distributions of the balls are not at all directed towards the origin, as would have been the case if black was calibrated at C_RGB0 = [0,0,0]^T.
For specific ξ and θ, the dimensionality of the appearance model can be reduced from three to two dimensions by projecting the RGB values C_RGB onto a plane perpendicular to ℓ(x,ξ,θ):

C_S = [Ŝ1, Ŝ2]^T C_RGB.  (3)
[Ŝ1, Ŝ2, Ŝ3] is the orthonormal basis of the adapted color space, which is a rotated RGB coordinate system with Ŝ3 in the direction of ℓ(x,ξ,θ), corresponding to the luminance axis, and [Ŝ1, Ŝ2] spanning the perpendicular plane. The latter can be seen as an intensity-invariant chrominance space for one specific object color in a specific situation (camera, calibration, illumination, etc.), referred to as 'Adaptive Chrominance Space' (ACS). The distribution of an object color in ACS will be modeled by a Gaussian distribution with mean µ_S and 2×2 covariance matrix Σ_S. The Mahalanobis distance D_S to the closest point on the line ℓ(x,ξ,θ) is computed by

D_S = sqrt( (C_S − µ_S)^T Σ_S^{-1} (C_S − µ_S) ).  (4)

Because of the Gaussian approximation perpendicular to the luminance axis, the skin color model becomes an infinite elliptic cylinder in RGB space. To save computational load, Ŝ1 and Ŝ2 can be chosen in the directions of the eigenvectors of Σ_S and divided by the square roots of the respective eigenvalues, leaving the 2×2 identity matrix instead of Σ_S.
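The ACS projection and the distance of equation (4) can be sketched as follows (a minimal sketch with our own variable names; the paper does not prescribe how the perpendicular basis is constructed, so any orthonormal pair spanning the plane is assumed to be acceptable):

```python
import numpy as np

def acs_basis(s3):
    """Return an orthonormal pair [S1, S2] spanning the plane perpendicular to S3."""
    s3 = s3 / np.linalg.norm(s3)
    # Any vector not parallel to s3 seeds the orthogonal complement.
    seed = np.array([1.0, 0.0, 0.0]) if abs(s3[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    s1 = np.cross(s3, seed)
    s1 /= np.linalg.norm(s1)
    s2 = np.cross(s3, s1)
    return np.stack([s1, s2])          # 2x3 projection matrix [S1, S2]^T

def mahalanobis(samples_rgb, s3, mu_s, cov_s):
    """Distance D_S of each RGB sample to the model line, measured in the ACS plane."""
    P = acs_basis(s3)
    c_s = samples_rgb @ P.T            # equation (3): project onto [S1, S2]
    d = c_s - mu_s
    return np.sqrt(np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov_s), d))
```

Any point lying exactly on the model line gets distance zero regardless of its brightness, which is the intensity invariance the ACS construction is after.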
3.2 Skin Likelihood
Computation of skin likelihood at image position x is performed according to Bayes' theorem:

p(skin | C_RGB) = p(C_RGB | skin) p(skin) / p(C_RGB),  (5)

where skin is the event that the image really contains skin at the measured location. The prior probability density p(C_RGB) can be marginalized by

p(C_RGB) = ∫ p(C_RGB | I = α) p(I = α) δ(α − I) dα
         = p(C_RGB | I = β) p(I = β),  (6)

because p(C_RGB) is zero for α ≠ I. β is equal to the light intensity of C_RGB, calculated by

I = (C_R − C_R0) + (C_G − C_G0) + (C_B − C_B0).  (7)
Since the real value of black C_RGB0 is unknown, we approximate it by

C̃_R0 = C̃_G0 = C̃_B0 = min{ C_R(X), C_G(X), C_B(X) },  (8)

where C_R, C_G, C_B are the red, green and blue values, respectively, of C_RGB and X is the total set of pixels available, measured in the specific situation. Here we assume that the black origin is on the diagonal of RGB space, which is a reasonable assumption considering the imaging processes of most digital cameras. Unfortunately, in most situations the real value of C_RGB0 is negative and the lower color values are clipped to 0, resulting in a large estimation error in C̃_RGB0.
The probability density of the distance to the RGB diagonal at the brightness plane C_R + C_G + C_B = I is assumed constant and non-zero inside positive RGB space, but zero outside (uniformly distributed saturation). The integral of p(C_RGB | I = β) over all possible values of C_RGB for which I = β must be equal to 1. This set is a plane perpendicular to the diagonal of RGB space, which is an equilateral triangle with its corners at C_RGB = {[β,0,0]^T, [0,β,0]^T, [0,0,β]^T}. Using the area A(β) = (√3/2)β² of this triangle and assuming that the prior probability of brightness p(I = β) is constant,

p(C_RGB) ∝ 1/A(β) = 2/(√3 β²).  (9)
We choose to model uncertainty about the actual color of skin ξ, but to neglect the additive noise η(x). This results in a conical shape with its tip at C_RGB0, centered around the ℓ(x,ξ,θ) corresponding to the mean ξ and θ, shown in figure 2. In this case, deviation from the line ℓ(x,ξ,θ) is only due to variation of skin or illumination color. The existing appearance model can easily be modified by normalizing the Mahalanobis distance of equation 4 by the brightness I, similar to computing normalized rgb:

D̂_S(x) = D_S / I.  (10)
When p(C_RGB | skin), in equation 5, is computed with D̂_S instead of D_S, the prior (equation 9) has to be normalized by 1/I², hence becomes a constant. The log likelihood becomes:

ln{ p(skin | C_RGB) } ∝ −(C_S − µ_S)^T Σ_S^{-1} (C_S − µ_S) / I².  (11)

Note that, if C_RGB0 = [0,0,0]^T (i.e. in case of correct calibration), this approach is very similar to a skin color model in Hue/Saturation space or two dimensions of the normalized rgb space.
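Equations (7), (8) and (11) can be combined into a short scoring sketch (our own names and sample values; the chrominance coordinates c_s are assumed to come from the ACS projection of section 3.1):

```python
import numpy as np

def estimate_black(pixels_rgb):
    """Equation (8): one shared black offset, the minimum over all channels."""
    return float(pixels_rgb.min())

def skin_log_likelihood(c_s, intensity, mu_s, cov_s_inv):
    """Equation (11): brightness-normalized negative Mahalanobis distance.

    c_s: N x 2 chrominance coordinates; intensity: N values from equation (7).
    Scores closer to 0 are more skin-like.
    """
    d = c_s - mu_s
    d2 = np.einsum('ij,jk,ik->i', d, cov_s_inv, d)
    return -d2 / intensity**2

# Two hypothetical pixels of the same surface at different brightness.
pixels = np.array([[90.0, 60.0, 40.0], [180.0, 120.0, 80.0]])
c0 = estimate_black(pixels)
intensity = (pixels - c0).sum(axis=1)  # equation (7)
```

Dividing by I² makes the same chromatic deviation count less at high brightness, matching the conical model whose cross-section widens with intensity.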
3.3 Automatic Model Fitting
The automatic Adaptive Chrominance Space adap-
tation procedure attempts to find the best fit of the
model to a sample set taken from a specific situation.
To obtain positive RGB skin samples for model fit-
ting, a face detection method could be used. Pixels from the left and right side of the sample area are modeled separately, resulting in two separate ACS models. This is because the left and right sides of the face are often illuminated with a different θ, causing a different ℓ(x,ξ,θ). By modeling both sides of the
face separately, the combined model can detect skin
illuminated by different light sources.
Skin samples for which at least one of the color
channels has an intensity of 0 or 255 are removed.
This is because these samples might have been pro-
jected from outside the RGB space, leading to a
skewed color representation. Note that modeling
these projected values is possible, as described in our
previous work (Lichtenauer et al., 2005), however, it
is beyond the scope of this work.
The fitting procedure is based on the assumption that the main axis of the positive sample distribution corresponds to ℓ(x,ξ,θ). This is usually the case, since the area of the face, used for sampling, contains both dark and light skin values (e.g. due to the nose and its curved shape). Even in case of very uniform illumination, some shading will still be present; otherwise, one would not see any shape of the face other than the eyes and mouth of a person. These small nuances of skin color are enough to show a main axis in the direction of intensity in RGB space. The main axis is found by line fitting, using RANSAC (Fischler and Bolles, 1981). However, the search of RANSAC is constrained by assumptions about shadowed colors, following (Lichtenauer et al., 2005). Pairs of points for which one of the two can never be a shaded color of the other are discarded. Regular least-squares fitting does not work because of the inhomogeneous distribution of samples along the line. Therefore, the best result of the RANSAC search is refined by computing the means m_L and m_H of two subsets of the inlier samples. First, the samples are projected on the initially fitted line. m_L is the mean of the samples in the lowest 0.1 quantile of projection values; m_H is computed from samples in the highest 0.1 quantile. From the line between m_L and m_H, Ŝ3 is derived and two perpendicular vectors Ŝ1 and Ŝ2 can be determined. After projecting the positive samples onto the AC space spanned by [Ŝ1, Ŝ2], the mean and covariance of the distribution are estimated by a fixed number of EM iterations, where the 0.9 best-fitting samples are used to compute the parameters of the next iteration. The latter is to discard outliers.
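The quantile-based refinement step can be sketched as follows (our own simplification; the constrained RANSAC search and the EM re-estimation are not reproduced here):

```python
import numpy as np

def refine_axis(samples, direction, q=0.1):
    """Refine an initial axis: project inliers on it, then connect the means
    m_L and m_H of the lowest and highest q-quantiles of projection values."""
    d = direction / np.linalg.norm(direction)
    t = samples @ d                           # projection values along the axis
    lo, hi = np.quantile(t, q), np.quantile(t, 1.0 - q)
    m_l = samples[t <= lo].mean(axis=0)       # mean of the darkest samples
    m_h = samples[t >= hi].mean(axis=0)       # mean of the brightest samples
    s3 = m_h - m_l
    return m_l, m_h, s3 / np.linalg.norm(s3)  # refined luminance axis S3
```

Because both endpoint means are computed from subsets at the extremes of the projection range, an uneven sample density along the line has little effect on the refined direction, which is the motivation given above for avoiding plain least squares.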
To get a combined likelihood of the ACS model for the left and the right side of the face, the two are combined by taking the maximum of the two log-likelihoods, computed with normalized inverse covariance matrices Σ̂_S^{-1}:

Σ̂_S^{-1} = Σ_S^{-1} / sqrt( |Σ_S^{-1}| ).  (12)
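A sketch of equation (12) and the left/right combination (our own names; the paper does not show code): scaling each inverse covariance to unit determinant makes the two log-likelihoods comparable before taking the per-pixel maximum.

```python
import numpy as np

def normalize_inv_cov(cov_inv):
    """Equation (12): divide by the square root of the determinant,
    so the normalized matrix has unit determinant."""
    return cov_inv / np.sqrt(np.linalg.det(cov_inv))

def combined_log_likelihood(ll_left, ll_right):
    """Per-pixel maximum of the left-face and right-face model scores."""
    return np.maximum(ll_left, ll_right)
```

Without the normalization, the model with the tighter covariance would systematically dominate the maximum for its own inliers while penalizing borderline pixels more harshly.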
The measured main axis of the skin color distribution does not always result from the appearance model in equation 1. For example, when multiple light sources are present, the transition from one light source to the other could introduce correlation between I(x) and θ. In these situations, there is no guarantee that the appearance model will continue in the same direction outside of the intensity range of the training samples. In order to be more robust against these exceptions, a more conservative Hybrid ACS (HACS) method is also considered. This model is split into three parts: a middle part for I(m_L) < I < I(m_H), identical to the normal ACS model, and a lower and a higher part, for I < I(m_L) and I > I(m_H), respectively, which assume that ℓ(x,ξ,θ) continues from the ends of the middle part in the directions corresponding to the estimated C̃_RGB0 from equation 8, but using the same covariance as the middle part.

Figure 3: Some images used for the experiments (Studio, Window, Glass Wall). The area inside the white rectangle was used for modeling skin color.
4 EXPERIMENT
We have evaluated our ACS method by applying it
to 10 different photos of different persons in differ-
ent environments (see figure 3). Most photos were
taken with a digital photo camera, except for ’Studio’,
which was captured with a firewire video camera.
This camera was calibrated to have C_RGB0 = [0,0,0]^T
and homogeneous illumination was used (i.e. ideal
circumstances for most color models). The photo
cameras used for the other photos automatically ad-
justed white balance and intensity for each photo. All
photos were filtered using a 3x3 median filter, to re-
duce noise and erroneous colors around sharp edges.
A rectangle was manually annotated at each face, to
extract the training color samples. These rectangles
are shown in the photos of figure 3. The left and right
sides of these rectangles were used for training the
left and right ACS models, respectively. To remove
remaining non-skin pixels from the training samples
(e.g. from eyes and mouth), the training is done in
two steps. After training the initial ACS models on all
respective training pixels, the pixels with the lowest
30% of likelihoods for the respective ACS model are
removed from the rectangle and a 3x3 morphological
closing is applied on the total remaining pixel mask.
The ACS models are re-trained using only the sam-
ples selected with this skin mask. For a fair compar-
ison, also the other color models are trained on these
filtered samples.
Training and testing of the skin color model was
done separately for each image, to evaluate the av-
erage performance of ACS over different circum-
stances. The true positive (TP) rate (number of pixels
correctly detected as skin divided by the total num-
Table 1: Area Under the Curve of the ROC for FP rates between 0 and 0.1.
Image ACS HACS HS rg CrCb RGB
Studio .0992 .0994 .0993 .0994 .0994 .0994
Office .0935 .0890 .0834 .0814 .0833 .0772
Hallway .0553 .0547 .0526 .0544 .0907 .0978
Sunset .0633 .0676 .0523 .0596 .0749 .0701
Restaurant .0485 .0572 .0601 .0526 .0379 .0543
Window .0726 .0654 .0486 .0173 .0280 .0257
Living room .0778 .0557 .0157 .0250 .0599 .0647
Glass wall .0781 .0832 .0821 .0789 .0711 .0579
Table tennis .0963 .0961 .0950 .0953 .0953 .0918
Street .0604 .0726 .0817 .0656 .0845 .0855
average .0745 .0741 .0671 .0630 .0725 .0724
st.dev. .0169 .0160 .0245 .0259 .0227 .0217
worst result .0485 .0547 .0157 .0173 .0280 .0257
average ranking 3.2 2.6 4.0 4.5 3.4 3.3
ber of tested skin pixels) is computed only from the
bare skin of the hands and arms in the same image,
which was also manually annotated. The false posi-
tive (FP) rate (number of pixels wrongly detected as
skin divided by the total number of tested non-skin
pixels) was computed from all areas in the same photo
not containing any bare skin or hair. Note that, al-
though training and test samples were extracted from
the same image, they correspond to physically differ-
ent parts of the scene, subject to possibly different
illumination and captured with different parts of the
camera sensor. This results in a better separation be-
tween training and test set than when they only differ
in time.
Four different color spaces were compared to
ACS: Hue-Saturation (HS) space, normalized Red-
Green (rg) space, CrCb space and RGB space. His-
tograms of the face samples were used as the skin
likelihood models. For the first three color spaces, the
histogram sizes are 100×100 bins and for RGB space
the histogram was 100×100×100 bins. The reason
for using histograms is because histograms do not as-
sume any type of distribution. Because histograms
do not generalize as well as parameterized distribu-
tions, the histograms are all smoothed with a Gaus-
sian kernel of 21 bins in all dimensions, with standard
deviation 1. HACS and HSV color models and skin
detection for the image ’Window’ are shown in figure
4.
To get an objective and comparable measure-
ment of performance, the Area Underneath the Curve
(AUC) is computed for the Receiver Operation Char-
acteristic (ROC) curves of all methods and photos.
Only the part of the curve with small FP rate is rel-
evant, since too many false positives will make it im-
possible to separate hands from the rest of the image.
Therefore, only the AUC for FP rates between 0 and
0.1 is used for comparison. The results are shown in
table 1 (the closer to 0.1, the better). The ACS method
shows the best mean AUC, however, HACS has the
best average ranking over the test images.

Figure 4: Results for the image 'Window' (panels: HACS model, HACS skin; HSV model, HSV skin). The upper left figure shows detection of the main axes of the left (black dots, solid line) and right (blue dots, dashed line) side of the face. The hulls in RGB space correspond to skin color using a threshold that detects 75% of the training samples.

Furthermore, these results show that the relative performance
between all color models greatly depends on the sit-
uation. The low standard deviation of the HACS re-
sults indicates the high robustness of the method. Al-
though the other methods have higher results in some
specific images, they also have significantly lower re-
sults for other images. Surprisingly, the RGB model
outperforms the HS, rg and CrCb models, contrary
to what could be expected from modeling skin color
with prior knowledge about intensity invariance. The
violation of the assumptions of these models in ev-
eryday situations clearly has a negative effect on their
performance.
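The partial-AUC measure used above can be implemented as follows (our own sketch with hypothetical score arrays; the paper's actual evaluation code is not available). A perfect detector scores 0.1, the maximum FP rate considered:

```python
import numpy as np

def partial_auc(scores_pos, scores_neg, fp_max=0.1, n=1000):
    """Area under the ROC curve restricted to FP rates in [0, fp_max].

    For each target FP rate, the threshold is the corresponding quantile of
    the negative scores; the TP rate is measured at that threshold, and the
    TP-vs-FP curve is integrated with the trapezoidal rule.
    """
    fp = np.linspace(0.0, fp_max, n)
    thresholds = np.quantile(scores_neg, 1.0 - fp)
    tp = np.array([(scores_pos >= th).mean() for th in thresholds])
    return np.sum(0.5 * (tp[1:] + tp[:-1]) * np.diff(fp))
```

Restricting the integral to low FP rates matches the argument above: detections drowning in false positives are useless for separating hands from the background, so only the left end of the ROC curve matters.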
5 CONCLUSIONS & FUTURE
WORK
We have proposed an adaptive chrominance model
and automatic fitting procedure that can be used to
detect skin color more accurately than the compared
methods when no color space calibration is performed
and/or heterogeneous illumination is present. This
makes our (H)ACS model especially useful for real-
world applications. Besides a better overall skin de-
tection performance, HACS also showed a lower stan-
dard deviation between different situations, while the
other methods showed more unstable results.
Further improvements of the model are possible.
First of all, over- or under-saturated colors can be
accounted for and assumptions about the model ori-
entation and shape outside of the intensity range of
the training samples can be improved. Furthermore,
many improvements are possible on the prior prob-
ability model of background color. A histogram of
the complete image and/or an off-line image database
could be used to exclude colors with a high prior prob-
ability.
ACKNOWLEDGEMENTS
We would like to thank the VSB fund and the
NSDSK (Dutch Foundation for the Deaf and Hard of
hearing Child) for making this research possible.
REFERENCES
Argyros, A. and Lourakis, M. (2004). Real-time tracking of
multiple skin-colored objects with a possibly moving
camera. In ECCV04, pages Vol III: 368–379.
Fischler, M. and Bolles, R. (1981). Random sample consen-
sus: A paradigm for model fitting with applications to
image analysis and automated cartography. Commu-
nications of the ACM, 24(6):381–395.
Fritsch, J., Lang, S., Kleinehagenbrock, M., Fink, G. A.,
and Sagerer, G. (2002). Improving adaptive skin color
segmentation by incorporating results from face de-
tection. In ROMAN, pages 337–343, Berlin, Germany.
Jones, M. J. and Rehg, J. M. (2002). Statistical color mod-
els with application to skin detection. International
Journal of Computer Vision, 46(1):81–96.
Lee, J. and Yoo, S. (2002). An elliptical boundary model
for skin color detection. In CISST’02.
Lichtenauer, J., Hendriks, E., and Reinders, M. (2005).
A shadow color subspace model for imperfect not-
calibrated cameras. In ASCI 05, Heijen, The Nether-
lands.
Martinkauppi, B., Soriano, M., and Pietikainen, M. (2003).
Detection of skin color under changing illumination:a
comparative study. In CIAP03, pages 652–657.
McKenna, S., Raja, Y., and Gong, S. (1999). Tracking
colour objects using adaptive mixture models. IVC,
17(3/4):225–231.
Phung, S., Bouzerdoum, A., and Chai, D. (2005). Skin
segmentation using color pixel classification: analy-
sis and comparison. PAMI, 27(1):148–154.
Raja, Y., McKenna, S., and Gong, S. (1998). Tracking and
segmenting people in varying lighting conditions us-
ing colour. In AFGR98, pages 228–233.
Soriano, M., Martinkauppi, B., Huovinen, S., and Laakso-
nen, M. (2000). Skin detection in video under chang-
ing illumination conditions. In ICPR00, pages Vol I:
839–842.
Vezhnevets, V., Sazonov, V., and Andreeva, A. (2003). A
survey on pixel-based skin color detection techniques.
In Proc. Graphicon-2003, pages 85–92.
Yang, M., Kriegman, D., and Ahuja, N. (2002). Detecting
faces in images: A survey. PAMI, 24(1):34–58.