A New Robust Color Descriptor for Face Detection

Eyal Braunstain and Isak Gath

Faculty of Biomedical Engineering, Technion - Israel Institute of Technology, Haifa, Israel

Keywords:

Face detection, Object recognition, Descriptors, Color.

Abstract:

Most state-of-the-art approaches to object and face detection rely on intensity information and ignore color

information, as it usually exhibits variations due to illumination changes and shadows, and due to the lower

spatial resolution in color channels than in the intensity image. We propose a new color descriptor, derived

from a variant of Local Binary Patterns, designed to achieve invariance to monotonic changes in chroma. The

descriptor is produced by histograms of encoded color texture similarity measures of small radially-distributed

patches. As it is based on similarities of local patches, we expect the descriptor to exhibit a high degree of

invariance to local appearance and pose changes. We demonstrate empirically by simulation the invariance

of the descriptor to photometric variations, i.e. illumination changes and image noise, geometric variations,

i.e. face pose and camera viewpoint, and discriminative power in a face detection setting. Lastly, we show

that the contribution of the presented descriptor to face detection performance is signiﬁcant and superior to

several other color descriptors, which are in use for object detection. This color descriptor can be applied in

color-based object detection and recognition tasks.

1 INTRODUCTION

Most object and face detection algorithms rely on intensity-based features and ignore color information (e.g. the works of (Viola and Jones, 2004; Mikolajczyk et al., 2004; Zhang et al., 2007; Li et al., 2013)). This is usually because color tends to vary with illumination changes and shadows (Khan et al., 2012a), and because the spatial resolution of the color channels is lower than that of the intensity image. However, face detection performance by a human observer declines when color information is removed from faces (Bindemann and Burton, 2009). It has also been argued that a detector which is based solely on spatial information derived from an intensity image, e.g. histograms of gradients, may fail when the object exhibits changes in spatial structure, e.g. pose, non-rigid motions, occlusions etc. (Wei et al., 2007). An image color histogram, in contrast, is rotation and scale-invariant.

We hereby review the topic of color representa-

tions and descriptors for object detection. Color in-

formation has been successfully used for object detec-

tion and recognition (Khan et al., 2012a; Gevers and

Smeulders, 1997; Weijer and Schmid, 2006; Diplaros

et al., 2006; Khan et al., 2013; Van de Sande et al.,

2010; Wei et al., 2007; Khan et al., 2012b).

Color can be represented in various color spaces, e.g. RGB, HSV and CIE-Lab; in the latter, uniform changes are perceived uniformly by a human observer (Jain, 1989). Various color descriptors can be designed. The color bins descriptor (Wei et al., 2007)

is composed of multiple 1-D color histograms by pro-

jecting colors on a set of 1-D lines in RGB space at

13 different directions. These histograms are concate-

nated to form the color bins features.

Two color descriptors were examined by (Wei-

jer and Schmid, 2006) for object detection, the Ro-

bust Hue descriptor, invariant with respect to the il-

luminant variations and lighting geometry variations

(assuming white illumination), and Opponent Angle

(OPP), invariant with respect to illuminant and dif-

fuse lighting (i.e. light coming from all directions).

The trade-off between photometric invariance and

discriminative power was examined in (Khan et al.,

2013), where an information theoretic approach to

color description for object recognition was proposed.

The gains of photometric invariance are weighted

against the loss in discriminative power. This is done

by formulation of an optimization problem with ob-

jective function based on KL-Divergence between vi-

sual words and color clusters.

Deformable Part Model (DPM) is used to model

objects using spring-like connections between object

parts (Felzenszwalb et al., 2010; Zhu and Ramanan,

2012). Although DPM achieves very good detection results, in particular through its ability to handle challenging objects (e.g. deformations, view changes and partially occluded objects), the general computational complexity of part-based methods is higher than that of global feature-based methods (Bergtholdt et al., 2010; Heisele et al., 2003).

Braunstain E. and Gath I.
A New Robust Color Descriptor for Face Detection.
DOI: 10.5220/0005177400130021
In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM-2015), pages 13-21
ISBN: 978-989-758-077-2
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

The Three-Patch Local Binary Patterns (TPLBP)

(Wolf et al., 2008) is a robust variant of the Local Bi-

nary Patterns (LBP) descriptor (Ojala et al., 2002),

based on histograms of encoded similarity measures

of local intensity patches. This descriptor was exam-

ined for the face recognition task.

In the present work the focus is not on the de-

sign of a new face detection framework, but rather on

the design of a novel color descriptor, investigating its

possible contribution to face detection. We design a

new color descriptor, based on Three-Patch LBP. Our

descriptor is computed from histograms of encoded

similarities of small local patches of chroma chan-

nels in a compact form, utilizing the inter-correlation

between image chroma channels. Consequently, the

representation of color in an image window is global,

i.e. not part-based. We examine the descriptor in
terms of its robustness to photometric and geometric
variations and its discriminative power. We evaluate

the contribution of the descriptor in a face detection

setting, using the FDDB dataset (Jain and Learned-

Miller, 2010), and show that it exhibits signiﬁcant

contribution to detection rates.

The paper is organized as follows. In Section 2

the Three-Patch LBP (TPLBP) descriptor is described

brieﬂy, and a multi-scale variant is proposed; in sec-

tion 3 the new color descriptor is described; in section

4 invariance and discriminative power are evaluated,

compared to the Robust Hue and Opponent Angle de-

scriptors (Weijer and Schmid, 2006); in section 5 we

evaluate the color descriptor in a face detection set-

ting, and in section 6 conclusions to this work are pro-

vided.

2 THREE-PATCH LBP DESCRIPTOR AND A MULTI-SCALE VARIANT

The Three-Patch LBP (Wolf et al., 2008) descrip-

tor was inspired by the Self-Similarity descriptor

(Shechtman and Irani, 2007), which compares a cen-

tral intensity image patch to surrounding patches from

a predeﬁned area, and is invariant to local appearance.

For each central pixel, a w ×w patch is considered,

centered at that pixel, and S additional patches dis-

tributed uniformly in a ring of radius r around that

pixel. Given a parameter α (where α < S), we take S

pairs of patches, α-patches apart, and compare their

values to the central patch. A single bit value for the

code of the pixel is determined according to which of

the two patches is more similar to the central patch.

The code has S bits per pixel, and is computed for

pixel p by:

$$TPLBP_{r,S,w,\alpha}(p) = \sum_{i=1}^{S} f_\tau\big(d(C_i, C_p) - d(C_j, C_p)\big) \cdot 2^{i-1}, \qquad j = (i + \alpha) \bmod S \qquad (1)$$

where $C_i$ and $C_{(i+\alpha) \bmod S}$ are two w × w patches along the patches-ring, α-patches apart, $C_p$ is the central patch, d(·, ·) is a distance measure (metric), e.g. the $L_2$ norm, and the function $f_\tau$ is a step threshold function, $f_\tau(x) = 1$ iff x ≥ τ. The threshold value τ is chosen slightly larger than zero, to provide stability in uniform regions. The values in the TPLBP code image are in the range $[0, 2^S - 1]$. Different code words designate different patterns of similarity. Once the image is TPLBP-encoded, the code image is divided into non-overlapping cells, i.e. distinct regions, and a histogram of code words with $2^S$ bins is constructed for each cell. The histograms of all cells are normalized to unit norm and concatenated to a single vector, which constitutes the TPLBP descriptor.
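The encoding of Eq. (1) and the per-cell histograms can be sketched as follows. This is a minimal, unoptimized illustration and not the authors' implementation: the function names, the rounding of ring positions to integer pixel offsets, the squared L2 patch distance, the value of τ, the zero-filled border and the 20 × 20 cell size are all assumptions made for the example. Bits are indexed from 0, which corresponds to the weights $2^{i-1}$ for i = 1,..,S.

```python
import numpy as np

def ring_offsets(r, S):
    """Integer (dy, dx) offsets of S patch centres spread uniformly on a ring of radius r."""
    angles = 2.0 * np.pi * np.arange(S) / S
    return [(int(round(-r * np.sin(t))), int(round(r * np.cos(t)))) for t in angles]

def tplbp_codes(img, r=2, S=8, w=3, alpha=2, tau=0.01):
    """TPLBP code image of a 2-D array; border pixels are left at 0 (assumption)."""
    img = np.asarray(img, dtype=np.float64)
    half = w // 2
    margin = r + half + 1  # keeps every w x w patch inside the image, also for even w
    offsets = ring_offsets(r, S)
    codes = np.zeros(img.shape, dtype=np.int64)

    def patch(y, x):
        return img[y - half:y - half + w, x - half:x - half + w]

    H, W = img.shape
    for y in range(margin, H - margin):
        for x in range(margin, W - margin):
            center = patch(y, x)
            # squared L2 distance of each ring patch to the central patch
            d = [np.sum((patch(y + dy, x + dx) - center) ** 2) for dy, dx in offsets]
            code = 0
            for i in range(S):
                j = (i + alpha) % S
                if d[i] - d[j] >= tau:  # step threshold function f_tau
                    code |= 1 << i
            codes[y, x] = code
    return codes

def cell_histograms(codes, cell=(20, 20), n_bins=256):
    """Concatenate unit-norm code histograms of non-overlapping cells."""
    H, W = codes.shape
    hists = []
    for y0 in range(0, H - cell[0] + 1, cell[0]):
        for x0 in range(0, W - cell[1] + 1, cell[1]):
            block = codes[y0:y0 + cell[0], x0:x0 + cell[1]]
            h = np.bincount(block.ravel(), minlength=n_bins).astype(np.float64)
            norm = np.linalg.norm(h)
            hists.append(h / norm if norm > 0 else h)
    return np.concatenate(hists)
```

On a perfectly uniform image all patch distances are zero, so no difference reaches τ and every code is 0, which is the stability in uniform regions that the threshold is meant to provide.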

We propose a Multi-Scale TPLBP descriptor

(termed TPLBP-MS), capturing spatial similarities

at various scales and resolutions, by concatenating

TPLBP descriptors with various parameters r and w.

The scale is affected by the radius r and patch res-

olution by patch size w. Three sets of parameters

are used for the encoding operator of Eq. (1), i.e. (r, S, w) = {(2, 8, 3), (3, 8, 4), (5, 8, 5)}, all with S = 8 and α = 2, as in (Wolf et al., 2008). These 3 TPLBP

descriptors are concatenated to produce the TPLBP-

MS descriptor. Parameters r and w are changed in

similar manner in the 3 sets above, thus observing

larger scales at lower resolutions.

3 A NEW COLOR DESCRIPTOR - COUPLED-CHROMA TPLBP

Many color descriptors are histograms of color values

in some color space, e.g. rg-histogram and Opponent

Colors histograms (Van de Sande et al., 2010). Im-

age color channels contain texture information that is

disregarded by color histograms. Our motivation is to

formulate a color descriptor that captures the texture

information embedded in color channels in a robust

manner.

Color descriptors can be evaluated by several main

properties: (1) Invariance to photometric changes

ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods

(e.g. illumination, shadows etc.); (2) Invariance to ge-

ometric changes (e.g. camera viewpoint, object pose,

scale etc.); (3) Discriminative power, i.e. the ability to

distinguish a target object from the rest of the world;

(4) Stability, in a sense that the variance of a certain

dissimilarity measure between descriptor vectors of

samples from a speciﬁc distribution (or class) is low.

We would like to formulate a color descriptor that ad-

heres to these properties.

We represent color in CIE-Lab space, due to its

perceptual uniformity to a human observer. Using

Euclidean distance in CIE-Lab space approximates

the perceived distance by an observer, hence a detec-

tor based on this color space can in some sense ap-

proximate the perception of human color vision. In

CIE-Lab space, L is the luminance, a and b are the

chroma channels. We consider ﬁrst a color descriptor

produced by applying TPLBP to both chroma chan-

nels and concatenating the single-channel descriptors

to a single descriptor. Images in JPEG format are ana-

lyzed, in which the chroma channels are sub-sampled

(Guo and Meng, 2006), thus spatial resolution in

chroma channels is lower than in intensity. Hence,

to extract meaningful features from chroma, the ap-

propriate operator should be applied at a coarse res-

olution, relative to the operator applied to the inten-

sity image. The values of the parameters are chosen

accordingly, (r, S, w) = (5, 8, 4), i.e. both the radius

and patch dimension are increased. This descriptor is

termed Chroma TPLBP (C-TPLBP). It has twice the

size of TPLBP.

A degree of correlation exists between the chroma

channels in CIE-Lab space. This can be observed

either from the derived equations of CIE-Lab color

space from CIE-XYZ space, or from an experimental

perspective, by constructing a 2-D chroma histogram

of face images. Elliptically cropped face images from

the FDDB dataset (Jain and Learned-Miller, 2010)

with 2500 images are used to fit a 2-D Gaussian density of chroma values a and b by mean and covariance of the data. From the covariance matrix, we have that $\sigma_{ab} = 53.7$, i.e. nonzero correlation between the chroma channels. We presume that coupling the chroma channels information may lead to a

robust descriptor, which is also more compact than C-

TPLBP, where chroma channels descriptions are com-

puted separately. We propose the following operator:

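$$CC\text{-}TPLBP_{r,S,w,\alpha}(p) = \sum_{i=1}^{S} f_\tau\Big(\sum_{k=a,b}\big[d(C_{k,i}, C_{k,p}) - d(C_{k,j}, C_{k,p})\big]\Big) \cdot 2^{i-1}, \qquad j = (i + \alpha) \bmod S \qquad (2)$$

where $C_{k,i}$ is the ith patch of chroma channel k, $C_{k,p}$ is the central patch of that channel, and the inner summation is over chroma channels, a and b. The thresholding function $f_\tau$ operates on the sum of differences of patches distance functions, for both chroma channels. Given a parameter α, we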

take S pairs of patches from each chroma channel,

α-patches apart, and for each pair we compare distances to the central patch of the appropriate channel. A single bit value for the code of a pixel is determined as follows: if similarities in both chroma channels correlate, e.g. if in both chroma channels patch $C_{k,i}$ is more similar to the central patch $C_{k,p}$ than patch $C_{k,i+\alpha}$, then the appropriate bit will be assigned value 0 (value 1 in the opposite case). Conversely, if dissimilarities of the two channels do not correlate, then by viewing the argument of the function $f_\tau$,

$$\sum_{k=a,b} d(C_{k,i}, C_{k,p}) - \sum_{k=a,b} d(C_{k,(i+\alpha) \bmod S}, C_{k,p}),$$

the patch with lower sum of distances in both chroma channels is more similar to the center, and the code bit is derived accordingly. The computed code has S

bits per pixel, and this descriptor is of the same size as

TPLBP, i.e. half the size of C-TPLBP. This descrip-

tor is termed Coupled-Chroma TPLBP (CC-TPLBP).

The parameters are chosen in accordance with those

of C-TPLBP, (r, S, w) = (5,8, 4) and α = 2. We em-

phasize that different values for the radius (r), num-

ber of patches (S), patch dimension (w) and α may

be chosen, however, preliminary experiments showed

that good discriminative ability was obtained with the

parameter values speciﬁed above. The histograms are

computed on small cells of (20, 20) pixels, thus main-

taining the spatial binding of color and shape infor-

mation in the image by cells delimitation, i.e. late

fusion of color and shape (Snoek, 2005; Khan et al.,

2012a). CC-TPLBP is invariant to monotonic varia-

tions of chroma and luminance. Such variations do

not cause any change to the resulting descriptor. In

Fig. 1 we present the CC-TPLBP operator, where the index k ∈ {a, b} designates the chroma channel, as in Eq. (2), with an example code computation for a

color face image. CC-TPLBP can be combined with

intensity-based shape features for classiﬁcation tasks.
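The coupled operator of Eq. (2) can be sketched as below. The sketch assumes the image has already been converted to CIE-Lab and that the two chroma channels are passed as 2-D arrays; the function name, the squared L2 patch distance, the integer rounding of ring positions, the value of τ and the zero-filled border are assumptions of this example rather than details given in the text.

```python
import numpy as np

def cc_tplbp_codes(a, b, r=5, S=8, w=4, alpha=2, tau=0.01):
    """CC-TPLBP code image from the two chroma channels a and b (2-D arrays)."""
    chans = [np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)]
    if chans[0].shape != chans[1].shape:
        raise ValueError("chroma channels must have the same shape")
    half = w // 2
    margin = r + half + 1  # keeps every w x w patch inside the image
    angles = 2.0 * np.pi * np.arange(S) / S
    offsets = [(int(round(-r * np.sin(t))), int(round(r * np.cos(t)))) for t in angles]
    H, W = chans[0].shape
    codes = np.zeros((H, W), dtype=np.int64)

    def patch(img, y, x):
        return img[y - half:y - half + w, x - half:x - half + w]

    for y in range(margin, H - margin):
        for x in range(margin, W - margin):
            # d[i]: distance of ring patch i to the centre, summed over both channels
            d = np.zeros(S)
            for img in chans:
                center = patch(img, y, x)
                for i, (dy, dx) in enumerate(offsets):
                    d[i] += np.sum((patch(img, y + dy, x + dx) - center) ** 2)
            code = 0
            for i in range(S):
                # argument of f_tau in Eq. (2): sum_k d(C_k,i, C_k,p) - sum_k d(C_k,j, C_k,p)
                if d[i] - d[(i + alpha) % S] >= tau:
                    code |= 1 << i
            codes[y, x] = code
    return codes
```

Because only differences between patches enter the distances, adding a constant to either chroma channel leaves the code image unchanged, and because the two channels are summed inside the threshold function, a single S-bit code is produced instead of one code per channel.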

4 EVALUATION OF COLOR DESCRIPTORS

CC-TPLBP is invariant to monotonic changes of both

luminance and color channels. Moreover, we expect

it to exhibit a high degree of robustness to geomet-

rical changes, e.g. pose, local appearance and cam-

era viewpoint, as it is computed by similarities of

radially-distributed image patches. We evaluate CC-

TPLBP with respect to properties (1) - (4) described

in section 3, compared to the Robust Hue and Opponent Angle (OPP) color descriptors (Weijer and Schmid, 2006).

Figure 1: CC-TPLBP code computation. (a) CC-TPLBP operator for a single chroma channel, with parameters α = 2, S = 8 and w = 3. (b) An example of CC-TPLBP code for a color face image. Upper left: face image; upper right: CIE-Lab a chroma; lower left: CIE-Lab b chroma (a and b are presented as gray-level images); lower right: CC-TPLBP code image. The parameters used are r = 5, S = 8, w = 4.

Opponent Colors are invariant with respect to lighting geometry variations, and are computed from RGB by:

$$O1 = \frac{1}{\sqrt{2}}(R - G), \qquad O2 = \frac{1}{\sqrt{6}}(R + G - 2B) \qquad (3)$$

The Robust Hue descriptor is computed as histograms

on image patches over hue, which is computed from

the corresponding RGB values of each pixel, accord-

ing to:

$$hue = \arctan\left(\frac{O1}{O2}\right) = \arctan\left(\frac{\sqrt{3}\,(R - G)}{R + G - 2B}\right) \qquad (4)$$

Hue is invariant with respect to lighting geometry

variations when assuming white illumination. Hue

is weighted by the saturation, to reduce error. The

Opponent Derivative Angle descriptor (OPP) is com-

puted on image patches, by the histogram over the

opponent angle:

$$ang_x^O = \arctan\left(\frac{O1_x}{O2_x}\right) \qquad (5)$$

where $O1_x$ and $O2_x$ are spatial derivatives of the chromatic opponent channels. OPP is weighted by the chromatic derivative strength, i.e. by $\sqrt{O1_x^2 + O2_x^2}$, and is invariant with respect to diffuse lighting and

spatial sharpness. Color histograms are generally

considered more invariant to pose and viewpoint

changes than shape descriptors (Diplaros et al.,

2006), but are sensitive to changes of illumination

and shading.
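Eqs. (3) and (4) can be illustrated with a short sketch. The function names, the number of histogram bins and the use of $\sqrt{O1^2 + O2^2}$ as the saturation weight are assumptions of this example; `np.arctan2` is used in place of a plain arctangent of the ratio so that pixels with O2 = 0 do not cause a division by zero, which also extends the angle to the full circle.

```python
import numpy as np

def opponent_channels(rgb):
    """Opponent colors O1, O2 of Eq. (3) for an array of shape (..., 3)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2.0)
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)
    return O1, O2

def hue_and_saturation(rgb):
    """Hue angle of Eq. (4) and an assumed saturation weight per pixel."""
    O1, O2 = opponent_channels(rgb)
    hue = np.arctan2(O1, O2)          # equals arctan(O1 / O2) when O2 > 0
    sat = np.sqrt(O1 ** 2 + O2 ** 2)  # assumption: saturation in opponent space
    return hue, sat

def weighted_hue_histogram(rgb, n_bins=36):
    """Saturation-weighted hue histogram of an image patch, normalized to unit sum."""
    hue, sat = hue_and_saturation(rgb)
    hist, _ = np.histogram(hue, bins=n_bins, range=(-np.pi, np.pi), weights=sat)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

For a pure red pixel (R, G, B) = (1, 0, 0) the ratio O1/O2 equals $\sqrt{3}$, so the hue is π/3; for a gray pixel both opponent channels vanish and the pixel receives zero weight in the histogram.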

We evaluate invariance and discriminative

power by the Kullback-Leibler Divergence, a

non-symmetric dissimilarity measure between two

probability distributions, p and q, expressed as:







$$KL(p, q) \equiv D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log\frac{p(x)}{q(x)} \qquad (6)$$

where q is considered a model distribution.

We consider descriptors that are constructed from M histograms of M distinct image cells. Referring to CC-TPLBP, each histogram has $2^S$ bins, producing a descriptor of size $M \times 2^S$. Given two images, each with M cells, we compute M histograms for each image. To compare CC-TPLBP descriptors of these two images, we compute the KL Divergence for each pair of appropriate histograms from both images, i.e. $\{KL(h_{1,m}, h_{2,m})\}_{m=1,..,M}$, where $h_{i,m}$ (i = 1, 2; m = 1,..,M) is the mth histogram of image i. We define the KL Divergence of image 1 with respect to image 2 by averaging over all image cells, i.e. $KL_{1,2} = \frac{1}{M}\sum_{m=1}^{M} KL(h_{1,m}, h_{2,m})$. Each single-cell histogram contains $2^8 = 256$ bins.
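The cell-averaged KL Divergence described above can be sketched as follows. The function names are hypothetical, and the small constant `eps` added to every bin before renormalization is an assumption of this example, introduced only to avoid taking the logarithm of zero or dividing by zero for empty bins.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p, q) = sum_x p(x) log(p(x) / q(x)) for two histograms of equal length."""
    p = np.asarray(p, dtype=np.float64) + eps  # eps smoothing is an assumption
    q = np.asarray(q, dtype=np.float64) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def mean_cell_kl(hists1, hists2):
    """Average KL over M cells; hists1 and hists2 have shape (M, n_bins)."""
    return float(np.mean([kl_divergence(h1, h2) for h1, h2 in zip(hists1, hists2)]))
```

Since the measure is non-symmetric, `mean_cell_kl(h1, h2)` and `mean_cell_kl(h2, h1)` generally differ, which is why both orderings of an image pair are counted separately.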

We evaluate the CC-TPLBP, Hue and OPP de-

scriptors by three experiments, described as follows:

ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods

Figure 2: Example of face and background images used to examine discriminative power. Top line - sample face images,

bottom line - sample background images.

4.1 Invariance to Photometric and Geometric Variations

In the ﬁrst experiment we evaluate invariance to com-

bined photometric and geometric variations, i.e. illu-

mination and background, face pose and viewpoint.

While this does not allow for independent evaluations

of invariance to photometric and geometric variations,

it simulates a realistic setting for face detection. We

use several groups of images of single persons from

the LFW Face Recognition dataset (Huang et al.,

2007), each group displays a single person with the

above variations. We compute the CC-TPLBP, Hue

and OPP histograms for all images in a set, normalized to unit sum, and the KL Divergence between histograms of all image pairs (which is non-symmetric, i.e. $D_{KL}(p_1, p_2) \neq D_{KL}(p_2, p_1)$). Table 1 contains statistics of KL Divergence values of all descriptors for several image sets. While the number of images is relatively small, the number of resulting ordered pairs is large and therefore indicative. CC-TPLBP appears to be the most robust to these variations, as its mean KL Divergence is by far the lowest of all descriptors on all image sets. CC-TPLBP also exhibits a higher degree of stability than the other descriptors, as indicated by its lowest variance.

4.2 Invariance to Gaussian Noise

In the second experiment, we test the effects of added noise, using 2500 face images from the FDDB dataset (Jain and Learned-Miller, 2010), normalized to size 63 × 39 pixels. According to (Diplaros et al., 2006), sensor noise is normally distributed, as additive Gaussian noise is widely used to model thermal noise, and is a limiting behavior of photon counting noise. High Gaussian noise is added to the R, G and B channels of all images, i.e. $\{R, G, B\} \rightarrow \{R + n_R, G + n_G, B + n_B\}$, where $\{n_k = n_k(x, y)\}_{k=R,G,B}$, $n(x, y) \sim N(0, \sigma^2)$, with $\sigma = 5$. We calculate KL Divergence between descriptor histograms of original and corrupted images. Statistics of the KL Divergence values are displayed in Table 2. While Hue has an average KL Divergence slightly lower than CC-TPLBP, the latter has significantly lower variance than other descriptors, indicating higher stability under addition of Gaussian noise.
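The corruption step of this experiment can be sketched as below. The function name, the assumed 0 to 255 value range, the clipping to that range and the seed argument are assumptions of this example; the text only specifies independent zero-mean Gaussian noise added to each of the R, G and B channels.

```python
import numpy as np

def add_gaussian_noise(rgb, sigma=5.0, seed=None):
    """Add independent zero-mean Gaussian noise to every pixel of every channel."""
    rng = np.random.default_rng(seed)
    rgb = np.asarray(rgb, dtype=np.float64)
    noise = rng.normal(0.0, sigma, size=rgb.shape)
    # clipping to [0, 255] is an assumption for 8-bit images
    return np.clip(rgb + noise, 0.0, 255.0)
```

A descriptor histogram would then be computed on both the original and the corrupted image and the two compared by the KL Divergence of Eq. (6).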

4.3 Discriminative Power

In the third experiment, we examine discriminative

power. A descriptor based on color histograms would

be effective in distinguishing face patches from dis-

tinct objects, e.g. trees or sky patches, but may be

less effective in distinguishing a face from skin, e.g.

neck, torso. Here a color texture descriptor may be

more efﬁcient. We choose randomly 200 face images

from the FDDB dataset, and pick 200 background im-

ages (see supplementary material) that give a degree

of diversity and challenge for the considered descrip-

tors, i.e. versatility of chroma and texture. Half of the

background images do not contain skin at all, and the

other half partially contain skin, with variable back-

grounds. This image set is constructed to represent

the kind of natural setting where the function of the

descriptor is to be able to discriminate face patches

from non-face skin patches together with versatile

non-skin background. Several examples are presented

in Fig. 2.

To evaluate discriminative power, we use the KL

Divergence similar to (Khan et al., 2012a). We deﬁne

a KL-ratio for face sample, considering all face and

background samples in the set:

ANewRobustColorDescriptorforFaceDetection

Table 1: Statistics of KL-Divergence, combined evaluation of photometric and geometric invariance, for several sets of single-person images. KL-Divergence is calculated for all pairs of images in a set. For further explanation, see text.

Person set (No. images / No. pairs)    Descriptor   Mean     Median   STD
Jennifer Aniston (21 / 420)            CC-TPLBP     0.1034   0.096    0.0353
                                       Hue          0.9666   0.8321   0.5885
                                       OPP          0.3555   0.2637   0.288
Arnold Schwarzenegger (42 / 1722)      CC-TPLBP     0.1154   0.1      0.0519
                                       Hue          1.3682   1.2901   0.6468
                                       OPP          0.6535   0.482    0.551
Vladimir Putin (49 / 2352)             CC-TPLBP     0.1124   0.0988   0.0515
                                       Hue          1.1799   1.0515   0.6268
                                       OPP          0.467    0.3582   0.3737

Table 2: Statistics of KL-Divergence, noisy images.

Desc.       Mean     Median   STD
CC-TPLBP    0.0553   0.0543   0.0168
Hue         0.0492   0.0397   0.0405
OPP         0.0968   0.0818   0.0579

Table 3: Statistics of KL-ratios; discriminative power. CC-TPLBP is found most discriminative.

Desc.       Mean     Median   STD
CC-TPLBP    1.7402   1.7506   0.1678
Hue         1.456    1.3835   0.361
OPP         1.6671   1.6712   0.2185

$$KL\text{-}ratio_k = \frac{\frac{1}{N_B}\sum_{j \in B} KL(p_k, p_j)}{\frac{1}{N_F - 1}\sum_{i \in F,\, i \neq k} KL(p_k, p_i)} \qquad \forall k \in F \qquad (7)$$

where $p_k$ is the descriptor of face patch k ∈ F, $p_j$ is the descriptor of background patch j ∈ B, and $N_F$ and $N_B$ are the number of face and background samples, respectively. For a face sample k, Eq. (7) defines the

ratio of the average KL Divergence with all non-face

patches, divided by the average KL Divergence with

all face patches. The higher this ratio for a face patch

k ∈ F , the more discriminative the descriptor with re-

spect to this face and data set, as the intra-class KL

Divergence is lower than the inter-class KL Diver-

gence. The KL-ratio values of all descriptors on the

dataset are displayed in Fig. 3, after low-pass ﬁlter-

ing by a uniform averaging ﬁlter of size 7. Smoothing

is performed in order to reduce the noisiness in the

original KL-ratio curves. Statistics of the KL-ratios

(prior to low-pass ﬁltering) are given in Table 3. We

observe that the average KL-ratio for CC-TPLBP is

higher than that of Hue and OPP (i.e. higher discrim-

inative power), and that the variance of CC-TPLBP is

the lowest, indicating high stability (i.e. low variabil-

ity of KL-ratios for data samples from a speciﬁc class

in a dataset).
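The KL-ratio of Eq. (7) can be sketched as follows. For brevity each descriptor is treated here as a single normalized histogram; the function names and the `eps` smoothing of empty bins are assumptions of this example.

```python
import numpy as np

def kl(p, q, eps=1e-10):
    """KL divergence between two histograms; eps smoothing is an assumption."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_ratios(face_hists, bg_hists):
    """KL-ratio of every face sample k: mean inter-class KL over mean intra-class KL."""
    ratios = []
    for k, pk in enumerate(face_hists):
        inter = np.mean([kl(pk, pj) for pj in bg_hists])
        intra = np.mean([kl(pk, pi) for i, pi in enumerate(face_hists) if i != k])
        ratios.append(inter / intra)
    return np.array(ratios)
```

A ratio above 1 for face sample k means that, on average, its descriptor is closer to the other faces than to the background samples.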

5 EVALUATION OF THE COLOR DESCRIPTOR IN A FACE DETECTION SETTING

We evaluate the CC-TPLBP color descriptor in a face

detection setting.

5.1 Dataset

We use the FDDB benchmark (Jain and Learned-

Miller, 2010), which contains annotations of 5171

faces in 2845 images, divided into 10 folds. Five
folds are used for training, and five for testing. Training
face images are normalized to size 63 × 39. The

background set is constructed from random 63 ×39 -

sized patches from background images of the NICTA

dataset (Overett et al., 2008), i.e. of same size as the

face patches.

5.2 Evaluation Protocol

In our face detection system, we use Support Vector

Machines (Cortes and Vapnik, 1995), a classiﬁcation

method that has been successfully applied for face de-

tection (Romdhani et al., 2004; Osuna et al., 1997),

as the face classiﬁer. We examine various descrip-

tors combinations, i.e. (1) TPLBP, (2) TPLBP-MS,

(3) TPLBP-MS + Hue, (4) TPLBP-MS + OPP, (5)

TPLBP-MS + C-TPLBP and (6) TPLBP-MS + CC-

TPLBP. For each of (1)-(6) we train a linear-kernel

SVM classiﬁer with Soft Margin, where the regular-

ization parameter C is determined by K-fold cross-

validation (K=5). To reduce false alarm rate, we add

a conﬁdence measure for an SVM classiﬁer decision,

as a probability for a single decision (Platt, 1999):

$$p(w, x, y) = \frac{1}{1 + \exp(-y(w \cdot x + b))} \qquad (8)$$

where w is the SVM separating hyperplane normal vector, b is its offset, x is a test sample and y is the classification label. This logistic (sigmoid) function assigns high confidence (i.e. close to 1) to correctly-classified samples which are distant from the hyperplane.

Figure 3: Discriminative power measure. KL-ratios of 200 face images with 200 background images. Horizontal axis: face sample numbers; vertical axis: KL-ratio values computed by Eq. (7). The displayed KL-ratios are smoothed using a uniform averaging filter of size 7, for further explanation see text. It can be seen that CC-TPLBP (blue curve) has the highest mean KL-ratio and lowest variance, as also seen in Table 3.
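The confidence measure of Eq. (8) can be sketched as below for a linear SVM with known normal vector w and offset b. The function names and the threshold value `p_th=0.9` are hypothetical and chosen only for illustration.

```python
import numpy as np

def svm_confidence(w, b, x, y):
    """Sigmoid of the signed margin y * (w . x + b), as in Eq. (8)."""
    margin = y * (np.dot(w, x) + b)
    return 1.0 / (1.0 + np.exp(-margin))

def is_face(w, b, x, p_th=0.9):
    """Accept a window as a face only if the confidence for label +1 exceeds p_th."""
    return svm_confidence(w, b, x, +1) > p_th
```

A sample lying exactly on the hyperplane receives confidence 0.5, so raising the threshold above 0.5 rejects windows that the SVM would accept only marginally, which is how the measure reduces the false alarm rate.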

Preprocessing of an image is performed by ap-

plying skin detection in CIE-Lab color space, to re-

duce image area to be scanned by a sliding window

method. Various skin detection methods and color

spaces can be used (Yang and Ahuja, 1999;

Jones and Rehg, 2002; Zarit et al., 1999; Terrillon

et al., 2000; Braunstain and Gath, 2013). We train offline a skin histogram based on chroma (a, b), omitting the luminance L as it is highly dependent on lighting conditions (Cai and Goshtasby, 1999). Skin detection in a test image is performed pixel-wise, by the application of a threshold $\tau_s$, i.e. for pixel $p = (x_p, y_p)$ with quantized chroma values $(\bar{a}_p, \bar{b}_p)$ and histogram value $h(\bar{a}_p, \bar{b}_p) = h_p$, the pixel is classified as skin if $h_p > \tau_s$. After skin is extracted, we perform a sliding window scan to examine windows at various positions and scales. The confidence measure of Eq. (8) is used by applying a threshold, i.e. if $p(w, x, y) > p_{th}$, the window is classified as a face.
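The pixel-wise skin test described above can be sketched as follows. The function names, the number of bins, the chroma range of [-128, 128] and the normalization of the histogram to unit sum are assumptions of this example; the text specifies only a chroma histogram trained offline and a threshold on its value at each pixel.

```python
import numpy as np

def train_skin_histogram(a_vals, b_vals, n_bins=32, lo=-128.0, hi=128.0):
    """2-D histogram of (a, b) chroma values of training skin pixels (1-D arrays)."""
    hist, a_edges, b_edges = np.histogram2d(
        a_vals, b_vals, bins=n_bins, range=[[lo, hi], [lo, hi]])
    return hist / hist.sum(), a_edges, b_edges

def skin_mask(a, b, hist, a_edges, b_edges, tau):
    """Boolean mask: True where the histogram value of the pixel's chroma exceeds tau."""
    # quantize each pixel's chroma to its histogram bin index
    ai = np.clip(np.digitize(a, a_edges) - 1, 0, hist.shape[0] - 1)
    bi = np.clip(np.digitize(b, b_edges) - 1, 0, hist.shape[1] - 1)
    return hist[ai, bi] > tau
```

Only windows that overlap the resulting mask would then need to be passed to the sliding window classifier, which is the reduction in scanned area that the preprocessing aims for.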

5.3 Results

Face detection performance was evaluated by following the evaluation scheme proposed in (Jain and Learned-Miller, 2010). Receiver Operating Characteristic (ROC) curves were computed, with True Positive rate (TPR ∈ [0, 1]) vs. number of False Positives (FP). In Fig. 4, ROC curves of continuous score (Jain and Learned-Miller, 2010) are depicted for various descriptor combinations. We observe that each of the descriptor combinations TPLBP-MS, C-TPLBP and CC-TPLBP produces significant improvements in detection rates, compared to TPLBP. CC-TPLBP leads to similar performance as C-TPLBP, but with a more compact representation.

6 CONCLUSIONS

In the present work the focus is not on the design or optimization of a face detection framework, but rather on color representation, or description, for the task of face detection. We proposed a novel color descriptor, CC-TPLBP, which captures the texture information embedded in color channels. CC-TPLBP is by definition invariant to monotonic changes in chroma and luminance channels. A multi-scale variant of TPLBP, termed TPLBP-MS, was also designed. All experiments were performed in a face detection setting. We examined the invariance of CC-TPLBP, jointly for photometric and geometric variations, i.e. illumination, background, face pose and viewpoint changes, and separately for addition of Gaussian noise, and compared it to the Robust Hue and Opponent Angle (OPP) descriptors. Discriminative power was evaluated with respect to the above mentioned descriptors. CC-TPLBP is superior to the other two descriptors. It achieves higher discriminative power and much higher invariance to combined photometric and geometric variations, compared to Hue and OPP, as demonstrated in section 4. The evaluation experiments in a face detection setting demonstrated that (1) TPLBP-MS improves detection rates compared to TPLBP, (2) the addition of CC-TPLBP produces a sharp improvement over TPLBP-MS and (3) CC-TPLBP leads to superior detection rates compared to Hue and OPP.

Figure 4: Face detection ROC curves on FDDB, for various descriptors combinations. It is clearly discerned that both CC-TPLBP and C-TPLBP (red and green lines, respectively) outperform all the other descriptor combinations. In addition, CC-TPLBP is twice as compact as C-TPLBP, making it the more efficient representation.

The CC-TPLBP color-based descriptor can be integrated into face detection frameworks to achieve a substantial improvement in performance using existing color channel information. It can also be used in general color-based object recognition tasks.

REFERENCES

Bergtholdt, M., Kappes, J., Schmidt, S., and Schnörr, C.

(2010). A study of parts-based object class detection

using complete graphs. Int. J. Comput. Vision, 87(1-

2):93–117.

Bindemann, M. and Burton, A. M. (2009). The role of

color in human face detection. Cognitive Science,

33(6):1144–1156.

Braunstain, E. and Gath, I. (2013). Combined supervised /

unsupervised algorithm for skin detection: A prelimi-

nary phase for face detection. In Image Analysis and

Processing - ICIAP 2013 - 17th International Confer-

ence, Naples, Italy, September 9-13, 2013. Proceed-

ings, Part I, pages 351–360.

Cai, J. and Goshtasby, A. A. (1999). Detecting human faces

in color images. Image Vision Comput., 18(1):63–75.

Cortes, C. and Vapnik, V. (1995). Support-vector networks.

Machine Learning, 20(3):273–297.

Diplaros, A., Gevers, T., and Patras, I. (2006). Com-

bining color and shape information for illumination-

viewpoint invariant object recognition. IEEE Trans-

actions on Image Processing, 15:1–11.

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and

Ramanan, D. (2010). Object detection with discrimi-

natively trained part-based models. IEEE Trans. Pat-

tern Anal. Mach. Intell., 32:1627–1645.

Gevers, T. and Smeulders, A. (1997). Color based object

recognition. Pattern Recognition, 32:453–464.

Guo, L. and Meng, Y. (2006). PSNR-based optimization of
JPEG baseline compression on color images. In ICIP,

pages 1145–1148. IEEE.

Heisele, B., Ho, P., Wu, J., and Poggio, T. (2003).

Face recognition: Component-based versus global ap-

proaches.

Yang, M.-H. and Ahuja, N. (1999). Gaussian mixture model for human skin color and its applications in image and video databases. In Proceedings of SPIE 99 (San Jose, CA), pages 458–466.

Huang, G. B., Ramesh, M., Berg, T., and Learned-Miller,

E. (2007). Labeled faces in the wild: A database for

studying face recognition in unconstrained environ-

ments. Technical Report 07-49, University of Mas-

sachusetts, Amherst.

Jain, A. K. (1989). Fundamentals of Digital Image Pro-

cessing. Prentice-Hall, Inc., Upper Saddle River, NJ,

USA.

Jain, V. and Learned-Miller, E. (2010). FDDB: A benchmark

for face detection in unconstrained settings. Tech-

nical Report UM-CS-2010-009, University of Mas-

sachusetts, Amherst.

Jones, M. J. and Rehg, J. M. (2002). Statistical color mod-

els with application to skin detection. Int. J. Comput.

Vision, 46(1):81–96.

Khan, F. S., Anwer, R. M., van de Weijer, J., Bagdanov,

A. D., Vanrell, M., and Lopez, A. M. (2012a). Color

attributes for object detection. In CVPR, pages 3306–

3313. IEEE.

Khan, F. S., van de Weijer, J., and Vanrell, M. (2012b).

Modulating shape features by color attention for ob-

ject recognition. International Journal of Computer

Vision, 98(1):49–64.

Khan, R., van de Weijer, J., Khan, F. S., Muselet, D., Ducot-

tet, C., and Barat, C. (2013). Discriminative color de-

scriptors. In CVPR, pages 2866–2873. IEEE.

Li, H., Hua, G., Lin, Z., Brandt, J., and Yang, J. (2013).

Probabilistic elastic part model for unsupervised face

detector adaptation. In The IEEE International Con-

ference on Computer Vision (ICCV).

Mikolajczyk, K., Schmid, C., and Zisserman, A. (2004).

Human detection based on a probabilistic assembly

of robust part detectors. In ECCV (1), volume 3021

of Lecture Notes in Computer Science, pages 69–82.

Springer.

Ojala, T., Pietikäinen, M., and Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture

classiﬁcation with local binary patterns. IEEE Trans.

Pattern Anal. Mach. Intell., 24(7):971–987.

Osuna, E., Freund, R., and Girosi, F. (1997). Training sup-

port vector machines: an application to face detection.

pages 130–136.

Overett, G., Petersson, L., Brewer, N., Pettersson, N., and

Andersson, L. (2008). A new pedestrian dataset for

supervised learning. In IEEE Intelligent Vehicles Symposium, Eindhoven, The Netherlands.

Platt, J. C. (1999). Probabilistic outputs for support vector

machines and comparisons to regularized likelihood

methods. In ADVANCES IN LARGE MARGIN CLAS-

SIFIERS, pages 61–74. MIT Press.

Romdhani, S., Torr, P., and Schölkopf, B. (2004). Efficient

face detection by a cascaded support-vector machine

expansion. Royal Society of London Proceedings Se-

ries A, 460:3283–3297.

Shechtman, E. and Irani, M. (2007). Matching local self-

similarities across images and videos. In IEEE Con-

ference on Computer Vision and Pattern Recognition

2007 (CVPR’07).

Snoek, C. G. M. (2005). Early versus late fusion in semantic

video analysis. In ACM Multimedia, pages 399–402.

Terrillon, J.-C., Fukamachi, H., Akamatsu, S., and Shirazi,

M. N. (2000). Comparative performance of different

skin chrominance models and chrominance spaces for

the automatic detection of human faces in color im-

ages. In FG, pages 54–63.

Van de Sande, K. E. A., Gevers, T., and Snoek, C. G. M.

(2010). Evaluating color descriptors for object and

scene recognition. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 32(9):1582–1596.

Viola, P. and Jones, M. (2004). Robust real-time face de-

tection. International Journal of Computer Vision,

57:137–154.

Wei, Y., Sun, J., Tang, X., and Shum, H.-Y. (2007). Interac-

tive ofﬂine tracking for color objects. In ICCV, pages

1–8.

Weijer, J. V. D. and Schmid, C. (2006). Coloring local feature extraction. In ECCV.

Wolf, L., Hassner, T., and Taigman, Y. (2008). Descriptor

based methods in the wild. In Real-Life Images work-

shop at the European Conference on Computer Vision

(ECCV).

Zarit, B. D., Super, B. J., and Quek, F. K. H. (1999). Comparison of five color models in skin pixel classification. In ICCV99 Intl. Workshop on, pages 58–63.

Zhang, L., Chu, R., Xiang, S., Liao, S., and Li, S. Z. (2007).

Face detection based on multi-block lbp representa-

tion. In Proceedings of the 2007 International Con-

ference on Advances in Biometrics, ICB’07, pages 11–

18, Berlin, Heidelberg. Springer-Verlag.

Zhu, X. and Ramanan, D. (2012). Face detection, pose es-

timation, and landmark localization in the wild. In

CVPR, pages 2879–2886.

ANewRobustColorDescriptorforFaceDetection