SHADOW MODELING AND DETECTION FOR ROBUST FOREGROUND SEGMENTATION IN HIGHWAY SCENARIOS

Katherine Batista, Rui Caseiro and Jorge Batista

Institute of Systems and Robotics, DEEC-FCTUC, University of Coimbra, Portugal

Keywords:

Foreground segmentation, Shadow modelling and detection, Traffic surveillance.

Abstract:

This paper presents a method to automatically model and detect shadows in highway surveillance scenarios. The approach uses a cascade of two classifiers. The first stage uses a weak classifier to ascertain the color information of possibly shadowed pixels, which is then used by the second stage (the strong classifier). The weak classifier estimates the Color Normalized Cross-Correlation (CNCC), and the color information of the pixels it identifies as shadow is used to build or update multi-layered statistical models of the RGB appearance of shadow. These models are then used by the strong classifier to correctly distinguish shadow. To prevent misclassifications from corrupting the results of both classifiers, spatial dependencies are also taken into account. For this purpose, nonparametric kernel density estimators in a pyramidal decomposition (PKDE) and Markov Random Fields (MRF) were independently employed. This technique is being used in a real outdoor traffic surveillance system to minimize the effects of cast vehicle shadows, as well as shadows induced by illumination changes. Several results are presented to demonstrate its effectiveness and the advantages of applying spatial contextualization methods to the weak and strong classifiers.

1 INTRODUCTION

Advances in camera technology, as well as in scientific areas such as computer vision, have led to the development of efficient and robust real-time traffic surveillance systems. In such systems, the precise detection of moving objects is essential. Most foreground segmentation processes (e.g. (Zhong and Sclaroff, 2003)) are sensitive to illumination changes or cast vehicle shadows, which can lead to faulty detections that seriously reduce the efficiency of other dependent processes. More precisely, cast vehicle shadows can significantly increase the detected vehicle's area and lead to its merging with nearby vehicles. This gravely affects the outcome of any tracking or vehicle counting system used in traffic surveillance. Illumination changes induced by clouds or by camera auto gain control processes can also generate false positives.

To overcome these problems, several methods have been developed, some of which use color (e.g. (Cucchiara et al., 2003) or (Horprasert et al., 1999)), brightness, reflectance and geometry information to identify shadowed pixels. In (Cucchiara et al., 2003), shadows are identified by assuming that the differences in the hue and saturation components between a pixel and the corresponding background pixel change within certain limits. Nonetheless, this technique is not flexible, since it requires the prior definition of parameters that do not adapt to illumination changes and are not constant across scenarios. The authors of (Prati et al., 2003) also report that this method has low robustness in noisy scenarios. T. Horprasert (Horprasert et al., 1999) analyzes the pixel's normalized brightness and normalized chromaticity distortions in the RGB color space and classifies a pixel as shadow using a set of thresholds. An automatic method is presented to estimate these thresholds; nonetheless, it is computationally too expensive to be used in a real-time traffic surveillance system. Cavallaro (Cavallaro et al., 2005) also analyzes pixel color information but, to overcome the previously mentioned problems, combines it with spatial constraints based on edge detection. However, this method does not remove shadows whose edge pixels are adjacent to objects and background pixels without a heuristic analysis of a temporal shadow tracking procedure.

Other approaches use statistical models to learn and describe the appearance of cast shadow. For instance, Liu (Liu et al., 2007) uses information obtained from a color-based classifier and employs GMMs (Gaussian Mixture Models) to model shadowed pixels in the HSV color space. To improve the classifier, the authors use local region-level information to update these models. Brisson (Martel Brisson and Zaccarin, 2007) also uses pixel color information, in the YUV color space, to build a Gaussian mixture shadow model (GMSM). Nevertheless, since the approach is pixel-based, the accuracy of the resulting model depends on the results of the color-based classifier over time. In (Huang and Chen, 2008), shadows are identified by building pixel-based local region shadow models using GMMs. A global model is also estimated and used to update the local region models when movement is rare. The background, foreground and shadow models are built into an MRF energy function. However, the weak classifier of this method, which provides the information for learning cast shadow, requires the definition of several parameters that do not adapt to illumination changes.

Batista K., Caseiro R. and Batista J. (2010).

SHADOW MODELING AND DETECTION FOR ROBUST FOREGROUND SEGMENTATION IN HIGHWAY SCENARIOS.

In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 148-157

DOI: 10.5220/0002823401480157

SciTePress

On the other hand, Porikli (Porikli and Thornton, 2005) models shadows by multivariate Gaussians using RGB color information provided by a pre-filter. This approach does not require a color space transformation and, since it uses multiple independent layers to model each shadow pixel, it is more flexible than the standard GMM approach to shadow modeling. The shadow models are built from color information provided by a pre-filter that evaluates color variation, as in (Horprasert et al., 1999). Shadow pixels are distinguished using these models, and misclassifications are corrected using shadow flow, which is once again a color-based analysis. One of the main drawbacks of this method is that it does not perform any spatial contextualization of the pixel's label. Therefore, foreground pixels whose color information is similar to the modeled shadow are misclassified.

The method presented here overcomes this flaw by considering that a pixel's label is influenced by the labels of its neighboring pixels. In general, the method is composed of a cascade of two classifiers, and spatial contextualization is applied to the results of these classifiers to correct misclassifications. The first classifier is a weak classifier whose purpose is to analyze every segmented foreground pixel and determine whether it is possibly shadow by measuring the similarity between the color and texture of the foreground and the corresponding background. This is done by estimating the Color Normalized Cross-Correlation (CNCC). This information is used to build or update statistical models that describe the RGB appearance of shadow pixels. These multi-layered pixel-based models are used by the second classifier (strong classifier) to identify cast shadows. Nonetheless, erroneous classifications may seriously compromise the foreground segmentation process. To minimize the number of misclassifications, the pixel's neighboring labels are taken into account. To do so, two distinct and independent approaches were implemented and compared. The first consists of a pyramidal decomposition of kernel density estimators (PKDE), whose main goal is to ascertain probabilistic representations of the surrounding pixel labels in order to improve the results given by the pre-filter. The second analyzes the spatial label dependencies using a Markov Random Field (MRF) energy function, which is minimized by the graph cut algorithm.

2 WEAK SHADOW CLASSIFIER

The weak classifier evaluates the segmented foreground pixels to determine whether each one is a possible shadow pixel. The main goal of this classifier is not to detect shadows accurately, but to filter out pixels that cannot be shadow. Its results are used later by the strong classifier. The approach presented here estimates the CNCC between each segmented pixel $I_p(t)$ and the corresponding background pixel $B_p(t)$ (see subsection 2.1). To improve the results of this classifier, two distinct techniques were independently applied and compared. One uses a PKDE method (presented in subsection 5.1), while the other ascertains the pixel's label using an MRF approach (presented in subsection 5.2). A quantitative and qualitative analysis of the results of these two techniques can be found in subsection 6.1.

2.1 Color Normalized Cross-Correlation (CNCC)

This classifier measures the similarity of color and texture between foreground and background by estimating the CNCC (Grest et al., 2003). More precisely, a pixel is classified as shadow if its texture is correlated with the corresponding texture of $B_p(t)$. To estimate the CNCC, the brightness information is separated from the color values by representing the pixel's color in the bi-conic HSL space (Grest et al., 2003) (see Figure 1.(a)). To measure the similarity, the RGB color vector is projected onto the chromatic HS plane to obtain the Euclidean (Cartesian) values of hue (h) and saturation (s). This allows the estimation of the scalar product between the (h, s, L) vectors of the two pixels, which is proportional to their correlation. The intuition is simple: if the two pixels have similar hues (a small angle between them) and high saturations, the resulting correlation will be high (see Figure 1.(b)).

Figure 1: (a) Representation of the hue and saturation components of the HSL color space. (b) Chromatic plane of the hsL color space, with the projections of two color pixels onto this plane.

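Hence, letting $c_F$ and $c_B$ be the color vectors, in the hsL space, of a foreground and a background pixel at (x, y), the CNCC is estimated over an $M \times N$ window surrounding those pixels using the following equation:

$$\mathrm{CNCC} = \frac{\sum_{ij} \left( c_F^{ij} \bullet c_B^{ij} \right) - MN\,\bar{L}_F \bar{L}_B}{\sqrt{\mathrm{VAR}_F\,\mathrm{VAR}_B}} \tag{1}$$

where

$$\mathrm{VAR}_k = \sum_{ij} \left( c_k^{ij} \bullet c_k^{ij} \right) - MN\,\bar{L}_k^2, \tag{2}$$

$\bar{L}_k$ is the average intensity of the pixels of $k$ inside that window, $k \in \{F\ (\text{foreground}), B\ (\text{background})\}$, and $c_F^{ij} \bullet c_B^{ij} = (h_F^{ij}, s_F^{ij}) \circ (h_B^{ij}, s_B^{ij}) + L_F^{ij} L_B^{ij}$, where the operator $\circ$ represents the scalar product. For gray-level pixels, the CNCC yields values similar to those of the normalized cross-correlation (NCC).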
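Equations (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the projection onto the chromatic plane, h = (2R − G − B)/√6 and s = (G − B)/√2, together with L = (R + G + B)/3, is an assumption standing in for the bi-conic HSL conversion of (Grest et al., 2003), and the names `rgb_to_hsl_plane` and `cncc` are hypothetical.

```python
import numpy as np


def rgb_to_hsl_plane(rgb):
    """Split RGB (..., 3) into chromatic-plane coordinates (h, s) and luminance L.

    Assumption: (h, s) are Cartesian coordinates of the projection onto the plane
    orthogonal to the grey axis, and L = (R + G + B) / 3.
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    h = (2.0 * r - g - b) / np.sqrt(6.0)
    s = (g - b) / np.sqrt(2.0)
    L = (r + g + b) / 3.0
    return h, s, L


def cncc(win_f, win_b):
    """CNCC of eqs. (1)-(2) between M x N x 3 foreground and background windows."""
    hf, sf, Lf = rgb_to_hsl_plane(win_f)
    hb, sb, Lb = rgb_to_hsl_plane(win_b)
    mn = Lf.size  # number of pixels in the window (M * N)

    def dot_sum(h1, s1, L1, h2, s2, L2):
        # sum over the window of c1 . c2 = (h1, s1) o (h2, s2) + L1 * L2
        return np.sum(h1 * h2 + s1 * s2 + L1 * L2)

    cross = dot_sum(hf, sf, Lf, hb, sb, Lb) - mn * Lf.mean() * Lb.mean()
    var_f = dot_sum(hf, sf, Lf, hf, sf, Lf) - mn * Lf.mean() ** 2  # eq. (2), k = F
    var_b = dot_sum(hb, sb, Lb, hb, sb, Lb) - mn * Lb.mean() ** 2  # eq. (2), k = B
    denom_sq = var_f * var_b
    if denom_sq <= 1e-12:  # flat, colourless windows: correlation undefined
        return 0.0
    # negative correlations are set to zero; the clip also guards against rounding above 1
    return float(np.clip(cross / np.sqrt(denom_sq), 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.uniform(0.2, 0.9, size=(5, 5, 3))
    shadowed = 0.5 * background          # uniform darkening of the same texture
    print(cncc(shadowed, background))    # close to 1.0
```

Because the assumed projection is linear in RGB, a uniform darkening scales h, s and the mean-removed L by the same factor, so a shadowed copy of the background yields a CNCC close to 1; this is the property the weak classifier relies on.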

The resulting CNCC values lie within [0, 1] (negative values are set to zero), so they can be interpreted as probabilistic measurements: the higher the value, the more likely the pixel is shadowed. Consequently, a pixel can be identified as shadow if its CNCC is larger than a given threshold. Several examples of the results of this procedure are presented in Figure 2.

Figure 2: Results of the CNCC weak classifier (red = shadow, green = foreground).

3 SHADOW MODELING

Shadow pixels can be distinguished by using statistical models of their RGB appearance. Each image pixel has multiple layers, each representing a different shadow appearance for that pixel. In this section, the method proposed in (Porikli and Thornton, 2005) to obtain these shadow models is described. The process becomes more discriminative as the number of layers increases. However, since this technique is meant to be applied in a real-time system, a large number of layers may compromise the system's frame rate. Thus, we chose to use three layers, which proved sufficient to statistically describe the color information of shadow. In other words, three layers are estimated for every pixel, and each layer models the pixel's RGB values with a multivariate Gaussian distribution. To build or update a layer at a given time t, the method uses the RGB information ($x = [r, g, b]$) of a pixel identified as shadow by the weak shadow classifier. More precisely, to update a layer, the method performs recursive Bayesian estimation using this information (Porikli and Thornton, 2005). Assuming that the layer follows a normal-inverse-Wishart distribution, this update can be done using the following expressions:

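$$\upsilon_t = \upsilon_{t-1} + n; \qquad \kappa_t = \kappa_{t-1} + n; \tag{3}$$

$$\theta_t = \theta_{t-1}\frac{\kappa_{t-1}}{\kappa_{t-1} + n} + \bar{x}\frac{n}{\kappa_{t-1} + n} \tag{4}$$

$$\Lambda_t = \Lambda_{t-1} + \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T + n\frac{\kappa_{t-1}}{\kappa_t}(\bar{x} - \theta_{t-1})(\bar{x} - \theta_{t-1})^T \tag{5}$$

$$\Sigma_t = (\upsilon_t - 4)^{-1}\Lambda_t \tag{6}$$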

where $\upsilon_{t-1}$ and $\Lambda_{t-1}$ are the degrees of freedom and scale matrix of the inverse Wishart distribution, $\theta_{t-1}$ is the mean, $\Sigma_{t-1}$ the covariance and $\kappa_{t-1}$ the number of prior observations. When the update is performed at each time frame (i.e. $n = 1$), the mean of the new samples, $\bar{x}$, becomes the pixel's color information $x$. These parameters are re-estimated whenever a layer is updated, and they describe the pixel's appearance by combining the prior information with the new color information. When each layer is initialized, the following parameters are assumed (Porikli and Tuzel, 2005): $\kappa_0 = 10$; $\upsilon_0 = 10$; $\theta_0 = x_0$; $\Lambda_0 = (\upsilon_0 - 4)\,16^2 I$, where $I$ is the three-dimensional identity matrix.
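A minimal NumPy sketch of the update in equations (3) to (6), together with the initialization quoted above, may help make the recursion concrete. The dictionary layout and the function names `init_layer` and `update_layer` are illustrative assumptions, not part of the original method description.

```python
import numpy as np


def init_layer(x0):
    """Initial parameters quoted in the text: kappa0 = 10, nu0 = 10, theta0 = x0,
    Lambda0 = (nu0 - 4) * 16**2 * I, so the initial covariance is 16**2 * I."""
    nu0 = 10.0
    lam0 = (nu0 - 4.0) * 16.0 ** 2 * np.eye(3)
    return {"nu": nu0, "kappa": 10.0, "theta": np.asarray(x0, dtype=float),
            "Lambda": lam0, "Sigma": lam0 / (nu0 - 4.0)}


def update_layer(layer, samples):
    """Recursive Bayesian (normal-inverse-Wishart) update, eqs. (3)-(6).

    samples: array of shape (n, 3) with the new RGB observations for this layer.
    Returns a new parameter dictionary; the input layer is left unchanged."""
    X = np.atleast_2d(np.asarray(samples, dtype=float))
    n = X.shape[0]
    xbar = X.mean(axis=0)
    kappa_prev, theta_prev = layer["kappa"], layer["theta"]

    nu = layer["nu"] + n                                               # eq. (3)
    kappa = kappa_prev + n                                             # eq. (3)
    theta = (theta_prev * kappa_prev + xbar * n) / (kappa_prev + n)    # eq. (4)
    scatter = (X - xbar).T @ (X - xbar)          # sum of (x_i - xbar)(x_i - xbar)^T
    d = (xbar - theta_prev)[:, None]
    Lambda = layer["Lambda"] + scatter + n * kappa_prev / kappa * (d @ d.T)  # eq. (5)
    Sigma = Lambda / (nu - 4.0)                                        # eq. (6)
    return {"nu": nu, "kappa": kappa, "theta": theta, "Lambda": Lambda, "Sigma": Sigma}


if __name__ == "__main__":
    layer = init_layer([100, 100, 100])
    layer = update_layer(layer, [[110, 100, 100]])  # one frame, n = 1
    print(layer["theta"], layer["nu"], layer["kappa"])
```

With n = 1 the scatter term vanishes and the new sample only moves the mean by a fraction 1/(κ + 1), which is why a freshly initialized layer adapts quickly while a well-observed one changes slowly.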

Each layer also has an associated confidence measurement, given by

$$C = \frac{\kappa_t^3 (\upsilon_t - 2)^4}{(\upsilon_t - 4)\,|\Lambda_t|},$$

which decreases with larger variances. This parameter is used in the layer update algorithm to sort the three layers according to their variances. The layer updating process is summarized in Algorithm 1. Figure 3 shows several examples of different shadow models for a scenario with shadows.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications

Algorithm 1: Layer update algorithm.
Input:
1. All pixels identified as foreground;
2. $L_i$ = different layers ($i = 1, \ldots, NUM\_LAYERS$);
for all pixels identified as foreground do
    1. $x = [r, g, b]$ = pixel identified as shadow by the weak shadow classifier;
    2. Sort layers $L_i(\theta_{t-1}, \Sigma_{t-1}, \upsilon_{t-1}, \ldots)$ according to their confidence measurements;
    3. for $i \le NUM\_LAYERS$ do
        (a) Estimate the Mahalanobis distance: $d_i = (x - \theta_{t-1,i})^T \Sigma_{t-1,i}^{-1} (x - \theta_{t-1,i})$;
        (b) if (sample $x$ lies in the 99% confidence interval)
                • Update the model parameters of layer $L_i$ (Eqs. 3, 4, 5 and 6);
                • Analyze the next pixel $x$;
            else
                • Decrease the number of observations: $\kappa_{t,i} = \kappa_{t-1,i} - n$;
            endif
    end
    4. if no layer was updated
        Delete the layer $L$ with the lowest confidence measurement and initialize a new layer with the new sample and the initial parameters.
    endif
end

Figure 3: Background image represented in (a) and the corresponding shadow models, ordered from most confident (b) to least confident (d).
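Algorithm 1 can be sketched for a single pixel as follows. Several details are assumptions not fixed by the text: the 99% confidence interval is read as a squared Mahalanobis distance below 11.345 (the 0.99 quantile of a chi-square distribution with three degrees of freedom), κ is not allowed to drop below 1 so that the sketch stays numerically safe, and all names are hypothetical.

```python
import numpy as np

# Assumption: "99% confidence interval" read as a squared Mahalanobis distance
# below the 0.99 quantile of a chi-square distribution with 3 degrees of freedom.
CHI2_3DOF_99 = 11.345


def init_layer(x0, nu0=10.0, kappa0=10.0):
    """Fresh layer with the initial parameters quoted in the text."""
    return {"nu": nu0, "kappa": kappa0, "theta": np.asarray(x0, dtype=float),
            "Lambda": (nu0 - 4.0) * 16.0 ** 2 * np.eye(3)}


def covariance(layer):
    return layer["Lambda"] / (layer["nu"] - 4.0)                      # eq. (6)


def confidence(layer):
    """C = kappa^3 (nu - 2)^4 / ((nu - 4) |Lambda|); smaller for wider layers."""
    nu, kappa = layer["nu"], layer["kappa"]
    return kappa ** 3 * (nu - 2.0) ** 4 / ((nu - 4.0) * np.linalg.det(layer["Lambda"]))


def update_layer(layer, x):
    """Eqs. (3)-(6) with a single sample (n = 1), so the sample mean is x."""
    k_prev = layer["kappa"]
    d = (x - layer["theta"])[:, None]                 # uses theta_{t-1}
    layer["theta"] = (layer["theta"] * k_prev + x) / (k_prev + 1.0)
    layer["Lambda"] = layer["Lambda"] + k_prev / (k_prev + 1.0) * (d @ d.T)
    layer["nu"] += 1.0
    layer["kappa"] = k_prev + 1.0


def update_pixel_layers(layers, x):
    """Algorithm 1 for one pixel x = [r, g, b] labelled shadow by the weak classifier."""
    x = np.asarray(x, dtype=float)
    layers.sort(key=confidence, reverse=True)         # most confident layer first
    for layer in layers:
        diff = x - layer["theta"]
        d2 = diff @ np.linalg.solve(covariance(layer), diff)   # squared Mahalanobis
        if d2 < CHI2_3DOF_99:
            update_layer(layer, x)
            return layers                             # analyze the next pixel
        # kappa_t = kappa_{t-1} - n; the floor at 1 is an assumption of this sketch
        layer["kappa"] = max(layer["kappa"] - 1.0, 1.0)
    # no layer was updated: replace the least confident layer with a new one
    layers.sort(key=confidence, reverse=True)
    layers[-1] = init_layer(x)
    return layers


if __name__ == "__main__":
    layers = [init_layer([50, 50, 50]), init_layer([120, 120, 120]), init_layer([200, 200, 200])]
    update_pixel_layers(layers, [122, 121, 119])      # matches the second layer
    update_pixel_layers(layers, [0, 255, 0])          # matches none: a layer is replaced
    print([layer["theta"].round(1) for layer in layers])
```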

4 SHADOW/FOREGROUND SEGMENTATION

Foreground pixels are those that are identified as neither background nor shadow. The process used to estimate the background model is not discussed in this paper; for more details see (Monteiro et al., 2008). Having statistically modeled the color appearance of shadow, it is possible to correctly identify the shadow and true foreground pixels. To do so, the different shadow layers are sorted according to their confidence measurements, and the Mahalanobis distance is calculated between the pixel's color and each layer. Unlike (Porikli and Thornton, 2005), a pixel is not immediately labeled as shadow if its color information lies within the 99% confidence interval of one of the shadow model layers, since this would lead to erroneous classifications caused by noisy, less confident layers.

Algorithm 2: Shadow classifier algorithm (Strong Classifier).
Input:
1. $L_i$ = different shadow layers ($i = 1, \ldots, NUM\_LAYERS$);
2. $C_{min}$ = minimum normalized confidence (threshold);
3. $C^{\Sigma}_{min}$ = minimum sum of normalized confidences (threshold);
for all pixels identified as foreground ($x = [r, g, b]$) do
    1. Sort layers $L_i(\theta_{t-1}, \Sigma_{t-1}, \upsilon_{t-1}, \ldots)$ according to their confidence measurements;
    2. for $i \le NUM\_LAYERS$ do
        (a) Estimate the Mahalanobis distance;
        (b) if ($x$ lies in the 99% confidence interval)
                • Normalize the confidence measurement of $L_i$: $C^i_{norm}$;
                • Re-estimate the sum of normalized confidences: $C^{\Sigma}_{norm} = C^{\Sigma}_{norm} + C^i_{norm}$;
            endif
    end
    3. if (the layer's $C^i_{norm} \ge C_{min}$ and $C^{\Sigma}_{norm} \ge C^{\Sigma}_{min}$)
        • Pixel classified as shadow.
    endif
    4. if (the above conditions are not satisfied for any $L_i$)
        • Pixel classified as foreground.
    endif
end

The method presented here avoids labeling as shadow the pixels that lie within the confidence interval of low-confidence shadow models. To do so, each pixel's layer confidence is normalized ($C^i_{norm} = C^i / \sum_j C^j$), and the sum $C^{\Sigma}_{norm}$ of the normalized confidence measurements of the model layers within whose 99% confidence interval the sample lies is estimated. If either of these values is lower than its threshold (i.e. $C^i_{norm} < C_{min}$ or $C^{\Sigma}_{norm} < C^{\Sigma}_{min}$, where $C_{min} = 1/NUM\_LAYERS$ and the default $C^{\Sigma}_{min} = 0.5$), then the pixel is not labeled as shadow. This is done mainly to prevent noisy, less confident models from inducing erroneous classifications in the segmentation process. In subsection 6.2, a quantitative analysis is presented to demonstrate the benefits of introducing these thresholds ($C_{min}$ and $C^{\Sigma}_{min}$) in this procedure.
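A sketch of Algorithm 2 for a single pixel, using the default thresholds given above, is shown below. The text does not state which layer's normalized confidence is compared against $C_{min}$; this sketch assumes the largest one among the matching layers. The chi-square bound used for the 99% interval and the function name `classify_pixel` are also assumptions.

```python
import numpy as np

CHI2_3DOF_99 = 11.345  # assumption: 0.99 chi-square quantile, 3 degrees of freedom


def classify_pixel(x, thetas, sigmas, confidences, c_min=None, c_sum_min=0.5):
    """Return True (shadow) or False (foreground) for one RGB pixel x.

    thetas, sigmas and confidences describe the pixel's shadow layers.
    Assumption: the C_min test uses the matching layer with the largest
    normalized confidence."""
    if c_min is None:
        c_min = 1.0 / len(thetas)                   # C_min = 1 / NUM_LAYERS
    conf = np.asarray(confidences, dtype=float)
    c_norm = conf / conf.sum()                      # normalized layer confidences
    x = np.asarray(x, dtype=float)
    c_sum, best = 0.0, 0.0
    for i in np.argsort(-conf):                     # most confident layer first
        diff = x - np.asarray(thetas[i], dtype=float)
        d2 = diff @ np.linalg.solve(sigmas[i], diff)
        if d2 < CHI2_3DOF_99:                       # x in the 99% confidence interval
            c_sum += c_norm[i]                      # C_sum_norm += C_norm^i
            best = max(best, c_norm[i])
    return bool(best >= c_min and c_sum >= c_sum_min)


if __name__ == "__main__":
    thetas = [[60, 60, 60], [100, 100, 100], [200, 200, 200]]
    sigmas = [100.0 * np.eye(3)] * 3
    conf = [5.0, 3.0, 2.0]
    print(classify_pixel([61, 59, 60], thetas, sigmas, conf))     # confident layer: shadow
    print(classify_pixel([200, 201, 199], thetas, sigmas, conf))  # weak layer only: foreground
```

The second call illustrates the point of the thresholds: the pixel does fall inside a layer's 99% interval, but that layer carries only 20% of the normalized confidence, so it is not trusted to label the pixel as shadow.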

Figure 4: (a) Captured image. (b) Most confident shadow model. (c) Results of the strong classifier.

Vehicle pixels whose color information is similar to the shadow models can be mislabeled as shadow (as can be seen in Figure 4.(c)). To overcome this drawback, spatial context can be introduced into this final classification.

5 SPATIAL CONTEXTUALIZATION METHODS

By using the pixel's neighboring label information, it is possible to minimize the number of wrongly classified pixels. For this purpose, two distinct techniques were independently employed and compared. One uses a PKDE approach (presented in subsection 5.1), while the other ascertains the pixel's label using an MRF approach (presented in subsection 5.2). A quantitative and qualitative analysis of the results of these two techniques can be found in subsections 6.1 and 6.2.

5.1 Pyramidal Decomposition of Nonparametric Kernel Density Estimators (PKDE)

The main goal of this method is to use statistical representations of the surrounding pixel labels to correct erroneous classifications. Using nonparametric kernel density estimators (KDE) avoids having to choose a model and estimate its distribution parameters. The probability density function (pdf) of the distribution is then given by:

$$p(z) = \frac{1}{N}\sum_{i=1}^{N} K_h(z - z_i) \tag{7}$$

where N is the number of data points and $K_h$ is a kernel function with bandwidth h. Choosing a Gaussian kernel function, the density model is obtained by placing a Gaussian over each data point, adding up the contributions over the whole data set and normalizing the result by the number of data points (Bishop, 2006), which gives:

$$p(z) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{(2\pi h^2)^{1/2}} \exp\left( -\frac{\|z - z_i\|^2}{2h^2} \right) \tag{8}$$

Since we are statistically modeling a pixel's classification based on its neighborhood, the pdf is estimated over a two-dimensional domain, and therefore $z = [x\ y]$. The resulting two-dimensional kernel density model can then be represented by:

$$p(z) = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{2\pi (\det \Sigma)^{1/2}} \exp\left( -\frac{1}{2} [z - z_i]^T \Sigma^{-1} [z - z_i] \right) \tag{9}$$

where Σ is the covariance matrix.
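Equation (9) can be evaluated directly; the NumPy sketch below does so for a single query point (the function name `kde_2d` is hypothetical). Setting Σ = h²I gives the two-dimensional counterpart of the isotropic kernel in equation (8).

```python
import numpy as np


def kde_2d(z, data, cov):
    """Evaluate eq. (9): the mean of 2-D Gaussian kernels with covariance cov at point z."""
    z = np.asarray(z, dtype=float)
    data = np.asarray(data, dtype=float).reshape(-1, 2)   # N data points z_i
    cov = np.asarray(cov, dtype=float)
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    diff = z - data                                        # (N, 2) rows z - z_i
    mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)      # [z - z_i]^T Sigma^-1 [z - z_i]
    return float(norm * np.exp(-0.5 * mahal).mean())       # (1/N) * sum of kernels


if __name__ == "__main__":
    points = [[0, 0], [1, 1], [0, 1]]
    print(kde_2d([0.5, 0.5], points, 0.5 * np.eye(2)))
```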

The pixel's label is ascertained by estimating the pdf over an M × M window surrounding the pixel (N = M × M). Choosing an appropriate kernel bandwidth h is rather tricky: if it is too small, the kernel density model will be undersmoothed, while if it is too large, it will be oversmoothed. Several automatic bandwidth selection methods have been developed over the years, such as MISE (mean integrated squared error) or AMISE (asymptotic MISE) driven methods (Wand and Jones, 1994). Oversmoothing, least squares cross-validation, biased cross-validation and plug-in methods are examples of AMISE bandwidth estimation techniques. These methods are either computationally too expensive for real-time systems or require the specification of a pilot bandwidth (plug-in methods). Nonetheless, simpler methods have emerged, such as the balloon estimator (Mittal and Paragios, 2004), where the bandwidth is calculated as a function of the distance from the point to the nearest data point. However, this method suffers from discontinuities and from integration problems at infinity. Another strategy, known as the sample point estimator, calculates the bandwidth as a function of the sample points (Mittal and Paragios, 2004). In the method proposed here, the bandwidth is obtained using a procedure similar to the one proposed in (Mittal and Paragios, 2004): it is calculated as the covariance of the data within the M × M window (h = Σ). By doing this, areas with larger uncertainty are given less weight in the pdf. To minimize the influence of the size of this M × M window, a pyramidal image structure is used, in which multi-scale subsampling is performed on the probabilistic data. The KDE's bandwidth and probabilities are calculated for each level of the pyramidal structure to prevent over- or undersmoothing. The probabilities of a pixel being shadow or foreground are then analyzed on a logarithmic scale, and the pixel's label is finally ascertained.
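The text does not spell out every step of the PKDE labeling, so the following is only a sketch under explicit assumptions: the data points are the positions of neighboring pixels carrying each label, the sum is divided by N = M × M, the bandwidth is the covariance of those positions (h = Σ) plus a small regularizer, each pyramid level is built by 2 × 2 majority voting, and the per-level log-density ratios are simply summed. All function names are hypothetical.

```python
import numpy as np


def class_log_density(labels, y, x, cls, M):
    """log p_cls(z) at z = (y, x) via eq. (9), using the labels in an M x M window.

    Assumptions: data points are the (row, col) positions of window pixels labelled
    cls, the sum is divided by N = M * M, and the kernel covariance is the covariance
    of those positions (h = Sigma) plus 0.25 * I so that it stays invertible."""
    r = M // 2
    H, W = labels.shape
    ys, xs = np.mgrid[max(0, y - r):min(H, y + r + 1), max(0, x - r):min(W, x + r + 1)]
    mask = labels[ys, xs] == cls
    pts = np.stack([ys[mask], xs[mask]], axis=1).astype(float)
    if len(pts) == 0:
        return np.log(1e-300)                       # class absent from the window
    cov = np.cov(pts, rowvar=False) if len(pts) > 1 else np.zeros((2, 2))
    cov = cov + 0.25 * np.eye(2)
    diff = np.array([y, x], dtype=float) - pts
    mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    dens = np.exp(-0.5 * mahal).sum() / (2 * np.pi * np.sqrt(np.linalg.det(cov)) * M * M)
    return np.log(dens + 1e-300)


def downsample(labels):
    """Next pyramid level: 2 x 2 blocks, majority vote with ties going to shadow."""
    H, W = (labels.shape[0] // 2) * 2, (labels.shape[1] // 2) * 2
    blocks = labels[:H, :W].reshape(H // 2, 2, W // 2, 2)
    return (blocks.mean(axis=(1, 3)) >= 0.5).astype(int)


def pkde_relabel(labels, M=5, levels=2):
    """Relabel a binary map (1 = shadow, 0 = foreground) with pyramidal KDE votes."""
    level = np.asarray(labels, dtype=int)
    H, W = level.shape
    score = np.zeros((H, W))                        # sum of log p_shadow - log p_fg
    for k in range(levels):
        h, w = level.shape
        s = np.array([[class_log_density(level, y, x, 1, M) - class_log_density(level, y, x, 0, M)
                       for x in range(w)] for y in range(h)])
        f = 2 ** k
        up = np.kron(s, np.ones((f, f)))            # back to full resolution
        up = np.pad(up, ((0, max(0, H - up.shape[0])), (0, max(0, W - up.shape[1]))), mode="edge")
        score += up[:H, :W]
        if min(level.shape) < 4:
            break
        level = downsample(level)
    return (score > 0).astype(int)


if __name__ == "__main__":
    noisy = np.ones((9, 9), dtype=int)
    noisy[4, 4] = 0                                 # isolated foreground label
    print(pkde_relabel(noisy))                      # the isolated label is absorbed
```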

5.1.1 PKDE Applied to the Results of the Weak Classifier

Using the CNCC classifier's results, a pixel can be identified as shadow when its estimated CNCC is larger than a given threshold. This threshold can be set empirically according to the desired shadow detection rate: a high threshold will under-detect shadow, while a low one will over-detect it. However, threshold-driven classifiers inevitably lead to misclassifications, such as the ones shown in Figure 2. To prevent these erroneous classifications from corrupting the statistical models, the pixel spatial dependencies are analyzed. The results of applying the PKDE technique to the output of the CNCC weak classifier are illustrated in Figure 5.

Figure 5: Examples of the results of this weak classifier (CNCC+PKDE).

5.1.2 PKDE Applied to the Results of the Strong Classifier

To improve the results of the strong classifier, the PKDE (presented in subsection 5.1) was employed. Figure 6 shows an example of the outcome produced by this method.

Figure 6: (a) Strong classifier results. (b) PKDE applied to the strong classifier results.

5.2 Markov Random Fields (MRF)

Segmentation is a typical vision problem that can be naturally expressed in terms of energy minimization. More specifically, problems that require the estimation of spatially varying quantities (intensity, texture) from noisy measurements can be formulated in a Bayesian framework using MRFs. Spatial context is an important constraint when deciding a pixel's label, i.e., a pixel's label is not independent of the labels in its neighborhood. So, instead of using only the likelihood information of the models, the MAP (maximum a posteriori)-MRF framework provides a convenient prior for modeling this spatial interaction. This allows a global inference to be made using local information, since conditional independence of labels rarely holds between proximal sites. The objective is to assign a binary label $l_p \in \{foreground, shadow\}$ to each of the sites $p \in P$, where P is the set of segmented pixels, and $L = \{l_p \mid p \in P\}$ is the global labeling field of random variables (or configuration of the field). The goal is to find the configuration L that minimizes an energy function. In our case, the function considered belongs to the class of energy functions defined in (Kolmogorov and Zabih, 2004) as a sum of functions of up to two binary variables at a time. Since it satisfies the conditions proposed in (Kolmogorov and Zabih, 2004), the optimization of L can be achieved by finding the minimum cut of a capacitated graph. The first-order MRF energy function can be decomposed as follows:

$$E(L) = \sum_{p \in P} \Big[ D_p(l_p) + \sum_{q \in N_p} V_{p,q}(l_p, l_q) \Big] \tag{10}$$

where $D_p(l_p)$ is the term derived from the observed data that measures the cost of assigning the label $l_p$ to the pixel p (it evaluates the likelihood of each pixel belonging to one of the two classes), $V_{p,q}(l_p, l_q)$ measures the cost of assigning the labels $l_p$, $l_q$ to the adjacent pixels p, q and is used to impose spatial smoothness (spatial coherence of labels through a pairwise interaction MRF prior that penalizes discontinuities between neighboring pixels), and $N_p$ is the set of pixels interacting with p (eight-connected neighborhood). In a real-time system, computation time is a crucial factor. Greig (Greig et al., 1989) showed that the MAP solution of a two-label pairwise MRF can be obtained efficiently, in polynomial time, by finding the s-t minimum cut on the equivalent graph, which provides an exact globally optimal solution. The prior takes the form of the Ising model, a particular case of the generalized Potts model for two-label problems. The piecewise constant smoothness prior is used to enforce spatial context by assigning penalties to label discontinuities between neighboring pixels. The penalty does not depend on the assigned labels, as long as they are different, and is spatially invariant.

The data cost term $D_p(l_p)$ is defined as the negative log-likelihood of a pixel p belonging to the foreground or shadow class. In the following sections, we show how to determine these likelihoods when the MRF is applied to the weak or to the strong classifier. Graph cut techniques from combinatorial optimization can be used to find the global minimum of a multidimensional energy function. The MAP estimate of an MRF can be obtained by solving a multiway minimum cut problem on a graph. The minimum s/t cut problem can be solved by finding a maximum flow from the source s to the sink t (Boykov and Kolmogorov, 2004), so the energy function of equation 10 can be efficiently minimized by graph cut algorithms. The minimum cut of the graph can be computed through a variety of approaches, such as the Ford-Fulkerson algorithm (Ford and Fulkerson, 1962), but in our case we performed the cut using the min-cut/max-flow algorithm based on augmenting paths formulated by Kolmogorov (Boykov and Kolmogorov, 2004).
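The graph construction behind equation (10) can be sketched as follows. For brevity this uses the Edmonds-Karp augmenting-path algorithm rather than the Boykov-Kolmogorov algorithm used in the paper; both return an exact minimum cut, but Boykov-Kolmogorov is much faster on image grids. The per-pixel shadow likelihood (for example the CNCC value, see subsection 5.2.1), the Ising weight `lam` and all function names are assumptions of this sketch.

```python
import math
from collections import deque


def min_cut_source_side(n_nodes, edges, s, t):
    """Max-flow/min-cut with shortest augmenting paths (Edmonds-Karp).

    Returns a list of booleans: True for nodes on the source side of the minimum cut."""
    cap = [dict() for _ in range(n_nodes)]            # residual capacities
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)
    while True:
        parent = [-1] * n_nodes
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:              # BFS in the residual graph
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                           # no augmenting path left
            return [p != -1 for p in parent]
        bottleneck, v = math.inf, t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                                 # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def mrf_labels(p_shadow, lam=1.0, eps=1e-6):
    """Minimize eq. (10) for a binary shadow/foreground labelling.

    D_p is the negative log-likelihood and V_pq an Ising penalty lam on an
    8-connected grid. Returns True where the pixel is labelled shadow."""
    H, W = len(p_shadow), len(p_shadow[0])
    n = H * W
    s, t = n, n + 1
    edges = []
    for y in range(H):
        for x in range(W):
            p = y * W + x
            d_shadow = -math.log(p_shadow[y][x] + eps)
            d_fg = -math.log(1.0 - p_shadow[y][x] + eps)
            edges.append((s, p, d_fg))      # cut when p ends on the sink (foreground) side
            edges.append((p, t, d_shadow))  # cut when p ends on the source (shadow) side
            for dy, dx in ((0, 1), (1, -1), (1, 0), (1, 1)):  # each 8-neighbour pair once
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    q = yy * W + xx
                    edges.append((p, q, lam))
                    edges.append((q, p, lam))
    source_side = min_cut_source_side(n + 2, edges, s, t)
    return [[source_side[y * W + x] for x in range(W)] for y in range(H)]


if __name__ == "__main__":
    probs = [[0.9, 0.9, 0.9], [0.9, 0.2, 0.9], [0.9, 0.9, 0.9]]
    print(mrf_labels(probs, lam=1.0)[1][1])   # True: the isolated pixel is smoothed to shadow
    print(mrf_labels(probs, lam=0.1)[1][1])   # False: a weak prior keeps the data label
```

The two calls show the trade-off controlled by the Ising penalty: with a strong prior, flipping the low-likelihood center pixel to shadow is cheaper than paying for eight label discontinuities, whereas with a weak prior the data term wins.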

5.2.1 MRF Applied to the Results of the Weak Classifier

The CNCC weak classifier provides results that can be taken as independent probabilistic measurements of the pixel's label. Thus, these results can be used as the likelihood of a pixel's label in the MRF energy function. In this case, the procedure does not require the definition of thresholds. Figure 7 shows several examples of the application of this technique.

Figure 7: (a) Weak CNCC classifier results. (b) Weak classifier + MRF results.

5.2.2 MRF Applied to the Results of the Strong Classifier

To apply the MRF approach, the sum of the normalized confidences of the layers within whose confidence interval the sample lies is used as the likelihood of the pixel's label. The higher this sum, the more likely the pixel is indeed a shadow pixel; it therefore provides a spatially independent probabilistic measurement for that pixel's label. Figure 8 shows several results of applying this technique to the results of the model-driven shadow classifier.

Figure 8: (a) Strong classifier results. (b) Strong classifier + MRF results.

6 RESULTS

To analyze the effectiveness of this method, ground truth shadow and foreground pixels were identified on a sequence of 450 frames of a highway scenario. Not all vehicles were identified, only those in the lanes closest to the surveillance camera. In this section, several results of applying these methods to this sequence are presented.

Table 1: η and ξ rates (%) of several weak classifier methods for three different thresholds.

               Threshold 0.3    Threshold 0.5    Threshold 0.7
Methods          η      ξ         η      ξ         η      ξ
CNCC           88.0   74.0      76.7   90.6      54.3   97.9
CNCC+KDE       88.1   75.6      76.8   91.3      53.9   98.2
CNCC+PKDE      83.6   88.9      71.6   97.6      41.6   99.7
CNCC+MRF       78.3   96.6      78.3   96.6      78.3   96.6

To evaluate the accuracy of each method, the metrics presented in (Prati et al., 2003) are used to estimate the rates of false positives and negatives. More precisely, to measure the number of false negatives (i.e. shadow pixels wrongly classified as foreground), the shadow detection rate η is estimated, and to measure the false positives (i.e. foreground pixels classified as shadow), the shadow discrimination rate ξ is calculated, using the following expressions:

$$\eta = \frac{TP_S}{TP_S + FN_S}, \qquad \xi = \frac{\overline{TP}_F}{TP_F + FN_F} \tag{11}$$

where TP is the number of true positives (pixels correctly classified), FN is the number of false negatives, $\overline{TP}_F$ is the number of ground-truth points of the foreground objects minus the number of points detected as shadows but belonging to foreground objects, and F and S denote foreground and shadow, respectively.
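Equation (11) can be computed from per-pixel label maps. The sketch below assumes an integer encoding 0 = background, 1 = foreground, 2 = shadow; the encoding and the function name `shadow_rates` are hypothetical. Note that $TP_F + FN_F$ is simply the number of ground-truth foreground pixels.

```python
import numpy as np


def shadow_rates(pred, truth):
    """Shadow detection rate eta and discrimination rate xi of eq. (11), in percent.

    pred, truth: label maps with 0 = background, 1 = foreground, 2 = shadow
    (an assumed encoding)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    tp_s = np.sum((truth == 2) & (pred == 2))          # shadow correctly detected
    fn_s = np.sum((truth == 2) & (pred != 2))          # shadow missed
    tp_f = np.sum((truth == 1) & (pred == 1))
    fn_f = np.sum((truth == 1) & (pred != 1))
    # ground-truth foreground points minus those detected as shadow
    tp_f_bar = np.sum(truth == 1) - np.sum((truth == 1) & (pred == 2))
    eta = 100.0 * tp_s / (tp_s + fn_s)
    xi = 100.0 * tp_f_bar / (tp_f + fn_f)
    return float(eta), float(xi)


if __name__ == "__main__":
    truth = [2, 2, 2, 2, 1, 1, 1, 1]
    pred = [2, 2, 2, 1, 1, 1, 2, 0]
    print(shadow_rates(pred, truth))   # (75.0, 75.0)
```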

This section is composed of two main parts: the first presents a quantitative and qualitative analysis of the performance of the weak shadow classifier, and the second performs the same analysis for the results obtained by the overall foreground segmentation and shadow detection process.

6.1 Weak Shadow Classifier Results

In this subsection, the performance of four independent weak shadow classifier methods is compared. The first estimates each pixel's CNCC and classifies the pixel as shadow if this value is above a predefined threshold. The second applies a kernel density estimator (KDE) to the results of this classifier, while the third applies the PKDE approach. The last uses the estimated CNCC in an MRF framework. To analyze the outcome of these methods, three different thresholds were applied and their performances compared.


Table 2: η and ξ rates (%) of the strong classifier obtained with and without the thresholds ($C_{min}$ and $C^{\Sigma}_{min}$).

Strong Classifier    η      ξ
No thresholds       94.4   69.4
Thresholds          87.7   80.1

The weak classifier's main goal is to correctly identify as many shadow pixels as possible. Its priority is therefore a high shadow discrimination rate ξ, since a high ξ indicates a low number of false positives. The shadow detection rate η is also important: if it is low, the shadow models will take too long to converge. The results obtained for the threshold-driven methods (CNCC, KDE, PKDE), presented in Table 1, show a clear improvement with the KDE approach and an even higher ξ with the PKDE technique. However, the shadow detection rate of this last method is clearly low (41.59%). It is important to note that these methods are threshold driven, so their performance depends strongly on the chosen threshold. Therefore, the wisest choice for this weak classifier is the MRF approach, since its efficiency does not rely on the chosen threshold and its ξ and η are quite high (96.6% and 78.3%, respectively).
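The rates η and ξ used throughout this section can be computed as below. This sketch assumes that expression (11) follows the standard definitions of Prati et al. (2003), η = TP_S / (TP_S + FN_S) and ξ = TP_F / (TP_F + FN_F), where FN_F counts foreground (vehicle) pixels wrongly labelled as shadow and TP_F is the number of ground-truth vehicle pixels minus FN_F; the function name and its mask arguments are hypothetical.

```python
import numpy as np


def shadow_rates(pred_shadow, gt_shadow, gt_object):
    """Return (eta, xi) in percent, after the definitions of Prati et al. (2003).

    pred_shadow: boolean mask of pixels the classifier labels as shadow.
    gt_shadow:   boolean ground-truth mask of shadow pixels.
    gt_object:   boolean ground-truth mask of foreground (vehicle) pixels.
    """
    pred = np.asarray(pred_shadow, dtype=bool)
    gs = np.asarray(gt_shadow, dtype=bool)
    go = np.asarray(gt_object, dtype=bool)
    tp_s = np.count_nonzero(gs & pred)    # shadow correctly detected
    fn_s = np.count_nonzero(gs & ~pred)   # shadow missed
    fn_f = np.count_nonzero(go & pred)    # vehicle pixels wrongly called shadow
    tp_f = np.count_nonzero(go) - fn_f    # vehicle pixels kept as foreground
    eta = 100.0 * tp_s / (tp_s + fn_s)
    xi = 100.0 * tp_f / (tp_f + fn_f)
    return eta, xi
```

For instance, detecting 8 of 10 shadow pixels while mislabelling 2 of 20 vehicle pixels as shadow gives η = 80% and ξ = 90%.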

6.2 Foreground/Shadow Segmentation Results

The main goal of a good shadow classifier is to achieve high ξ and η. The results presented in this subsection were obtained over the final 100 frames of the test sequence, since the first 350 were used to estimate the multi-layered shadow models. This was done in order to correctly evaluate the impact of using these models in the shadow classification process. For the reasons explained in subsection 6.1, the chosen weak classifier is the MRF approach. Table 2 presents the results obtained by the foreground segmentation process (the strong classifier, which corresponds to Algorithm 2) with and without the thresholds $C_{\min}$ and $C^{\Sigma}_{\min}$.

The performance of this classifier improves significantly when these thresholds are introduced: ξ increases from 69.4% to 80.1%, which indicates a lower percentage of foreground pixels identified as shadow. The shadow detection rate η decreases, from 94.4% to 87.7%, but nevertheless remains high.

Introducing spatial context to the results of this strong classifier corrects several misclassifications, namely false positives induced by pixels belonging to vehicles whose color information is similar to the shadow models. Table 3 presents the results of independently applying the KDE technique, the PKDE estimator and the MRF approach to the results of the strong classifier. The table shows that applying the PKDE and MRF techniques to the results of the strong classifier increases ξ and therefore significantly reduces the percentage of false positives. A closer comparison of these two methods shows that the PKDE procedure achieves a higher ξ (a lower percentage of false positives) but a lower η than the MRF approach. Figure 9 shows an example of this behavior.

Table 3: η and ξ rates (%) for the strong classifier methods over the last 100 frames.

Methods                    η      ξ
Strong Classifier          87.68  77.06
Strong Classifier+KDE      89.10  79.65
Strong Classifier+PKDE     83.01  94.34
Strong Classifier+MRF      91.91  84.50

Figure 9: Results of the Strong Classifier associated with the PKDE and MRF approaches: (a) PKDE approach; (b) MRF approach.
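The KDE and PKDE spatial contextualization of a binary shadow map can be sketched as below. This is an illustrative, hypothetical construction rather than the authors' PKDE: `kde_smooth` estimates the local density of shadow labels with a separable Gaussian kernel, and `pyramid_kde` repeats this on a 2 x 2 block-averaged pyramid, upsamples every level back to full resolution, averages the levels and re-thresholds. The kernel width, number of levels and fusion rule are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2*radius+1 taps."""
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def kde_smooth(prob, sigma=1.0, radius=2):
    """Gaussian kernel estimate of the local shadow density (separable)."""
    k = gaussian_kernel(sigma, radius)
    p = np.pad(np.asarray(prob, dtype=float), radius, mode="edge")
    # filter the rows, then the columns; "valid" restores the original size
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def pyramid_kde(mask, levels=3, sigma=1.0, radius=2, threshold=0.5):
    """Hypothetical pyramidal KDE: smooth the binary shadow map at several
    scales, upsample each level back, average them and re-threshold."""
    current = np.asarray(mask, dtype=float)
    h, w = current.shape
    step = 2 ** (levels - 1)
    if h % step or w % step:
        raise ValueError("image size must be divisible by 2**(levels-1)")
    fused = np.zeros((h, w))
    for level in range(levels):
        scale = 2 ** level
        smooth = kde_smooth(current, sigma, radius)
        # nearest-neighbour upsampling back to full resolution
        fused += np.kron(smooth, np.ones((scale, scale)))
        if level < levels - 1:
            # 2 x 2 block averaging gives the next, coarser level
            ch, cw = current.shape
            current = current.reshape(ch // 2, 2, cw // 2, 2).mean(axis=(1, 3))
    return fused / levels > threshold
```

Because coarse levels see a wider neighbourhood, the fused map fills small holes inside shadow regions and drops isolated false positives; like the single-scale KDE, however, the result still depends on `threshold`.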

Nevertheless, both methods present good detection rates and can be efficiently applied to add spatial context to the results given by the strong classifier. A qualitative analysis of both methods, however, indicates that the MRF approach performs a more accurate spatial contextualization than the PKDE technique.

To support this claim, a quantitative analysis was carried out by applying the distance transform to images containing the ground-truth and the classified foreground and shadow pixels. The distance transform assigns to each image pixel its distance to the nearest boundary pixel. Therefore, false positives located in the outermost regions of a blob have lower distance transform values, since they are closer to the nearest boundary pixel. Comparing these values with those obtained for the ground truth provides an estimate of the errors in the spatial contextualization performed by each method. Table 4 presents the shadow detection and shadow discrimination rates obtained using the equations in expression (11), but considering the distance transform values instead of the actual numbers of true positives or false negatives. As expected, η and ξ of the MRF technique are quite high.

SHADOW MODELING AND DETECTION FOR ROBUST FOREGROUND SEGMENTATION IN HIGHWAY SCENARIOS

Table 4: Spatial context efficiency analysis of the strong classifier techniques (using the distance transform); η and ξ in %.

Methods                    η      ξ
Strong Classifier          93.05  77.17
Strong Classifier+KDE      94.27  80.30
Strong Classifier+PKDE     88.36  95.69
Strong Classifier+MRF      95.90  93.95

Figure 10: (a) Weak Classifier + MRF results. (b) Strong Classifier + MRF results.
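The distance-transform analysis can be sketched as follows. The exact weighting used for Table 4 is not spelled out here, so this is one hypothetical reading: each pixel's contribution to the counts in η and ξ is replaced by its distance to the nearest pixel outside the ground-truth blob (shadow plus vehicle), so that errors on the blob border weigh less than errors deep inside it. The function names are illustrative; `distance_transform` is a brute-force version of what `scipy.ndimage.distance_transform_edt` computes.

```python
import numpy as np


def distance_transform(mask):
    """Euclidean distance from every True pixel to the nearest False pixel.

    Brute force for clarity; scipy.ndimage.distance_transform_edt computes
    the same quantity efficiently.
    """
    mask = np.asarray(mask, dtype=bool)
    outside = np.argwhere(~mask)
    if outside.size == 0:
        raise ValueError("mask must contain at least one False pixel")
    dist = np.zeros(mask.shape)
    for y, x in np.argwhere(mask):
        dist[y, x] = np.sqrt(((outside - (y, x)) ** 2).sum(axis=1)).min()
    return dist


def dt_weighted_rates(pred_shadow, gt_shadow, gt_object):
    """eta and xi (percent) with each pixel weighted by its distance to the
    ground-truth blob boundary instead of counting 1 (hypothetical reading)."""
    pred = np.asarray(pred_shadow, dtype=bool)
    gs = np.asarray(gt_shadow, dtype=bool)
    go = np.asarray(gt_object, dtype=bool)
    d = distance_transform(gs | go)   # blob = ground-truth shadow + vehicle
    tp_s = d[gs & pred].sum()
    fn_s = d[gs & ~pred].sum()
    tp_f = d[go & ~pred].sum()
    fn_f = d[go & pred].sum()
    return 100.0 * tp_s / (tp_s + fn_s), 100.0 * tp_f / (tp_f + fn_f)
```

In a small 9 x 9 toy example, mislabelling one vehicle pixel on the blob border as shadow lowers the weighted ξ less than mislabelling one pixel deep inside the blob, although both errors count the same under plain pixel counts.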

Nevertheless, the PKDE is computationally less expensive than the MRF process, which is an important factor in a real-time surveillance system. Looking at the results given by the weak classifier (Table 1), one question may arise: why are statistical shadow models needed, given that this classifier already presents acceptable results? The answer is that using a strong classifier aided by the MRF technique clearly improves the shadow detection rate η, which means that the percentage of false negatives is significantly reduced. Although the number of false positives increases, these are mainly located on the borders of the blobs and do not significantly deteriorate the vehicle segmentation process. An example of this behavior is shown in Figure 10: because shadow is not identified when only the weak classifier is employed, the leftmost vehicle's area is noticeably larger.
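The MRF spatial contextualization discussed in this section can be illustrated with a small binary MRF. This is a sketch under explicit assumptions: the unary costs are negative log-likelihoods of a per-pixel shadow probability `score` (for instance a rescaled CNCC or a shadow-model likelihood, both hypothetical here), the pairwise term is a Potts penalty `beta` between 4-connected neighbours, and the energy is minimized with iterated conditional modes (ICM). ICM is swapped in for brevity; it only reaches a local minimum, whereas the graph-cut methods cited in the references (Greig et al., 1989; Boykov and Kolmogorov, 2004) compute the exact minimum of this kind of binary energy.

```python
import numpy as np


def icm_shadow_labels(score, beta=1.0, iterations=10, eps=1e-6):
    """Binary MRF labelling (True = shadow) by iterated conditional modes.

    score: per-pixel shadow probability in [0, 1] (hypothetical input).
    beta:  Potts penalty paid for each 4-connected neighbour whose label
           differs from the pixel's own label.
    """
    p = np.clip(np.asarray(score, dtype=float), eps, 1.0 - eps)
    cost_shadow = -np.log(p)         # unary cost of the label "shadow"
    cost_other = -np.log(1.0 - p)    # unary cost of the label "not shadow"
    labels = (p > 0.5).astype(int)   # initialization only
    h, w = labels.shape
    for _ in range(iterations):
        changed = 0
        for y in range(h):
            for x in range(w):
                n_total = n_shadow = 0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        n_total += 1
                        n_shadow += labels[yy, xx]
                # local energy of each label given the current neighbours
                e_shadow = cost_shadow[y, x] + beta * (n_total - n_shadow)
                e_other = cost_other[y, x] + beta * n_shadow
                new = int(e_shadow < e_other)
                if new != labels[y, x]:
                    labels[y, x] = new
                    changed += 1
        if changed == 0:   # no label changed: a local minimum was reached
            break
    return labels.astype(bool)
```

Raising `beta` favours larger, smoother regions, so isolated false positives and small holes are absorbed, while thin shadow structures may be lost.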

The distance transform was applied to carry out the same spatial context analysis performed previously. The results presented in Table 5 show that the strong classifier combined with the MRF approach identifies shadow more accurately (η of 95.90% versus 90.40%), since, globally, the weak classifier identifies less shadow, which can lead to discontinuous blobs or blobs with considerably larger areas.

Table 5: Spatial context efficiency analysis of the strong classifier and weak classifier techniques (using the distance transform); η and ξ in %.

Methods                            η      ξ
Weak Classifier + MRF approach     90.40  98.20
Strong Classifier + MRF approach   95.90  93.95

7 CONCLUSIONS

This paper presented an automatic method to identify cast vehicle shadows. This method pre-identifies shadow pixels and uses their color information to build multi-layered statistical models that describe the RGB appearance of shadow. These shadow models were then used to correctly label shadow pixels. To overcome misclassifications, two independent processes that add spatial context to the results of the classifiers (PKDE and MRF) were employed and compared. The PKDE technique presented acceptable results and is computationally less expensive than the MRF approach. Nevertheless, its results proved to be slightly poorer than those of the MRF approach, which is also threshold invariant and therefore more reliable. In order to employ this technique in a real-time highway surveillance system, multiprocessing systems were used in its implementation. The method was thoroughly tested on a highway scenario sequence in which the ground-truth foreground and shadow pixels were identified. The obtained results are quite satisfactory (91.91% shadow detection rate and 84.5% shadow discrimination rate).

REFERENCES

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Boykov, Y. and Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124–1137.

Cavallaro, A., Salvador, E., and Ebrahimi, T. (2005). Shadow-aware object-based video processing. IEE Proceedings - Vision, Image and Signal Processing, 152(4):398–406.

Cucchiara, R., Grana, C., Piccardi, M., and Prati, A. (2003). Detecting moving objects, ghosts, and shadows in video streams. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1337–1342.

Ford, L. R. and Fulkerson, D. R. (1962). Flows in Networks. Princeton University Press.

Greig, D. M., Porteous, B. T., and Seheult, A. H. (1989). Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 51:271–279.

Grest, D., Frahm, J.-M., and Koch, R. (2003). A color similarity measure for robust shadow removal in real time. In Proc. of Vision, Modeling, and Visualization (VMV).

Horprasert, T., Harwood, D., and Davis, L. S. (1999). A statistical approach for real-time robust background subtraction and shadow detection. In ICCV Frame-Rate Workshop.


Huang, J. B. and Chen, C. S. (2008). Learning moving cast shadows for foreground detection. In The Eighth International Workshop on Visual Surveillance (VS2008), Marseille, France. Graeme Jones, Tieniu Tan, Steve Maybank and Dimitrios Makris (Eds.).

Kolmogorov, V. and Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26:147–159.

Liu, Z., Huang, K., Tan, T., and Wang, L. (2007). Cast shadow removal combining local and global features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8.

Martel-Brisson, N. and Zaccarin, A. (2007). Learning and removing cast shadows through a multidistribution approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(7):1133–1146.

Mittal, A. and Paragios, N. (2004). Motion-based background subtraction using adaptive kernel density estimation. Volume 2, pages II-302–II-309.

Monteiro, G., Marcos, J., Ribeiro, M., and Batista, J. (2008). Robust segmentation for outdoor traffic surveillance. In ICIP, pages 2652–2655.

Porikli, F. and Thornton, J. (2005). Shadow flow: A recursive method to learn moving cast shadows. In IEEE International Conference on Computer Vision, 1:891–898.

Porikli, F. and Tuzel, O. (2005). Bayesian background modeling for foreground detection. In ACM International Workshop on Video Surveillance and Sensor Networks (VSSN), pages 55–28.

Prati, A., Mikic, I., Trivedi, M. M., and Cucchiara, R. (2003). Detecting moving shadows: Algorithms and evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7):918–923.

Wand, M. P. and Jones, M. C. (1994). Kernel Smoothing (Monographs on Statistics and Applied Probability). Chapman & Hall/CRC.

Zhong, J. and Sclaroff, S. (2003). Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. In IEEE International Conference on Computer Vision, 1:44.
