AN ARTICULATED MODEL WITH A KALMAN FILTER FOR REAL TIME VISUAL TRACKING

Application to the Tracking of Pedestrians with a Monocular Camera

Youssef Rouchdy

CEREMADE, Université Paris Dauphine, Place du Maréchal De Lattre De Tassigny, 75775 Paris Cedex 16, France

Keywords:

Visual tracking, Pedestrian tracking, Articulated models, Kalman filter, Sequential filtering.

Abstract:

This work presents a method for the real-time visual tracking of articulated targets in image sequences. Each part of the target object is treated as a region of interest and tracked by a parametric transformation. Prior geometric and dynamic information about the target is introduced with a Kalman filter to guide the tracking of the regions. An articulated model with two areas is proposed and applied to the tracking of pedestrians in urban image sequences.

1 INTRODUCTION

Tracking approaches can be distinguished by several criteria, for example:

1. • 2D approach without an explicit shape model,
   • 2D approach with an explicit shape model,
   • 3D approaches.

2. • tracking of primitives,
   • tracking of a region of interest (ROI).

3. • deterministic approach,
   • probabilistic approach,
   • classification approach.

The choice of the tracking approach depends on the application:

• whether the target objects are rigid or not,
• whether the camera is monocular or stereo, fixed or mobile,
• the precision and the computing time required for the application.

The aim of this work is to develop a real-time algorithm that tracks a deformable and articulated target in an image sequence acquired by a mobile monocular camera. The principal application is to track pedestrians in an urban environment and to warn the driver should a pedestrian move into the security area around the vehicle (see figure 1).

This application is difficult to achieve for several reasons: the camera is mounted on the vehicle (so a simple background subtraction does not apply), occlusions are frequent and real-time computing is required. Also, the appearance and resolution of the target object, here the pedestrian, change due to the deformation of clothes, changes in posture and the motion of the camera. Figure 2 gives an idea of the variation of the resolution: the width and height (in pixels) of the window that contains the pedestrian in the image are plotted against the distance from the camera.

Figure 1: The security area is defined by the red (dark) region, where Vp is the pedestrian velocity and Vh is the vehicle velocity.

The most popular techniques for the estimation of motion are based on the parametric model (Bergen et al., 1992), which is well suited to real-time tracking. These techniques model the motion of a ROI in an image by, for example, an affine (6 DOF) or a homographic (8 DOF) transformation. Problems occur when the motion model that is used does not describe the motion of the ROI well. In (Weiss and Adelson, 1996), the motion is segmented into multiple independent motion areas. This method is problematic when some of the areas do not contain enough pixels or when they contain noise. Others have introduced constraints: the target segments are connected to each other by joints. In (Murray et al., 1994; Bregler et al., 2004), the twist representation and the product of exponential maps are used to introduce constraints. In (Gavrila and Davis, 1995; Kakadiaris and Metaxas, 1996), the human body is modeled by rigid segments which are connected by joints. See (Gavrila, 1999) for a comprehensive bibliography of approaches used to analyse human motion, and see (Zhang et al., 2006) for an analysis of motion with an articulated model.

Figure 2: Dimensions of the pedestrian in the image (in pixels) as a function of the distance between the camera and the pedestrian. Two panels, height and width of the projected bounding window (pixels) against the camera/pedestrian distance (meters, 0 to 40), each with one curve for the first pedestrian and one for the second pedestrian.

Rouchdy Y. (2008).
AN ARTICULATED MODEL WITH A KALMAN FILTER FOR REAL TIME VISUAL TRACKING - Application to the Tracking of Pedestrians with a Monocular Camera.
In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 686-693
DOI: 10.5220/0001089706860693
SciTePress

In this work, we model the target by rigid segments which are connected by a priori information representing the joints. The constraints are introduced as a priori information using a Kalman filter. This permits a richer connection between the multiple areas without increasing the DOF of the system. The contribution of each segment and of each joint is regulated by the covariance matrices of the Kalman filter. The proposed Kalman filter thus introduces a priori geometric information (e.g. the connection between the multiple areas) and dynamic information (constant velocity) about the target. Furthermore, the Kalman filter smooths the trajectory of the target.

In the next section, different approaches to modelling the dynamic evolution in the context of visual tracking with sequential filtering are presented, with a focus on Kalman filtering. In section 3, an articulated model based on Kalman filtering is proposed. In section 4, this model is applied to the tracking of a pedestrian in image sequences and experimental results are given.

2 SEQUENTIAL FILTERING AND TRACKING IN IMAGE SEQUENCES

This section introduces the use of sequential filtering in the context of visual tracking. The measurements acquired up to frame $t$ are denoted $Y_{1:t}$ and $X_t$ represents the configuration of the target objects at time $t$. The process $\{X_t;\ t \in \mathbb{N}\}$ is modeled as a Markov process with initial distribution $p(X_0)$ and transition equation $p(X_t \mid X_{t-1})$. The observations $\{Y_t;\ t \in \mathbb{N}\}$ are assumed to be conditionally independent given the process $\{X_t;\ t \in \mathbb{N}\}$, with marginal distribution $p(Y_t \mid X_t)$. The principle of sequential filtering is to apply Bayes's theorem at each time-step, obtaining the a posteriori distribution $p(X_t \mid Y_{1:t})$ based on all available information:

$$p(X_t \mid Y_{1:t}) = \frac{p(Y_t \mid X_t)\, p(X_t \mid Y_{1:t-1})}{p(Y_t \mid Y_{1:t-1})} \tag{1}$$

where we can write $p(Y_t \mid X_t)$ instead of $p(Y_t \mid X_t, Y_{1:t-1})$ due to the conditional independence assumption. As is customary in filtering theory, a model for the expected motion between time-steps is adopted. This takes the form of a conditional probability distribution $p(X_t \mid X_{t-1})$ termed the dynamics. Using the dynamics, equation (1) becomes

$$p(X_t \mid Y_{1:t}) = \frac{p(Y_t \mid X_t) \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Y_{1:t-1})\, dX_{t-1}}{p(Y_t \mid Y_{1:t-1})} \tag{2}$$

It is assumed that the predicted values of the states and the observations, $X_t$ and $Y_t$, respectively, evolve in time according to:

$$X_t = f_t(X_{t-1}, V_{t-1}), \tag{3}$$
$$Y_t = g_t(X_t, W_{t-1}), \tag{4}$$

where $f_t$ and $g_t$ are the state and the observation functions, respectively, which are supposed to be known. The state noise $V_{t-1}$ and the measurement noise $W_{t-1}$ have known distributions.

In visual tracking the choice of the dynamical model $p(X_t \mid X_{t-1})$ depends on the type of images, the a priori information available and the application. Typically, an elasticity model is used to track an elastic structure (Rouchdy et al., 2007) and a Navier-Stokes model is used to track a fluid structure (Cuzol et al., 2007). When good a priori information is available, the prediction can be introduced by learning (Blake et al., 1999). The most popular dynamical model is autoregressive (Black and Fleet, 1999; Perez et al., 2002); at first order it corresponds to the constant velocity model, which is the model adopted in this work.

The choice of the observations depends on the application and on the images, and can be subjective in some cases. The cues usually used are edge information (Blake and Isard, 1998) and color distributions (Perez et al., 2002). Models based on motion and appearance also exist: in (Sidenbladh and Black, 2003), several cues are combined to make the model robust to changes of appearance.

For a nonlinear system (e.g. when the function $f_t$ or $g_t$ in equations (3) and (4) is nonlinear) the probability $p(X_t \mid Y_{1:t})$ is approximated by a Monte Carlo (MC) method. Unfortunately, classical sampling for the MC method is guaranteed to fail as time increases. To deal with this problem a selection step is added; (Gordon, 1993) gives the first operationally effective method. Theoretical convergence results for this algorithm are given in (Del Moral, 1997). A good reference with a coherent treatment of these techniques, including convergence results and applications to visual tracking, is (Doucet et al., 2002).
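As an illustration of the selection step, the following minimal sketch implements a bootstrap particle filter on a toy one-dimensional model. The random-walk dynamics, the Gaussian likelihood, the noise levels `q` and `r` and the function name `bootstrap_filter` are assumptions made for this example; they are not the tracker of this paper.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, q=0.5, r=1.0, seed=0):
    """Bootstrap particle filter for a 1D random-walk state (illustrative model).

    Assumed model: X_t = X_{t-1} + q*V, Y_t = X_t + r*W, with V, W standard normal.
    Returns the weighted posterior-mean estimate at each time-step.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # samples of p(X_0)
    estimates = []
    for y in observations:
        # Prediction: propagate each particle through the dynamics.
        particles = [x + q * rng.gauss(0.0, 1.0) for x in particles]
        # Weighting: Gaussian likelihood p(Y_t | X_t), up to a constant factor.
        weights = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Selection: resample with replacement according to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Thirty identical observations of a static target located at 1.0.
estimates = bootstrap_filter([1.0] * 30)
```

Without the resampling line, the weights would have to be accumulated over time and would concentrate on a few particles, which is the failure of classical sampling mentioned above.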

When the observation density $p(Y_t \mid X_t)$ is assumed to be Gaussian and the dynamics are assumed to be linear with additive Gaussian noise, the solution is obtained analytically; this method corresponds to the Kalman filter. In this case the dynamical system is written as:

$$X_t = A_t X_{t-1} + B_t V_{t-1}, \tag{5}$$
$$Y_t = C_t X_t + D_t W_{t-1}, \tag{6}$$

where $A_t$, $B_t$, $C_t$ and $D_t$ are matrices and $V_{t-1}$, $W_{t-1}$ are vectors of i.i.d. standard normal variates. The state noise and the measurement noise are supposed to be Gaussian and independent, with covariance matrices $Q_t$ and $R_t$, respectively. In this case, $p(X_t \mid X_{t-1})$, $p(Y_t \mid X_t)$ and $p(X_t \mid Y_{1:t})$ are Gaussian with covariance matrices $Q_t$, $R_t$ and $\Gamma_t$, respectively. The covariance matrix $\Gamma_t$ and the estimate of the state vector $X_t$ are computed recursively with the Kalman filter; the recursion is given in section 3.5 and documented in (Kalman and Bucy, 1961).

3 ARTICULATED MODEL BASED ON KALMAN FILTER

3.1 Motion Estimation

The image is a projection of 3D points of space onto an image plane. Consider a rectangle moving in 3D space such that its deformations in the image plane result from a rigid transformation. Let $I_1$ be the image of this object at time $t_1$ and let $I_2$ be the image at time $t_2$. In (Faugeras et al., 2001), it is shown that the points of the rectangle in the two frames are related by a homographic transformation, which is defined by eight parameters. In the following, it is supposed that the deformation of a target is obtained by a homographic transformation. Otherwise, the object is approximated by a set of rigid links. We restrict ourselves to this type of motion to reduce the complexity.
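As a small illustration of what a homographic transformation does to image points, the sketch below maps the corners of a window through a 3x3 matrix in homogeneous coordinates. The helper name `apply_homography` and the numerical values are hypothetical; the example only shows the projective division and the scale invariance that leaves eight free parameters.

```python
def apply_homography(h, point):
    """Map an image point (x, y) through a 3x3 homography given as nested lists."""
    x, y = point
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    # Projective division brings the result back to pixel coordinates.
    return (xh / w, yh / w)


# Pure translation, a special case of a homography (illustrative values).
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -2.0],
     [0.0, 0.0, 1.0]]

corners = [(0, 0), (10, 0), (10, 20), (0, 20)]
warped = [apply_homography(H, p) for p in corners]

# Multiplying H by a nonzero scalar gives the same mapping: 9 entries, 8 DOF.
H_scaled = [[2.0 * v for v in row] for row in H]
warped_scaled = [apply_homography(H_scaled, p) for p in corners]
```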

3.2 Modelisation and Predictions

Let $\{R^l\}_{l=1}^{N}$ be a set of areas of an articulated target that are supposed rigid, let $c^l$ be the coordinates of the barycenter of the region $R^l$, and let $s^l$ be the surface of $R^l$. The elements of the set $\{R^l\}_{l=1}^{N}$ are correlated through their barycenters by the relations

$$\psi_1(c^1, \cdots, c^N, s^1, \cdots, s^N) = d_1;\ \ldots;\ \psi_m(c^1, \cdots, c^N, s^1, \cdots, s^N) = d_m \tag{7}$$

where $m$ is the number of constraint functions $\psi$. These constraints are introduced as a priori information with the Kalman filter. An example of the tracking of a pedestrian in an image sequence using two correlated areas is given in section 4.

The constraints (7) are supposed to be linear. If this is not the case, they can be linearized. They are introduced into the dynamic system of the Kalman filter with a function $g = (g_1, \cdots, g_m)$. At time $t$, the state vector $X_t$ is defined by

$$X_t = \left(c^1, \cdots, c^N, v^1, \cdots, v^N, s^1, \cdots, s^N, g_1, \cdots, g_m\right)$$

where $v^l$ denotes the velocity of the barycenter $c^l$, and follows the state equations:

$$X_t = A X_{t-1} + B V_{t-1}, \tag{8}$$
$$Y_t = C X_t + D W_{t-1}, \tag{9}$$

where $A$, $B$, $C$ and $D$ are fixed matrices and $V_{t-1}$, $W_{t-1}$ are vectors of i.i.d. standard normal variates. The matrix $A$ introduces dynamic (e.g. constant velocity) and geometric (e.g. the correlation between the areas) a priori information about the target.

VISAPP 2008 - International Conference on Computer Vision Theory and Applications


3.3 Measurement

For each area $R^l$, the parameters of the transformation model are estimated by solving the minimization problem:

$$\lambda^l = \arg\min_{\lambda \in \mathbb{R}^{\alpha}} \sum_{p \in R^l} \left( I(\varphi^l(\lambda, p)) - I^{*}(p) \right)^2 \tag{10}$$

where $I^{*}$ is the reference image, $I$ the current image, $\varphi^l$ are the parametric transformations determined by the parameters $\lambda^l$ for each $l \in \{1, \cdots, N\}$, and $\alpha$ is the number of parameters. The measurement vector is computed from the transformations $\varphi^l$ and defined by

$$Y^{mes} = \left(z^1, \cdots, z^N, v^1, \cdots, v^N, s^1, \cdots, s^N, g_1, \cdots, g_m\right)$$

where $z^l$ and $s^l$ are the barycenters and the surfaces, respectively, of the areas $\varphi^l(\lambda^l, R^l)$. From $z^l$ and $s^l$, the quantities $g_i$ are computed for each $(i, l) \in \{1, \cdots, m\} \times \{1, \cdots, N\}$. The image motion of the point $z^{i,l}$ from time $t-1$ to time $t$ is:

$$\frac{z^{i,l}_t - z^{i,l}_{t-1}}{\Delta t}$$

The minimization problem (10) is solved with the ESM algorithm, see (Malis, 2004; Benhimane and Malis, 2004).
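To make the cost in (10) concrete, the sketch below evaluates the sum of squared differences for a pure integer translation and finds the minimizer by exhaustive search. This is only an illustration of the cost function: it is not the ESM algorithm, which is a second-order iterative minimization, and the toy images, the region and the helper names `ssd` and `best_translation` are made up for the example.

```python
def ssd(reference, current, region, dx, dy):
    """Sum of squared differences between reference pixels and shifted current pixels."""
    total = 0.0
    for (x, y) in region:
        diff = current[y + dy][x + dx] - reference[y][x]
        total += diff * diff
    return total


def best_translation(reference, current, region, max_shift=2):
    """Exhaustive search for the integer shift (dx, dy) minimising the SSD cost."""
    height, width = len(current), len(current[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Skip shifts that would move the region outside the image.
            if any(not (0 <= x + dx < width and 0 <= y + dy < height)
                   for x, y in region):
                continue
            cost = ssd(reference, current, region, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Toy 6x6 images: a bright 2x2 patch that moves by (dx, dy) = (2, 1).
reference = [[1.0 if 1 <= x <= 2 and 1 <= y <= 2 else 0.0 for x in range(6)]
             for y in range(6)]
current = [[1.0 if 3 <= x <= 4 and 2 <= y <= 3 else 0.0 for x in range(6)]
           for y in range(6)]
region = [(x, y) for y in range(4) for x in range(4)]  # tracked window

result = best_translation(reference, current, region)  # (cost, dx, dy)
```

A translation has only two parameters; the affine and homographic models used in this work have six and eight, which is why an iterative minimizer is preferred to exhaustive search.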

3.4 Initialization

Using the first reference image, the user manually segments the target into a set of areas. The surfaces of the areas, the distances between their barycenters and other geometric characteristics are computed from this initial set of areas to initialize the Kalman filter.

3.5 Filtering

This step introduces a goodness-of-fit criterion between a reference template and possible candidates in the current image. If the additive noise is supposed to be Gaussian, then the observation density $p(Y_t \mid X_t)$, associated with the prediction and the measurement described in the previous sections, is Gaussian with error covariance matrix $R$. The density $p(X_t \mid X_{t-1})$ is also Gaussian, with error covariance matrix $Q$. The state is therefore estimated by a Kalman filter with the relations:

• initialization
  – $X_0$, $P_0$, $R$ and $Q$ are given
• prediction
  – $\hat{X}_t = A X_{t-1}$
  – $\hat{Y}_t = C \hat{X}_t$
  – $\hat{P}_t = A P_{t-1} A^{*} + Q$
• filtering
  – $K_t = \hat{P}_t C^{*} (C \hat{P}_t C^{*} + R)^{-1}$
  – $X_t = \hat{X}_t + K_t (Y^{mes}_t - \hat{Y}_t)$
  – $P_t = (1 - K_t C) \hat{P}_t$

Here $A^{*}$ denotes the transpose of $A$ and $K_t$ is called the gain. The difference $Y^{mes}_t - \hat{Y}_t$ is called the measurement innovation. It reflects the discrepancy between the predicted measurement $C \hat{X}_t$ and the actual measurement $Y^{mes}_t$. Let

$$Q = \begin{pmatrix} Q_1 & 0 & 0 \\ 0 & Q_2 & 0 \\ 0 & 0 & Q_3 \end{pmatrix}, \qquad R = \begin{pmatrix} R_1 & 0 & 0 \\ 0 & R_2 & 0 \\ 0 & 0 & R_3 \end{pmatrix}$$

The covariance matrices $(Q_1, R_1)$, $(Q_2, R_2)$ and $(Q_3, R_3)$ are associated with the barycenters, the surfaces and the constraints, respectively.
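The prediction and filtering relations of this section can be sketched in a few lines with NumPy. This is a minimal illustration, assuming real-valued matrices (so that the adjoint is the transpose) and a toy one-dimensional constant-velocity target; the function name `kalman_step` and the numerical values of `Q` and `R` are hypothetical choices, not those used in the experiments.

```python
import numpy as np


def kalman_step(x_prev, p_prev, y_meas, A, C, Q, R):
    """One prediction/filtering cycle of the linear Kalman filter."""
    # Prediction
    x_pred = A @ x_prev                      # predicted state
    y_pred = C @ x_pred                      # predicted measurement
    p_pred = A @ p_prev @ A.T + Q            # predicted covariance
    # Filtering
    S = C @ p_pred @ C.T + R                 # innovation covariance
    K = p_pred @ C.T @ np.linalg.inv(S)      # gain
    x_new = x_pred + K @ (y_meas - y_pred)   # correction by the innovation
    p_new = (np.eye(len(x_prev)) - K @ C) @ p_pred
    return x_new, p_new


# Toy target: state (position, velocity), only the position is observed.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])        # constant-velocity dynamics
C = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)                         # small state noise (assumed)
R = np.array([[0.25]])                       # measurement noise (assumed)

x, P = np.zeros(2), np.eye(2)
for t in range(1, 21):
    # Noise-free measurements of a target moving at 2 units per step.
    x, P = kalman_step(x, P, np.array([2.0 * t]), A, C, Q, R)
```

Increasing the entries of `R` relative to `Q` makes the filter trust the prediction more than the measurement, which is how the covariance blocks regulate the contribution of each segment and of each joint.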

4 APPLICATION TO TRACK A PERSON

4.1 Measurements and Predictions

To track a pedestrian in an image sequence, we track the head and the torso, which are supposed to be connected: the head stays close to the torso. Two areas $F^1$ and $F^2$ are used, one corresponding to the head and the other to the torso.

4.1.1 Method 1

The state vector $X_t$ is defined by the centers $c^1$ and $c^2$ of the windows $F^1$ and $F^2$ and by their velocities $v^1$ and $v^2$, respectively:

$$X_t = (c^1, v^1, c^2, v^2)$$

The prediction matrices are defined by

$$A = \begin{pmatrix}
1 & 0 & \Delta t & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & \Delta t & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & \Delta t & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & \Delta t \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

$$C = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

These predictions introduce the a priori information that the first component of the window center $c^2$ is equal to the first component of $c^1$. Figure 3 shows the variations of the predicted value of the first component of one window when the first component of the other window is fixed at zero. The amplitude of the variation of the predicted value depends on the variance of the Gaussian noise considered.

Figure 3: Variation of the predicted value of the first component of one window when the first component of the second window is fixed at zero (horizontal axis from 0 to 1000, vertical axis of the order of $10^{-3}$).
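The coupling introduced by the fifth row of the matrix A of Method 1 can be checked numerically. The sketch below assumes the state ordering (c1x, c1y, v1x, v1y, c2x, c2y, v2x, v2y), which is one reading of the state vector; the helper names and the numerical state are hypothetical.

```python
def method1_transition(dt):
    """8x8 transition matrix for the state (c1x, c1y, v1x, v1y, c2x, c2y, v2x, v2y)."""
    return [
        [1, 0, dt, 0, 0, 0, 0, 0],   # c1x <- c1x + dt * v1x
        [0, 1, 0, dt, 0, 0, 0, 0],   # c1y <- c1y + dt * v1y
        [0, 0, 1, 0, 0, 0, 0, 0],    # v1x constant
        [0, 0, 0, 1, 0, 0, 0, 0],    # v1y constant
        [1, 0, dt, 0, 0, 0, 0, 0],   # c2x <- c1x + dt * v1x (coupling with window 1)
        [0, 0, 0, 0, 0, 1, 0, dt],   # c2y <- c2y + dt * v2y
        [0, 0, 0, 0, 0, 0, 1, 0],    # v2x constant
        [0, 0, 0, 0, 0, 0, 0, 1],    # v2y constant
    ]


def predict(A, x):
    """Noise-free prediction A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Illustrative state: the two windows start with different first components.
state = [100.0, 50.0, 2.0, 0.0, 130.0, 120.0, -1.0, 3.0]
predicted = predict(method1_transition(0.5), state)
```

After one prediction the first components of the two centers coincide, whatever their previous values; the measurement can still pull them apart, with a strength set by the covariance matrices.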

4.1.2 Method 2

Here, the state vector is defined using only one window:

$$X_t = (c^1, v^1, d_t)$$

where $d_t$ is a vector of $\mathbb{R}^2$. The prediction matrices are defined by

$$A = \begin{pmatrix}
1 & 0 & \Delta t & 0 & 0 & 0 \\
0 & 1 & 0 & \Delta t & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \qquad
C = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}$$

We remark that if

$$d_t = c^2 - c^1, \tag{11}$$

the observations $Y_t$ correspond to the barycenters of the targeted zones. The initial value of the vector $d_t$ thus introduces a constraint on the evolution of the two windows in the Kalman filter. The coordinates of the barycenter of the window $F^2$ are deduced from the estimates of $d_t$ and $c^1$ by the relation

$$c^2 = d_t + c^1$$

The initial value of the vector $d_t$ is computed from the initial barycenters of the windows $F^1$ and $F^2$.
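A short numerical check of Method 2, under the assumption that the state is ordered (c1x, c1y, v1x, v1y, dx, dy) and that the first window is the head: the last two rows of C reconstruct the barycenter of the second window from the first barycenter and the offset d. The barycenter values and helper names below are hypothetical.

```python
def initial_offset(c1, c2):
    """Offset d = c2 - c1 between the two window barycenters, relation (11)."""
    return (c2[0] - c1[0], c2[1] - c1[1])


# Observation matrix of Method 2 for the state (c1x, c1y, v1x, v1y, dx, dy).
C = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],   # c1x + dx = c2x
    [0, 1, 0, 0, 0, 1],   # c1y + dy = c2y
]


def observe(state):
    """Noise-free observation C X."""
    return [sum(c * s for c, s in zip(row, state)) for row in C]


head, torso = (120.0, 40.0), (118.0, 95.0)   # hypothetical initial barycenters
d = initial_offset(head, torso)
state = [head[0], head[1], 1.5, 0.0, d[0], d[1]]
y = observe(state)
```

Compared with Method 1, the state has six components instead of eight, and the link between the two windows is carried by the estimated offset rather than by a row of the transition matrix.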

4.2 Initialization of the Optimization Algorithm

The user chooses the initial size of the windows $F^1$ and $F^2$ manually. The window size is then updated with the estimated values of $d_t$ through the relations:

$$L^{l,x}_t = L^{l,x}_{t-1} \frac{\|d_t\|}{\|d_{t-1}\|}, \qquad L^{l,y}_t = L^{l,y}_{t-1} \frac{\|d_t\|}{\|d_{t-1}\|}$$

where $L^{l,x}_t$ is the length and $L^{l,y}_t$ the width of the window $F^l$ at time $t$. The sizes of the windows and their barycenters are used to initialize the values of the transformations for the next step of tracking. The transformation for the ESM minimization routine is initialized by using the results of the filtering to construct new homographic transformations defined by

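$$H^{l,0} = \begin{pmatrix}
1 & 0 & \min\left(\ldots^{l,x} - \ldots^{l,x},\ c^{l,x}\right) \\
0 & 1 & \min\left(\ldots^{l,y},\ c^{l,y} - \ldots^{l,y}\right) \\
0 & 0 & 1
\end{pmatrix}$$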

We note that these homographies are computed from the estimate of the window center at the previous iteration. Updating the transformation with $H^{l,0}$ gives more stable results than initializing with the transformations measured directly at the previous step. The transformation must be updated because of the non-rigid motion of the target and the changes in its appearance.

4.3 Change of Appearance and Resolution

When the change of appearance and resolution becomes very large due to the deformation of the clothes, changes in the posture of the pedestrian and the mobility of the camera, the update of the transformations proposed in the previous section is insufficient. It is necessary to update the reference template, which is composed of the set of targeted areas. In this work, we use the estimate of $d_t$ to update the reference template. When the condition

$$\bigl|\, \|d_t\| - \|d_{t-1}\| \,\bigr| < Th$$

is not satisfied, the set of areas is updated from the set of areas obtained at the previous iteration. This criterion can be combined with another similarity measurement that compares the reference and current templates, for example the sum of squared differences (SSD). In (Arnaud et al., 2004), the eigenvalues of the error covariance in the Kalman filter are used to define a threshold.
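The update criterion based on the offset between the barycenters amounts to a one-line test. The sketch below assumes two-dimensional offsets given as tuples; the function name `needs_template_update` and the threshold value are hypothetical, since the paper does not state the value of Th.

```python
import math


def needs_template_update(d_t, d_prev, threshold):
    """True when the condition | ||d_t|| - ||d_{t-1}|| | < Th is violated."""
    change = abs(math.hypot(*d_t) - math.hypot(*d_prev))
    return change >= threshold


# Offsets between the two window barycenters at consecutive frames (made-up values).
stable = needs_template_update((3.0, 4.0), (3.0, 4.0), threshold=1.0)
jump = needs_template_update((6.0, 8.0), (3.0, 4.0), threshold=1.0)
```

Because the test only looks at the norm of the offset, a rotation of the pedestrian that preserves the head-torso distance goes undetected, which is one reason to combine it with an image-based similarity measurement.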

Intuitively, a more pertinent strategy to update the reference template consists in accumulating the reference templates (Morency et al., 2003). The difficulties with such an approach are the definition of a similarity criterion that can be applied to the reference data and the computing time, which can be expensive. In (Wu and Huang, 2001), a particle filter is used to evolve the reference template with a dynamical model. To deal with the change of appearance, Sidenbladh and Black (2002) use a learning approach based on edge, ridge and motion cues. The cues are combined with a Bayesian model.

4.4 Experiments

4.4.1 Data

We tested the proposed articulated models on data from an urban traffic environment. The two video sequences used were recorded by Renault in the context of the LOVe project (http://www.love.univ-bpclermont.fr/) with a SMAL camera from the CYPRESS company. The following table gives the main characteristics of the camera used. Since our model is adapted to a monocular camera, only one of the two sets of images obtained by the stereo camera was used.

The first image sequence was acquired from an immobile vehicle. The trajectory of the pedestrian was perpendicular to the road. The second image sequence was acquired by a camera mounted on a moving vehicle and shows pedestrians crossing the road.

4.4.2 Results and Discussion

In this section, experimental results obtained with the urban image sequences presented in the previous section are given to evaluate the articulated model of section 4.1.1. We compare them to the results obtained when only one area is tracked with the ESM algorithm. Figure 4 shows the result obtained by the ESM algorithm: the tracker lost the target at the second image because of the non-rigid motion and because of the background present in the initial area (pixels not belonging to the pedestrian). Figure 5 shows that the articulated model succeeded in tracking the target. These results were obtained without updating the reference template.

Figure 6 shows the results of the tracking. The pedestrian was correctly tracked until he made a 90 degree turn relative to the camera in the image at $t = 2.0630$ s. The update of the reference template using the distance correlation between the barycenters was insufficient to deal with the large change of appearance in this frame, when the pedestrian changed his posture completely. It would be interesting to combine the proposed update method with another update strategy based on a similarity measurement, see section 4.3.

Figure 4: Results of the ESM tracking algorithm (eight frames).


Figure 5: Results obtained with the first articulated model (eight frames).

5 CONCLUSIONS

In this work, we propose a method for the tracking of an articulated target with a monocular camera using the Kalman filter. The advantages of this approach are the following: the joints are introduced as a priori information and not as hard constraints (the a priori information can be left unsatisfied if the measurements extracted from the images are more pertinent), a real-time implementation is possible, the method is easy to implement, it is stable and it smooths the trajectory of the target. For a large number of areas the algorithm can be easily parallelized, since the measurement is computed separately for each area. However, this model is sensitive to large changes in appearance. Further work is necessary to tackle this problem; some directions are proposed in section 4.3.

ACKNOWLEDGEMENTS

We thank A. Doucet, E. Malis, D. M. Pierre and P. Rives for stimulating and useful discussions. We are very grateful to F. Solanet for providing the image sequences; the two video sequences used were recorded by Renault in the context of the LOVe project (http://www.love.univ-bpclermont.fr/).

Figure 6: Tracking with the first articulated model applied to an urban image sequence recorded at 30 images/second. The eight frames shown correspond to t = 0.4601 s, 0.6348 s, 0.9395 s, 1.0747 s, 1.2444 s, 1.8285 s, 1.8540 s and 2.0630 s.

REFERENCES

Arnaud, E., Mémin, E., and Cernuschi-Frias, B. (2004). Conditional filters for image sequence based tracking - application to point tracker. IEEE Trans. on Im. Proc.

Benhimane, S. and Malis, E. (2004). Real-time image-based tracking of planes using efficient second-order minimization. In IEEE/RSJ International Conference on Intelligent Robots Systems, Sendai, Japan, October 2004.

Bergen, J. R., Anandan, P., Hanna, K. J., and Hingorani, R. (1992). Hierarchical model-based motion estimation. In ECCV '92: Proceedings of the Second European Conference on Computer Vision, pages 237–252, London, UK. Springer-Verlag.

Black, M. J. and Fleet, D. J. (1999). Probabilistic detection and tracking of motion discontinuities. In ICCV (1), pages 551–558.

Blake, A. and Isard, M. (1998). Active Contours. Springer, Berlin Heidelberg New York.

Blake, A., North, B., and Isard, M. (1999). Learning multi-class dynamics. Advances in Neural Information Processing Systems, 11:389–395.

Bregler, C., Malik, J., and Pullen, K. (2004). Twist based acquisition and tracking of animal and human kinematics. Int. J. Comput. Vision, 56(3):179–194.

Cuzol, A., Hellier, P., and Mémin, E. (2007). A low dimensional fluid motion estimator. Int. Journ. on Computer Vision.

Del Moral, P. (1997). Nonlinear filtering: interacting particle resolution. C. R. Acad. Sci. Paris Sér. I Math., 325(6):653–658.

Doucet, A., de Freitas, N., and Gordon, N., editors (2002). Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer-Verlag, New York Berlin Heidelberg.

Faugeras, O., Luong, Q.-T., and Papadopoulou, T. (2001). The Geometry of Multiple Images: The Laws That Govern The Formation of Images of A Scene and Some of Their Applications. MIT Press, Cambridge, MA, USA.

Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding: CVIU, 73(1):82–98.

Gavrila, D. M. and Davis, L. S. (1995). Towards 3d model-based tracking and recognition of human movement. In Proc. of the IEEE International Workshop on Face and Gesture Recognition, pages 272–277, Zurich, Switzerland.

Gordon, N. (1993). Bayesian methods for tracking. PhD thesis, University of London.

Kakadiaris, I. A. and Metaxas, D. (1996). Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection. In Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96), page 81, Washington, DC, USA.

Kalman, R. E. and Bucy, R. S. (1961). New results in linear filtering and prediction theory. Trans. ASME Ser. D. J. Basic Engrg., 83:95–108.

Malis, E. (2004). Improving vision-based control using efficient second-order minimization techniques. In ICRA'04, New Orleans, April 2004.

Morency, L., Rahimi, A., and Darrell, T. (2003). Adaptive view-based appearance models. In Proc. IEEE Conf. on Comp. Vision and Pattern Recogn., pages 803–810.

Murray, R. M., Sastry, S. S., and Zexiang, L. (1994). A Mathematical Introduction to Robotic Manipulation. CRC Press, Inc., Boca Raton, FL, USA.

Perez, P., Hue, C., Vermaak, J., and Gangnet, M. (2002). Color-based probabilistic tracking. In ECCV, number 2350 in LNCS, pages 661–675.

Rouchdy, Y., Pousin, J., Schaerer, J., and Clarysse, P. (2007). A nonlinear elastic deformable template for soft structure segmentation. Application to the heart segmentation in MRI. Inverse Problems, 23:1017–1035.

Sidenbladh, H. and Black, M. (2003). Learning the statistics of people in images and video. Int. Journ. on Computer Vision, 54(1-3):183–209.

Sidenbladh, H. and Black, M. J. (2002). Learning the statistics of people in images and video. Int. Journal of Computer Vision, 54.

Weiss, Y. and Adelson, E. H. (1996). A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96), page 321, Washington, DC, USA.

Wu, Y. and Huang, T. (2001). A co-inference approach to robust visual tracking. In Proc. IEEE Conf. on Comp. Vision, pages 26–33.

Zhang, X., Liu, Y., and Huang, T. S. (2006). Motion analysis of articulated objects from monocular images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):625–636.
