Swapping-based Annealed Particle Filter with Occlusion Handling
for 3D Human Body Tracking
Xuan Son Nguyen
INRIA, Loria, Campus Scientifique, Vandœuvre-lès-Nancy, France
Keywords:
Particle Filter, Human Body Tracking, Bayesian Network.
Abstract:
In this paper, we propose a new approach for 3D human body tracking. We first extend the idea of Swapping-
based Partitioned Sampling (SBPS), which was introduced by Dubuisson et al. for solving the articulated
object tracking problem in high dimensional state spaces. This extension aims to deal with self-occlusion and
constraints between parts of the human body, which are not taken into account in SBPS. We prove that, under
the same assumptions required by SBPS, the posterior distribution is correctly estimated in our framework.
We then introduce a new approach for 3D human body tracking, based on this new framework and Annealed
Particle Filter (APF). Experiments with multi-camera walking sequences from the HumanEva I dataset show
the efficiency of the proposed approach in terms of both accuracy and computation time.
1 INTRODUCTION
Tracking the human body with accuracy and within a
reasonable time is challenging due to the high com-
plexity of the problem to solve. Various approaches
have been proposed for this problem. One class of
approaches is known as optimization-based methods
(Deutscher and Reid, 2005; Zhang et al., 2010). Typ-
ically, these methods are based on the optimization of
an objective function corresponding to the matching
function between the model and the observed image
features.
Another way of reducing the dimensionality of
the configuration space is to use decomposition meth-
ods within the particle filter framework (MacCormick
and Isard, 2000; Rose et al., 2008). Particle Filter
(PF) (Gordon et al., 1993) has been shown to be an ef-
fective method for solving visual tracking problems.
This is due to its ability to deal with non-linear, non-
Gaussian and multimodal distributions encountered in
such problems. The key idea behind decomposition
methods is similar: decompose the state space of the
target object into a set of subspaces where particle
filter can be applied. Since the dimension of these
subspaces is smaller than that of the original state
space, sampling in these subspaces will be more ef-
ficient than sampling in the original space and there-
fore, fewer particles are needed to achieve a good per-
formance. Recently, a decomposition approach called
Swapping-based Partitioned Sampling (SBPS), which
is based on the state-of-the-art algorithm Partitioned
Sampling (PS) (MacCormick and Isard, 2000) for
tracking in high dimensional state spaces, has been in-
troduced in (Dubuisson et al., 2011; Dubuisson et al.,
2013). Under some assumptions, SBPS is guaranteed
to produce a correct estimation of the posterior dis-
tribution. However, one of the important assumptions
required by SBPS is that no self-occlusion occurs dur-
ing tracking. This assumption is often violated in real-
world problems, and the posterior distribution can be
poorly approximated by SBPS in such cases. An-
other disadvantage of SBPS is that it does not take
into account constraints between different parts of the
articulated object, which have been shown to be very
important in human tracking (Sigal et al., 2010). To
address these problems, we first introduce an exten-
sion of SBPS, which is more flexible than SBPS and
allows us to better estimate the posterior distribution
when self-occlusion is present. We then introduce a
new approach for 3D human body tracking, which is
a combination of the new framework and Annealed
Particle Filter (APF) (Deutscher and Reid, 2005).
The paper is organized as follows. In Section 2,
we give a brief introduction to PF, PS and SBPS. Sec-
tion 3 presents the proposed approach. Section 4 re-
ports the results of our experimental evaluation. Fi-
nally, Section 5 offers some conclusions and ideas for
future work.
Nguyen X.
Swapping-based Annealed Particle Filter with Occlusion Handling for 3D Human Body Tracking.
DOI: 10.5220/0004686905760583
In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 576-583
ISBN: 978-989-758-009-3
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)
2 RELATED WORK
2.1 Particle Filter
In this paper, human tracking consists of estimating a state sequence $\{x_t\}_{t=1,\dots,T}$, whose evolution is given by the equation $x_t = f_t(x_{t-1}, n^x_t)$, from observations $\{y_t\}_{t=1,\dots,T}$ related to the states by $y_t = h_t(x_t, n^y_t)$. Usually, $f_t$ and $h_t$ are nonlinear functions, and $n^x_t$ and $n^y_t$ are i.i.d. noise sequences. From a probabilistic viewpoint, it amounts to estimating, for any $t$, $p(x_{1:t} \mid y_{1:t})$, where $x_{1:t}$ denotes the tuple $(x_1, \dots, x_t)$.
The PF framework (Gordon et al., 1993) approximates the posterior densities using weighted samples $\{x_t^{(i)}, w_t^{(i)}\}$, $i = 1, \dots, N$, where each $x_t^{(i)}$ is a possible realization of state $x_t$ called a particle. In its prediction step, PF propagates the particle set $\{x_{t-1}^{(i)}, w_{t-1}^{(i)}\}$ using a proposal function $q(x_t \mid x_{1:t-1}^{(i)}, y_t)$ which may differ from $p(x_t \mid x_{t-1}^{(i)})$ (but, for simplicity, we will assume they do not); in its correction step, PF weights the particles using a likelihood function, so that
$$w_t^{(i)} \propto w_{t-1}^{(i)} \, p(y_t \mid x_t^{(i)}) \, \frac{p(x_t^{(i)} \mid x_{t-1}^{(i)})}{q(x_t^{(i)} \mid x_{1:t-1}^{(i)}, y_t)}, \quad \text{with } \sum_{i=1}^{N} w_t^{(i)} = 1.$$
The particles can then be resampled: those with the highest weights are duplicated while the others are eliminated. The estimation of the posterior density $p(x_t \mid y_{1:t})$ is then given by $\sum_{i=1}^{N} w_t^{(i)} \delta_{x_t^{(i)}}(x_t)$, where $\delta_{x_t^{(i)}}$ are Dirac masses centered on particles $x_t^{(i)}$.
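The prediction, correction and resampling steps described above can be sketched on a toy problem. The following is only an illustration under stated assumptions, not the tracker used in this paper: the state is a scalar following a random walk, the likelihood is Gaussian, and the proposal equals the transition prior so that the weight update reduces to a multiplication by the likelihood. The function names and noise levels are hypothetical.

```python
import math
import random


def gaussian_pdf(value, mean, std):
    """Density of a 1D Gaussian, used here as a toy likelihood p(y_t | x_t)."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, observation, rng,
                         process_std=0.5, obs_std=1.0):
    """One predict / correct / resample cycle for a scalar random-walk state.

    Assumption of this sketch: the proposal q equals p(x_t | x_{t-1}),
    so the weight update is w_t proportional to w_{t-1} * p(y_t | x_t).
    """
    # Prediction: sample x_t^(i) from p(x_t | x_{t-1}^(i)).
    predicted = [x + rng.gauss(0.0, process_std) for x in particles]

    # Correction: multiply by the likelihood, then normalise to sum to 1.
    unnormalised = [w * gaussian_pdf(observation, x, obs_std)
                    for x, w in zip(predicted, weights)]
    total = sum(unnormalised)
    normalised = [w / total for w in unnormalised]

    # Point estimate: weighted mean of the particles.
    estimate = sum(w * x for w, x in zip(normalised, predicted))

    # Resampling: high-weight particles are duplicated, others disappear.
    n = len(predicted)
    resampled = rng.choices(predicted, weights=normalised, k=n)
    return resampled, [1.0 / n] * n, estimate
```

Calling `particle_filter_step` once per observation, starting from particles drawn from a broad prior, moves the particle cloud towards states that explain the observations.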
2.2 Partitioned Sampling and
Swapping-based Partitioned
Sampling
Partitioned Sampling (PS) has been introduced by
MacCormick (MacCormick and Isard, 2000). PS’s
key idea is to exploit some natural decomposition
of the system dynamics w.r.t. subspaces of the state
space in order to apply PF only on those subspaces.
This leads to a significant reduction in the number of
particles required for tracking. So, assume that state space $\mathcal{X}$ and observation space $\mathcal{Y}$ can be partitioned as $\mathcal{X} = \mathcal{X}^1 \times \cdots \times \mathcal{X}^P$ and $\mathcal{Y} = \mathcal{Y}^1 \times \cdots \times \mathcal{Y}^P$ respectively. For instance, a system representing a hand could be defined as $\mathcal{X}^{hand} = \mathcal{X}^{palm} \times \mathcal{X}^{thumb} \times \mathcal{X}^{index} \times \mathcal{X}^{middle} \times \mathcal{X}^{ring} \times \mathcal{X}^{little}$. Assume in addition that the dynamics of the system follows this decomposition, i.e., that:
$$f_t(x_{t-1}, n^x_t) = f^P_t \circ f^{P-1}_t \circ \cdots \circ f^2_t \circ f^1_t(x_{t-1}), \tag{1}$$
where $\circ$ is the usual function composition operator and where each function $f^i_t : \mathcal{X} \mapsto \mathcal{X}$ modifies the particles' states only on subspace $\mathcal{X}^i$ (see footnote 1).

Figure 1: Full body tracking. (Body model with parts numbered 1 to 15, and the corresponding DBN with nodes $x^k_{t-1}$ at time $t-1$ and $x^k_t$ at time $t$.)
In PS, one assumes that the global image likelihood can be factorized as a product of local likelihoods:
$$p(y_t \mid x_t) = \prod_{i=1}^{P} p^i(y^i_t \mid x^i_t), \tag{2}$$
where $y^i_t$ and $x^i_t$ are the projections of $y_t$ and $x_t$ on $\mathcal{Y}^i$ and $\mathcal{X}^i$ respectively.
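To make the decomposition of Equations 1 and 2 concrete, the following sketch processes the parts one at a time: part $i$ is propagated, weighted with its own local likelihood, and the particle set is resampled before the next part is handled. It is a simplified illustration, assuming scalar parts, a random-walk dynamic for each $f^i_t$ and a Gaussian local likelihood; plain resampling after each part stands in for the weighted resampling of the original PS formulation. All names are hypothetical.

```python
import math
import random


def local_likelihood(obs_i, state_i, std=1.0):
    """Toy local likelihood p^i(y_t^i | x_t^i): unnormalised 1D Gaussian."""
    return math.exp(-0.5 * ((obs_i - state_i) / std) ** 2)


def partitioned_sampling_step(particles, observation, rng, process_std=0.5):
    """Propagate, weight and resample the parts one after another.

    particles[n][i] is the state of part i in particle n, and
    observation[i] is the observation associated with part i.
    """
    n_particles = len(particles)
    current = [list(p) for p in particles]
    for i in range(len(observation)):
        # f_t^i modifies the particle states only on subspace X^i.
        for p in current:
            p[i] += rng.gauss(0.0, process_std)
        # Weight with the local likelihood of part i only.
        weights = [local_likelihood(observation[i], p[i]) for p in current]
        # Resample so that later parts build on good estimates of part i.
        chosen = rng.choices(current, weights=weights, k=n_particles)
        current = [list(p) for p in chosen]
    return current
```

Because each resampling only has to discriminate along one low-dimensional subspace, far fewer particles are needed than when all parts are sampled jointly.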
Swapping-based Partitioned Sampling (SBPS) has been introduced in (Dubuisson et al., 2011). It exploits conditional independences encoded in Dynamic Bayesian Networks (DBNs) (Murphy, 2002) to improve the tracking accuracy and reduce the computational cost of PS. Figure 1 shows an example of human body tracking and a DBN used by SBPS for modeling this problem, where $x^k_t$, $k = 1, \dots, 15$ represent the parts of the human body. Denote by $pa(x^k_t)$ and $pa_t(x^k_t)$ the parents of node $x^k_t$ in all time instants and in time instant $t$, respectively. The assumptions required by SBPS are that the proposal transition function of a given part $x^i_t$ at time $t$ depends only on that part at time $t-1$ (and possibly on other parts at time $t$) and that the observations depend only on their corresponding state. The set $\{1, \dots, P\}$ of parts of the target object can be partitioned into sets $\{\mathcal{P}_1, \dots, \mathcal{P}_K\}$ such that the parts in each $\mathcal{P}_j$ are all independent conditionally on $\bigcup_{h<j} \mathcal{P}_h$. For instance, in Figure 1, $P = 15$ and $K = 5$: $\mathcal{P}_1 = \{1\}$ corresponds to the pelvis, $\mathcal{P}_2 = \{2, 3, 6\}$ to the torso and the right and left thighs, $\mathcal{P}_3 = \{4, 7, 9, 12, 15\}$ to the right and left calves, the right and left upper arms and the head, $\mathcal{P}_4 = \{5, 8, 10, 13\}$ to the right and left feet and the right and left lower arms, and $\mathcal{P}_5 = \{11, 14\}$ to the right and left hands. At the $j$th stage, SBPS performs the prediction/correction steps for the parts in $\mathcal{P}_j$ in parallel instead of part after part as in PS. This makes it possible to produce better particles by swapping their subparts. This is achieved by an operation called swapping or permutation, which guarantees that the target distribution is correctly estimated. More precisely, whenever two particles have the same states on some nodes $pa_t(x^k_t)$, swapping their states on $x^k_t$ and its descendants cannot alter the density estimated by the particle set. For instance, in Figure 1, if two particles have the same value for node $pa_t(x^3_t) = x^1_t$, their values on nodes $x^3_t$, $x^4_t$ and $x^5_t$ can be safely swapped. The set of subparts to permute similarly to $x^k_t$ is called a swapping set. In addition, subpart permutation between two particles can only be performed if they have the same states on some nodes $pa_t(x^k_t)$, which leads to the definition of an admissible permutation. In SBPS, the definitions of swapping set and admissible permutation guarantee that densities are correctly estimated.

Footnote 1: Note that, in (MacCormick and Isard, 2000), functions $f^i_t$ are more general since they can modify states on $\mathcal{X}^i \times \cdots \times \mathcal{X}^M$. However, in practice, particles are often propagated only one $\mathcal{X}^j$ at a time.

Figure 2: New diagram for SBPS.
The soundness of SBPS can be justified by d-
separation analysis on DBNs (see (Dubuisson et al.,
2013)).
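The effect of swapping can be illustrated with a small sketch. It assumes that each particle stores a shared parent state and two independent branches (for example two limbs attached to the same parent), each with its own local weight; this data layout and the function name are hypothetical. Within each group of particles having identical parent states, the branches are re-paired so that well-scoring sub-states end up in the same particle.

```python
from collections import defaultdict


def swap_substates(particles):
    """Recombine branch sub-states among particles sharing a parent state.

    Each particle is a dict with keys:
      'parent': hashable state of the shared parent node,
      'left'  : (substate, local_weight) of one branch,
      'right' : (substate, local_weight) of another branch.
    Sub-states are only exchanged between particles with identical
    'parent' values, which is the admissibility condition in the text.
    """
    groups = defaultdict(list)
    for particle in particles:
        groups[particle['parent']].append(particle)

    swapped = []
    for parent, members in groups.items():
        # Sort each branch independently by its local weight.
        lefts = sorted((m['left'] for m in members),
                       key=lambda s: s[1], reverse=True)
        rights = sorted((m['right'] for m in members),
                        key=lambda s: s[1], reverse=True)
        # Re-pair: best left with best right, second best with second best.
        for left, right in zip(lefts, rights):
            swapped.append({'parent': parent, 'left': left, 'right': right,
                            'weight': left[1] * right[1]})
    return swapped
```

The multiset of sub-states of each branch is unchanged by this operation; only their pairing across particles differs, which concentrates more particles near the modes.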
3 PROPOSED APPROACH
SBPS is based on the important assumption that the global image likelihood can be expressed as a product of individual local likelihoods. The likelihood for each part of the target object is then evaluated independently of other parts, without taking self-occlusion into consideration. When self-occlusions are present, however, this assumption is violated and the product of local likelihoods gives a poor approximation of the global likelihood. As a result, the SBPS tracker performs poorly under self-occlusion. To cope with this problem, we propose a new diagram for SBPS. We first introduce a theoretical framework for this diagram in the next section. Then, in Section 3.2, we propose a new approach for 3D human body tracking based on this diagram.
3.1 Theoretical Framework
For computational reasons, we will assume from this
point on that, within each time slice, the DBN struc-
ture is a directed tree.
For any $u$, $u = 1, \dots, P$, denote by $T(u)$ the set of indices of nodes in the subtree whose root is $x^u_t$ (the index of node $x^u_t$ for any $t$ is $u$). Denote by $\mathcal{P}_1$ any subset of $\{1, \dots, P\}$ such that: if $u_1 \in \mathcal{P}_1$ then $u_2 \in \mathcal{P}_1$ where $x^{u_2}_t = pa_t(x^{u_1}_t)$. For any $j \geq 2$, assume that $\mathcal{P}_1, \dots, \mathcal{P}_{j-1}$ have been defined. Let $S_j$ be any subset of the set of nodes that have not been selected in any set $\mathcal{P}_1, \dots, \mathcal{P}_{j-1}$ and such that any node in $S_j$ is a child node of a node in $\mathcal{P}_1 \cup \dots \cup \mathcal{P}_{j-1}$:
$$S_j \subseteq \Big\{ u \in \{1, \dots, P\} \setminus \bigcup_{h=1}^{j-1} \mathcal{P}_h \; : \; pa_t(x^u_t) \in \bigcup_{h=1}^{j-1} \bigcup_{v \in \mathcal{P}_h} \{x^v_t\} \Big\}.$$
For all $u \in S_j$, denote by $B(u)$ any subset of $T(u)$ such that: if $u_1 \neq u$ and $u_1 \in B(u)$ then $u_2 \in B(u)$ where $x^{u_2}_t = pa_t(x^{u_1}_t)$. Then $\mathcal{P}_j$ is defined as: $\mathcal{P}_j = \bigcup_{u \in S_j} B(u)$. In order to determine interesting permutations, denote by $F^1_{\mathcal{P}_j}, \dots, F^{n_j}_{\mathcal{P}_j}$ a partition of $\mathcal{P}_j$ such that any $F^h_{\mathcal{P}_j}$, $h = 1, \dots, n_j$ is a union of some sets $B(u)$, $u \in S_j$. We call $F^1_{\mathcal{P}_j}, \dots, F^{n_j}_{\mathcal{P}_j}$ an interesting partition of $\mathcal{P}_j$.
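The construction above can be checked mechanically on a small tree. The sketch below assumes a within-slice tree for the 15 body parts that is inferred from the part groupings given in the text for Figure 1 (1 pelvis, 2 torso, 3 to 5 and 6 to 8 the legs, 9 to 11 and 12 to 14 the arms, 15 head); the `PARENT` table and the function names are illustrative assumptions, not part of the original formulation.

```python
# Parent of each part within a time slice (assumed tree, see lead-in).
PARENT = {1: None, 2: 1, 3: 1, 4: 3, 5: 4, 6: 1, 7: 6, 8: 7,
          9: 2, 10: 9, 11: 10, 12: 2, 13: 12, 14: 13, 15: 2}


def subtree(u):
    """T(u): part u together with all of its descendants."""
    return {u}.union(*(subtree(c) for c, p in PARENT.items() if p == u))


def is_valid_first_set(p1):
    """P_1 must contain the parent of every one of its members."""
    return all(PARENT[u] is None or PARENT[u] in p1 for u in p1)


def candidate_roots(processed):
    """Largest admissible S_j: unprocessed parts whose parent is processed."""
    return {u for u, parent in PARENT.items()
            if u not in processed and parent in processed}


def is_valid_block(u, block):
    """B(u): subset of T(u) where each member other than u has its parent in B(u)."""
    return block <= subtree(u) and all(PARENT[v] in block
                                       for v in block if v != u)


# Example: choose P_1, take S_2 as large as possible and B(u) = T(u).
p1 = {1, 2, 3, 6, 9, 12, 15}
s2 = candidate_roots(p1)
blocks = {u: subtree(u) for u in s2}
p2 = set().union(*blocks.values())
print(sorted(s2))  # [4, 7, 10, 13]
print(sorted(p2))  # [4, 5, 7, 8, 10, 11, 13, 14]
```

An interesting partition of `p2` is then any grouping of the blocks, for instance the two legs in one set and the two arms in another.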
As an example, consider the human tracking problem in Figure 1. Here we have $\mathcal{P} = \{1, \dots, 15\}$. Assume that $\mathcal{P}_1 = \{1, 2, 3, 6, 9, 12, 15\}$ and now we want to define $\mathcal{P}_2$. We can define $S_2 = \{4, 7, 10, 13\}$ since any node in $S_2$ is a child node of a node in $\mathcal{P}_1$. Then we can define the following sets: $B(4) = \{4, 5\}$, $B(7) = \{7, 8\}$, $B(10) = \{10, 11\}$, $B(13) = \{13, 14\}$ and hence $\mathcal{P}_2 = B(4) \cup B(7) \cup B(10) \cup B(13) = \{4, 5, 7, 8, 10, 11, 13, 14\}$. We partition $\mathcal{P}_2$ into $F^1_{\mathcal{P}_2}, F^2_{\mathcal{P}_2}$ where: $F^1_{\mathcal{P}_2} = B(4) \cup B(7)$, $F^2_{\mathcal{P}_2} = B(10) \cup B(13)$. In this case, a new diagram for SBPS can be described in Figure 2, where $f^{\mathcal{P}_1}_t$, $f^{B(4)}_t$, $f^{B(7)}_t$, $f^{B(10)}_t$, $f^{B(13)}_t$ refer to propagation of nodes in $\mathcal{P}_1$, $B(4)$, $B(7)$, $B(10)$, $B(13)$ in topological orders, respectively. $\times p^{\mathcal{P}_1}_t$, $\times p^{F^1_{\mathcal{P}_2}}_t$, $\times p^{F^2_{\mathcal{P}_2}}_t$ refer to the correction step where particle weights are multiplied by $p^{\mathcal{P}_1}_t$, $p^{F^1_{\mathcal{P}_2}}_t$, $p^{F^2_{\mathcal{P}_2}}_t$, respectively. The swapping operator indexed by $\mathcal{P}_2$ refers to the particle subpart swappings, where we permute the substates in $F^1_{\mathcal{P}_2}$ among particles having the same states for $x^3_t \cup x^6_t$, and where we permute the substates in $F^2_{\mathcal{P}_2}$ among particles having the same states for $x^9_t \cup x^{12}_t$. In the general case, the swapping set and admissible permutation are defined as follows:
Definition 1. Let $F$ be a set in an interesting partition of $\mathcal{P}_j$. Let $u_1, \dots, u_l \in S_j$ be such that $F = \bigcup_{i=1}^{l} B(u_i)$. The set
$$x_t^{\bigcup_{i=1}^{l} B(u_i)} \cup x_{t-1}^{\bigcup_{i=1}^{l} (T(u_i) \setminus B(u_i))}$$
is called a swapping set. A permutation $\sigma : \{1, \dots, N\} \mapsto \{1, \dots, N\}$ is said to be admissible if and only if $x^{(i),h}_s = x^{(\sigma(i)),h}_s$ for all $i \in \{1, \dots, N\}$ and for all nodes $x^h_s \in \bigcup_{s=1}^{t} \bigcup_{i=1}^{l} pa_s(x^{u_i}_s)$.
The correctness of the diagram in Figure 2 follows
from Proposition 1 (see Appendix for the proof).
Proposition 1. Under the same assumptions required by SBPS, the set of particles resulting from the diagram of Figure 2 represents the probability distribution $p(x_t \mid y_{1:t})$.
It should be noted that the application of our theoretical
framework is not limited to DBNs whose
structures are directed trees. However, since the
swapping sets and admissible permutations can be
identified more easily in directed tree cases, an effi-
cient implementation of the tracking algorithm can be
obtained for these cases.
The main advantage of the diagram in Figure 2 compared to that of SBPS is that self-occlusion or constraints between parts of the articulated object can be efficiently taken into account. Let us consider the case of self-occlusion. There are two cases where the diagram in Figure 2 is particularly useful when dealing with self-occlusion. The first case is when we do not have an observation for each part but only for a group of parts. In this case, the evaluation of the likelihood for each body part, as required by SBPS, often gives poor approximations, resulting in bad tracking results due to poor approximations of the global likelihood in Equation 2. Now, suppose that $\mathcal{P}_1$, $F^1_{\mathcal{P}_2}$ and $F^2_{\mathcal{P}_2}$ are groups of parts such that a good observation can be obtained for each group. In other words, we can obtain good approximations for $p^{\mathcal{P}_1}_t$, $p^{F^1_{\mathcal{P}_2}}_t$ and $p^{F^2_{\mathcal{P}_2}}_t$. Then, the global likelihood can be better approximated as follows:
$$p(y_t \mid x_t) = p^{\mathcal{P}_1}_t \times p^{F^1_{\mathcal{P}_2}}_t \times p^{F^2_{\mathcal{P}_2}}_t$$
The second case where the diagram in Figure 2 is useful for dealing with self-occlusion is when the observation on some parts (guiding parts) provides search constraints for some other parts (primary parts) that are estimated just before the guiding parts when tracking the object. In this case, it could be more efficient to combine the guiding parts and the primary parts into some set $F$ (for instance, $F^1_{\mathcal{P}_2}$ or $F^2_{\mathcal{P}_2}$ in the diagram of Figure 2) in order to improve the estimates for the primary parts and thus for the guiding parts.
The diagram in Figure 2 also allows us to apply some types of hard prior that SBPS has difficulty handling. These hard priors eliminate any particle that violates some constraint between parts of the articulated object, thus reducing the search space. Let us consider the case where one wants to impose a pairwise constraint between two parts belonging to two different branches of the tree representing the articulated object. If their parent parts are poorly estimated, then at the prediction step of these parts, the process of generating new particles for them may have to be repeated many times to obtain the required number of particles satisfying the constraint. This process may increase the computation time by unpredictable amounts. In such a situation, one might want to backtrack and regenerate new configurations for their parent parts. Such a task cannot be achieved in SBPS since any part in SBPS is always processed after its parent parts and before its child parts. In the diagram of Figure 2, the capability of processing in parallel a group of parts belonging to the same branch of the tree enables an efficient way of dealing with some types of hard prior.

Figure 3: Human body model. (Parts numbered 1 to 15.)
3.2 Our Approach for Human Tracking
3.2.1 Body Model and Likelihood Model
We use the human body model in (Sigal et al., 2010),
which is shown in Figure 3. The model consists of 15
parts where body parts are connected by joints. Track-
ing consists of estimating at each time instant a vector
of 34 parameters comprising the global position and
orientation of the pelvis and the relative joint angles
between neighboring limbs.
The likelihood model consists of an edge-based part and a silhouette-based part, which is a commonly used likelihood model for human body tracking. We refer the reader to (Deutscher and Reid, 2005; Sigal et al., 2010) for details. Our human body tracking problem can be modeled by the DBN in Figure 1.
3.2.2 Self-occlusion Handling
Like SBPS, our idea is to partition $\mathcal{P}$ into some subsets. Body parts in these subsets are estimated sequentially, while within each subset, body parts or groups of body parts are processed in parallel. Here, we partition $\mathcal{P}$ into 2 subsets: $\mathcal{P}_1 = \{1, 2\}$, $\mathcal{P}_2 = \{3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}$. $\mathcal{P}_2$ is then partitioned into $F^1_{\mathcal{P}_2}$ and $F^2_{\mathcal{P}_2}$ where: $F^1_{\mathcal{P}_2} = \{3, 4, 5, 6, 7, 8\}$, $F^2_{\mathcal{P}_2} = \{9, 10, 11, 12, 13, 14, 15\}$. It is easy to see that the sets $\mathcal{P}_1$, $\mathcal{P}_2$, $F^1_{\mathcal{P}_2}$, $F^2_{\mathcal{P}_2}$ follow the definitions in Section 3.1. This partition is motivated by the fact that during normal human activities like walking or jogging, the left and right legs often occlude each other and should be processed in parallel. Also, the left and right arms often occlude each other and should be processed in parallel.
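That this partition follows the definitions of Section 3.1 can be verified with a few lines of code. The sketch assumes the same within-slice tree as inferred from the text's description of Figure 1 (pelvis 1 as root, torso 2, legs 3 to 8, arms 9 to 14, head 15); the `PARENT` table and function names are illustrative assumptions.

```python
# Assumed parent of each part within a time slice (inferred from the text).
PARENT = {1: None, 2: 1, 3: 1, 4: 3, 5: 4, 6: 1, 7: 6, 8: 7,
          9: 2, 10: 9, 11: 10, 12: 2, 13: 12, 14: 13, 15: 2}


def subtree(u):
    """T(u): part u together with all of its descendants."""
    return {u}.union(*(subtree(c) for c, p in PARENT.items() if p == u))


def tracker_partition():
    """Build P_1, S_2 and the interesting partition (F1, F2) of P_2."""
    p1 = {1, 2}                                   # pelvis and torso
    # S_2: unprocessed parts whose parent is already in P_1.
    s2 = {u for u, p in PARENT.items() if u not in p1 and p in p1}
    # B(u) = T(u): each root in S_2 brings its whole limb (or the head).
    f1 = subtree(3) | subtree(6)                  # both legs
    f2 = subtree(9) | subtree(12) | subtree(15)   # both arms and the head
    # Parents whose states must agree before sub-states may be swapped.
    f1_parents = {PARENT[u] for u in (3, 6)}
    f2_parents = {PARENT[u] for u in (9, 12, 15)}
    return p1, s2, f1, f2, f1_parents, f2_parents
```

Under this assumed tree, the roots of the first group all hang from the pelvis and those of the second group all hang from the torso, so a single shared parent value is enough to decide whether two particles may exchange a group.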
3.2.3 Dealing with Hard Prior
In (Sigal et al., 2010), to reduce the search space, a hard prior is applied that eliminates any particle that corresponds to implausible body poses such as having angles exceeding anatomical joint limits or interpenetrating limbs. This has been shown to improve significantly the performance of the tracking algorithms. By partitioning $\mathcal{P}$ into $\mathcal{P}_1$, $\mathcal{P}_2$, such a hard prior can naturally be applied in our approach. More precisely, when processing $\mathcal{P}_2$, we can test for intersections between the left and right calves, since $\mathcal{P}_2$ contains all parameters describing the configuration of the legs. Similarly, tests for intersections between the lower arms and the torso can be performed when processing $\mathcal{P}_2$, since $\mathcal{P}_2$ contains all parameters describing the configuration of the arms, and at the time we process $\mathcal{P}_2$, the configuration of the torso has been previously determined in $\mathcal{P}_1$. Finally, constraints on anatomical joint limits can be imposed at any time we process $\mathcal{P}_1$ and $\mathcal{P}_2$.
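Applying a hard prior during propagation amounts to rejection sampling: candidates are drawn from the dynamics and kept only if they satisfy the constraint. The sketch below illustrates this on a single joint angle. The proposal, the joint limits and the cap on the number of attempts are assumptions of this example (the cap is added here to bound the run time and is not described in the text).

```python
import random


def sample_with_hard_prior(previous_state, propose, satisfies_prior, rng,
                           n_required, max_attempts=10_000):
    """Draw proposals until n_required of them pass the hard prior.

    propose(previous_state, rng) and satisfies_prior(candidate) are
    caller-supplied callables (hypothetical interface for this sketch).
    Returns the accepted candidates and the number of attempts used.
    """
    accepted = []
    attempts = 0
    while len(accepted) < n_required and attempts < max_attempts:
        attempts += 1
        candidate = propose(previous_state, rng)
        if satisfies_prior(candidate):
            accepted.append(candidate)
    return accepted, attempts


def propose_knee_angle(previous_angle, rng):
    """Toy first-order dynamics: previous angle plus Gaussian noise (radians)."""
    return previous_angle + rng.gauss(0.0, 0.4)


def within_joint_limits(angle):
    """Toy anatomical limit: the knee flexes between 0 and 2.4 radians."""
    return 0.0 <= angle <= 2.4
```

The ratio between attempts and accepted candidates gives a direct measure of how costly the prior is: it grows when the previous state is close to, or beyond, the limits.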
3.2.4 Algorithm
Our approach is based on the approach in (Dubuisson and Gonzales, 2012) (we call it SB-PSAPF), which combines the idea of SBPS and Partitioned Sampling Annealed Particle Filter (PSAPF) (Bandouch et al., 2008). In this approach, an annealing run consists of parallel propagation/correction steps for a set of parts, followed by a swapping operation over this set, and finally by a resampling. This approach has been shown to be effective on a human tracking problem. However, the tracking conditions in that problem are quite simple: no cluttered background is present and no self-occlusion occurs during tracking. Actually, these conditions are necessary for the assumptions in Equation 2 to be satisfied. In real-world problems, when these assumptions do not hold, the global likelihood might be poorly approximated, resulting in poor tracking results. Furthermore, the use of the swapping operation in a multi-layer framework, as proposed in SB-PSAPF, creates another issue. Although this operation generates more particles near the modes of the target distribution, it creates the well-known sample impoverishment problem. The reason is that, when applied to a particle set, it increases the differences between particle weights. After resampling, only the particles with the highest weights are multiplied, while the remaining particles (those with medium and low weights) have little chance to survive. In SB-PSAPF, the sample impoverishment problem is worse since the swapping operation is performed once at each layer, so the diversity of the particle set decreases as the number of layers increases.
Another drawback of SB-PSAPF is that it esti-
mates the first set of body parts, without taking into
account their relation with the remaining body parts.
In our case, it estimates the pelvis and the torso with-
out looking at the head and the limbs. In practice,
however, some body parts, such as the legs and
the head often provide important constraints for find-
ing the pelvis and the torso, and therefore it is not
always possible to localize the pelvis and the torso
separately from other body parts. When the pelvis
and the torso are poorly estimated, this affects the es-
timates of the head and the limbs and the performance
of the tracking algorithm degrades.
To address the problems discussed above, we propose
a two-stage tracking strategy, where in the first
stage, human body parts are tracked using APF and
in the second stage, the estimates of the head and
the limbs are refined using SB-PSAPF, except that we
omit the optimization step for the pelvis and the torso.
At each time instant, an annealing run in the second
stage of our algorithm consists of the following steps:
Step 1: The particle set is resampled.
Step 2: The parts in $\mathcal{P}_2$ are propagated using their dynamic functions. The hard prior, which is discussed in this section, is applied. This step is repeated until the required number of particles is obtained.
Step 3: For each particle, the likelihoods of the substates corresponding to the body parts in $F^1_{\mathcal{P}_2}$ and $F^2_{\mathcal{P}_2}$ are evaluated (for the sake of convenience, we call them the likelihoods of $F^1_{\mathcal{P}_2}$ and $F^2_{\mathcal{P}_2}$, respectively). For $F^1_{\mathcal{P}_2}$, the legs of the human body model are first projected into edge images and silhouette images and then the likelihood of $F^1_{\mathcal{P}_2}$ is computed as in (Sigal et al., 2010). In this way, one evaluation of the likelihood function in this step requires less computation time than that in APF, since only body parts related to the evaluation are projected.
Step 4: The substates corresponding to the body parts in $F^1_{\mathcal{P}_2}$ can be permuted among particles having the same value for the pelvis. Also, the substates corresponding to the body parts in $F^2_{\mathcal{P}_2}$ can be permuted among particles having the same value for the torso. The new weight of each particle is computed by taking the product of the likelihoods of $F^1_{\mathcal{P}_2}$ and $F^2_{\mathcal{P}_2}$. In order to generate a better particle set, at each step, the particle with highest weight resulting from all possible permutations is constructed and added into the new particle set. This particle is then eliminated from the current particle set and the process is repeated.
Step 5: The particle weights are annealed as in (Deutscher and Reid, 2005) so that about a half of the particle set will survive the next resampling step.

Table 1: Tracking errors and standard deviations (mm) of APF, SB-PSAPF, SB-PSAPF2 and our approach from tracking the walking sequences of S1, S2, S3 with 590, 438 and 448 frames, respectively. The processing time shown for each method is computed for one frame and one camera.

Method         Time    S1        S2        S3
SB-PSAPF       12 s    242±40    230±34    285±45
SB-PSAPF2      11 s    220±32    211±25    265±35
APF            6 s     125±37    105±20    199±41
Our approach   4 s     120±26    95±12     182±30
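The annealing of Step 5 can be sketched as a one-dimensional search for an exponent $\beta$ applied to the likelihoods. The example assumes that the surviving fraction is measured by the quantity $1 / (N \sum_i w_i^2)$ computed on normalised weights $w_i \propto \ell_i^{\beta}$, and that $\beta$ is found by bisection; both choices, as well as the function names and the bound `beta_max`, are assumptions made for illustration rather than a description of the implementation evaluated in this paper.

```python
def survival_rate(likelihoods, beta):
    """Estimated surviving fraction 1 / (N * sum_i w_i^2), w_i ~ l_i ** beta."""
    powered = [value ** beta for value in likelihoods]
    total = sum(powered)
    sum_of_squares = sum((p / total) ** 2 for p in powered)
    return 1.0 / (len(likelihoods) * sum_of_squares)


def annealing_exponent(likelihoods, target=0.5, beta_max=100.0, iterations=60):
    """Find beta such that the survival rate is close to `target`.

    The survival rate decreases monotonically as beta grows (beta = 0 gives
    uniform weights and a rate of 1), so bisection applies. Likelihood
    values are assumed to be strictly positive.
    """
    low, high = 0.0, beta_max
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if survival_rate(likelihoods, mid) > target:
            low = mid    # weights still too flat: sharpen them
        else:
            high = mid   # weights too peaked: flatten them
    return 0.5 * (low + high)
```

If all likelihoods are equal, no exponent can reach the target and the search simply returns a value close to `beta_max`; a caller may want to treat that case separately.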
4 EXPERIMENTAL RESULTS
In this section, we quantitatively compare our approach with APF, SB-PSAPF and SB-PSAPF2. For SB-PSAPF, we estimate the pelvis and the torso at the first stage, the head, the left and right upper arms and the left and right thighs at the second stage, the left and right lower arms and the left and right calves at the third stage, and finally the left and right feet and the left and right hands at the fourth stage. SB-PSAPF2 is a variant of SB-PSAPF where we use the same partition as our approach. For both SB-PSAPF and SB-PSAPF2, the same hard prior used in our approach is applied to improve their performance. The comparison between SB-PSAPF and SB-PSAPF2 shows the benefit of the diagram in Figure 2 when dealing with self-occlusion. In order to ensure a fair comparison, we use the HumanEva dataset (Sigal et al., 2010), which provides an implementation of APF. We implement our algorithm, SB-PSAPF and SB-PSAPF2 within its framework, while keeping the remaining code unchanged. We use the error measure in (Sigal et al., 2010), which averages the Euclidean distance between 15 markers on the true pose and the corresponding points computed from the estimated pose. The likelihood model is constructed from edges and silhouettes (Sigal et al., 2010). The image sequences from 5 camera views (C1, C2, C3, BW1, BW2) are used. First-order temporal dynamics are used, where the sampling covariance is learnt from all training data in the dataset.
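The error measure described above can be written in a few lines. This is a minimal sketch assuming that each pose is given as a list of 3D marker positions in millimetres, in the same order for the true and the estimated pose; the function name is hypothetical and the code is not taken from the HumanEva framework.

```python
import math


def mean_marker_error(true_markers, estimated_markers):
    """Average Euclidean distance between corresponding 3D markers.

    The result is expressed in the unit of the inputs (mm here).
    """
    if len(true_markers) != len(estimated_markers):
        raise ValueError("both poses must have the same number of markers")
    distances = [math.dist(a, b)
                 for a, b in zip(true_markers, estimated_markers)]
    return sum(distances) / len(distances)
```

For a sequence, this per-frame value would then be averaged over frames, and its standard deviation reported alongside, as in Table 1.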
For our approach, we use 400 evaluations of the
likelihood function per frame, which is equal to 5 lay-
ers with 50 particles per layer for the first stage, and
3 layers with 50 particles per layer for the second
stage. For APF, we use 500 evaluations of the like-
lihood function per frame, which is equal to 5 layers
with 100 particles per layer. For SB-PSAPF and SB-
PSAPF2, we use 1000 evaluations of the likelihood
function per frame, which is equal to 5 layers with 50
particles per layer in 4 stages for SB-PSAPF, and 5
layers with 100 particles per layer in both stages for
SB-PSAPF2.
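The evaluation budgets quoted above follow from multiplying the number of stages, layers and particles. The breakdown below is a worked check, reading "per layer in 4 stages" and "in both stages" as every stage running the stated number of layers:

```latex
\begin{align*}
\text{Our approach:} &\quad 5 \times 50 + 3 \times 50 = 250 + 150 = 400 \\
\text{APF:}          &\quad 5 \times 100 = 500 \\
\text{SB-PSAPF:}     &\quad 4 \times 5 \times 50 = 1000 \\
\text{SB-PSAPF2:}    &\quad 2 \times 5 \times 100 = 1000
\end{align*}
```

Our approach therefore uses the smallest number of likelihood evaluations per frame among the four methods.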
We present in this section the tracking results for the walking sequences of S1, S2 and S3. Table 1 shows the tracking errors and standard deviations of the four approaches on S1, S2 and S3. As can be observed, SB-PSAPF has the worst performance. Due to self-occlusions between the legs and between the torso and the arms, the evaluation of the likelihood for each body part, as required by SB-PSAPF, gives poor approximations. This leads to the poor performance of SB-PSAPF. By taking self-occlusion into account, SB-PSAPF2 improves on the performance of SB-PSAPF. For all 3 sequences, our approach performs better than the other approaches in terms of tracking error, standard deviation and computation time. We can observe that the tracking errors obtained on S3 are higher than those obtained on S1 and S2. This is due to larger movements of the legs of S3 compared to S1 and S2. We can also observe that the difference between the tracking errors of our approach and APF is larger on S3 than on S1 and S2. This result suggests that our approach is robust when tracking people with strong motions.
Figure 4 shows some frames from tracking S3, using APF and our approach. At frame 100, our approach fails to track the right leg, as APF does (see the 2nd column). However, in the following frames, our approach recovers from the tracking failure and tracks the legs quite well, while APF fails to track one of the two legs correctly. This highlights the effectiveness of the second stage of our approach in refining the estimate for the limbs. Figure 5 shows the tracking error curves of the four approaches on S3. There are two intervals where we cannot obtain the ground truth, due to bad data. Hence, we set the tracking errors of the four approaches to zero for these intervals. As can be observed, our approach is clearly better than the other approaches at most of the frames.

Figure 4: Tracking results of the sequence S3 walking for frames 51, 100, 240, 303, 340, 382. APF (1st row), our approach (2nd row). The images in the first 5 frames are obtained from one view and those in the last 2 frames are obtained from another view to better visualize the tracking results for the legs of the 2 approaches.

Figure 5: Tracking errors (mean over 10 runs) of SB-PSAPF, SB-PSAPF2, APF and our approach for the walking sequence of S3.
5 CONCLUSIONS
We have presented a new particle-filter-based approach for 3D human body tracking. Our approach consists of 2 stages. At the first stage, the human body is tracked using Annealed Particle Filter. Then, the limbs are refined using another stage, where more particles near the modes of the likelihood function are generated by swapping the substates of the particles. This second stage is also based on the framework of Annealed Particle Filter. Our quantitative evaluation on real sequences has shown the robustness of the proposed approach. It should be noted that the theoretical framework proposed in Section 3.1 can be applied not only to tracking articulated objects but also to tracking multiple objects interacting with each other. Here, the advantage compared to SBPS is that occlusions and constraints between objects can be effectively taken into account. Although our approach improves the estimate of the limbs thanks to its second stage, the obtained tracking results are still not satisfactory in the presence of self-occlusion. Our current work focuses on developing more efficient methods for dealing with these problems within our framework.
REFERENCES
Bandouch, J., Engstler, F., and Beetz, M. (2008). Evalua-
tion of Hierarchical Sampling Strategies in 3D Human
Pose Estimation. In BMVC, pages 925–934.
Deutscher, J. and Reid, I. (2005). Articulated body motion
capture by stochastic search. International Journal of
Computer Vision, 61(2):185–205.
Dubuisson, S. and Gonzales, C. (2012). An optimized dbn-
based mode-focussing particle filter. In Proceedings
of the 2012 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR). IEEE Computer Society.
Dubuisson, S., Gonzales, C., and Nguyen, X. S. (2011).
Swapping-based partitioned sampling for better com-
plex density estimation: Application to articulated ob-
ject tracking. In International Conference on Scalable
Uncertainty Management, pages 525–538.
Dubuisson, S., Gonzales, C., and Nguyen, X. S. (2013).
Sub-sample swapping for sequential monte carlo ap-
proximation of high-dimensional densities in the con-
text of complex object tracking. International Journal
of Approximate Reasoning, 54(7):934 – 953.
Gordon, N., Salmond, D. J., and Smith, A. (1993). Novel
approach to nonlinear/non-Gaussian Bayesian state
estimation. IEE Proceedings of Radar and Signal Pro-
cessing.
MacCormick, J. and Isard, M. (2000). Partitioned sampling,
articulated objects, and interface-quality hand track-
ing. In ECCV, pages 3–19.
Murphy, K. P. (2002). Dynamic Bayesian Networks : Rep-
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
582
resentation, Inference and Learning. PhD thesis, Uni-
versity of California, Berkeley.
Rose, C., Saboune, J., and Charpillet, F. (2008). Reducing
particle filtering complexity for 3D motion capture us-
ing dynamic Bayesian networks. ICAI, pages 1396–
1401.
Sigal, L., Balan, A., and Black, M. (2010). HumanEva:
Synchronized video and motion capture dataset and
baseline algorithm for evaluation of articulated human
motion. International Journal of Computer Vision,
87(1-2):4–27.
Zhang, X., Hu, W., Wang, X., Kong, Y., Xie, N., Wang,
H., Ling, H., and Maybank, S. (2010). A swarm in-
telligence based searching strategy for articulated 3D
human body tracking. In CVPR, pages 45–50.
APPENDIX
Proposition 1. We denote by $Q_j = \bigcup_{h=1}^{j} P_h$ and
$R_j = \bigcup_{h=j+1}^{K} P_h$ the set of parts already processed after the
$j$th step of the diagram in Figure 2 (the $j$th step consists of processing
parts in $P_j$) and those still to be processed, respectively. By convention,
we set $Q_0 = R_K = \emptyset$. We shall show that
1) after the propagations of parts in $P_j$ (prediction step), the particle
set represents the density
$p(x_t^{Q_j}, x_{t-1}^{R_j} \mid y_{1:t-1}, y_t^{Q_{j-1}})$;
2) after the corrections of parts in $P_j$, the particle set represents the
density $p(x_t^{Q_j}, x_{t-1}^{R_j} \mid y_{1:t-1}, y_t^{Q_j})$;
3) after the swapping of parts in $P_j$, the particle set represents the
density $p(x_t^{Q_j}, x_{t-1}^{R_j} \mid y_{1:t-1}, y_t^{Q_j})$.
Since the resampling step does not alter the distribution represented by the
particle set, at the last step ($K$th step) of the diagram in Figure 2, the
particle set represents the density
$p(x_t^{Q_K}, x_{t-1}^{R_K} \mid y_{1:t-1}, y_t^{Q_K})
= p(x_t^{P} \mid y_{1:t-1}, y_t^{P}) = p(x_t \mid y_{1:t})$.
We prove 1), 2), 3) by induction on $j$. Assume that after the $(j-1)$th
step, the particle set represents the density
$p(x_t^{Q_{j-1}}, x_{t-1}^{R_{j-1}} \mid y_{1:t-1}, y_t^{Q_{j-1}})$.
For the proof of 1) and 2), we refer the reader to (Dubuisson et al., 2013).
We now prove 3). Let $F$ be a set in an interesting partition of $P_j$ and
let $u_1, \ldots, u_l \in S_j$ be such that $F = \bigcup_{i=1}^{l} B(u_i)$.
Denote
$x_t^{C} = x_t^{\bigcup_{i=1}^{l} B(u_i)}$,
$x_{t-1}^{D} = x_{t-1}^{\bigcup_{i=1}^{l} (T(u_i) \setminus B(u_i))}$,
$x_t^{A} = \bigcup_{i=1}^{l} \mathrm{pa}_t(x_t^{u_i})$,
$x_t^{V} = x_t^{Q_j \setminus (C \cup A)}$,
$x_{t-1}^{W} = x_{t-1}^{R_j} \setminus x_{t-1}^{D}$.
From 2), after the corrections of parts in $P_j$, the particle set estimates
the distribution $p(x_t^{Q_j}, x_{t-1}^{R_j} \mid y_{1:t-1}, y_t^{Q_j})$.
We have:
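$$
\begin{aligned}
& p(x_t^{Q_j}, x_{t-1}^{R_j} \mid y_{1:t-1}, y_t^{Q_j})
  \propto p(x_t^{Q_j}, x_{t-1}^{R_j}, y_{1:t-1}, y_t^{Q_j}) \\
&= p(x_t^{C \cup A \cup V}, x_{t-1}^{D \cup W},
     y_{1:t}^{C \cup A \cup V}, y_{1:t-1}^{D \cup W}) \\
&= \int p(x_t^{C \cup V}, x_{1:t}^{A}, x_{t-1}^{D \cup W},
     y_{1:t}^{C \cup A \cup V}, y_{1:t-1}^{D \cup W}) \, dx_{1:t-1}^{A} \\
&= \int p(x_{1:t}^{A}, y_{1:t}^{A}, x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C},
     y_{1:t-1}^{D}, x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W})
     \, dx_{1:t-1}^{A} \\
&= \int p(x_{1:t}^{A}, y_{1:t}^{A}) \,
     p(x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D}
       \mid x_{1:t}^{A}, y_{1:t}^{A}) \\
&\qquad \cdot \, p(x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W}
       \mid x_{1:t}^{A}, y_{1:t}^{A}, x_t^{C}, x_{t-1}^{D},
            y_{1:t}^{C}, y_{1:t-1}^{D}) \, dx_{1:t-1}^{A} \\
&= \int p(x_{1:t}^{A}, y_{1:t}^{A}) \,
     p(x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D} \mid x_{1:t}^{A}) \\
&\qquad \cdot \, p(x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W}
       \mid x_{1:t}^{A}, x_t^{C}, x_{t-1}^{D},
            y_{1:t}^{C}, y_{1:t-1}^{D}) \, dx_{1:t-1}^{A}
\end{aligned}
\tag{4}
$$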
We will show that $\{x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D}\}$ and
$\{x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W}\}$ are independent
conditionally on $\{x_{1:t}^{A}\}$. In other words, $\{x_{1:t}^{A}\}$
d-separates $\{x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D}\}$ and
$\{x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W}\}$.
Since we assume that the observation of a part depends only on this part,
all paths that connect $y_{1:t}^{C}$ to the rest of the DBN must go through
$x_{1:t}^{C}$. This is also true for $y_{1:t-1}^{D}$, $y_{1:t}^{V}$ and
$y_{1:t-1}^{W}$. Hence, it is enough to show that $\{x_{1:t}^{A}\}$
d-separates $\{x_{1:t}^{C}, x_{1:t-1}^{D}\}$ and
$\{x_{1:t}^{V}, x_{1:t-1}^{W}\}$. We will show the stronger statement that
$\{x_{1:t}^{A}\}$ d-separates $\{x_{1:t}^{C}, x_{1:t}^{D}\}$ and
$\{x_{1:t}^{V}, x_{1:t}^{W}\}$. We have:
$\{x_{1:t}^{C}, x_{1:t}^{D}\} = x_{1:t}^{C \cup D}
= x_{1:t}^{\bigcup_{i=1}^{l} T(u_i)}$.
Consider a path that connects a node in
$x_{1:t}^{\bigcup_{i=1}^{l} T(u_i)}$ and a node in the rest of the DBN.
If this path goes through a node at a time instant $t' > t$, then there must
exist a node in the path at a time instant $t'' > t$ such that its two
neighbor arcs converge to this node (these arcs point to this node). In this
case, neither this node nor any of its descendants is in $\{x_{1:t}^{A}\}$,
hence the path is blocked given $\{x_{1:t}^{A}\}$. If the path does not go
through any node at a time instant $t' > t$, then it must go through a node
in $\{x_{1:t}^{A}\}$, and in this case the path is blocked at this node. In
all cases, we have shown that $\{x_{1:t}^{A}\}$ d-separates
$\{x_{1:t}^{C}, x_{1:t}^{D}\}$ and the rest of the DBN. Hence
$\{x_{1:t}^{A}\}$ d-separates $\{x_{1:t}^{C}, x_{1:t}^{D}\}$ and
$\{x_{1:t}^{V}, x_{1:t}^{W}\}$.
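The d-separation argument above can be checked mechanically on a small
example. The following is a minimal sketch, not the model used in the paper:
it assumes a hypothetical three-part DBN in which part A is the parent of
parts C and V within each time slice, every part has a temporal arc from its
own previous state, and each observation depends only on its own part. The
names `toy_dbn`, `ancestors` and `d_separated` are illustrative. The test
uses the standard criterion that Z d-separates X and Y if and only if X and Y
are disconnected in the moralized ancestral graph of X, Y and Z once Z is
removed.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """True if `zs` d-separates `xs` from `ys` in the DAG given by `parents`."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: connect each node to its parents,
    # connect parents sharing a child, and forget arc directions.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Breadth-first search from xs that never enters a conditioning node.
    seen = xs - zs
    queue = deque(seen)
    while queue:
        n = queue.popleft()
        if n in ys:
            return False
        for m in nbrs[n] - zs - seen:
            seen.add(m)
            queue.append(m)
    return True


def toy_dbn(horizon):
    """Hypothetical DBN: A -> C and A -> V in each slice, plus temporal arcs."""
    parents = {}
    for s in range(1, horizon + 1):
        for part in "ACV":
            ps = []
            if s > 1:
                ps.append(("x", part, s - 1))  # temporal arc
            if part != "A":
                ps.append(("x", "A", s))       # arc from the parent part A
            parents[("x", part, s)] = ps
            parents[("y", part, s)] = [("x", part, s)]  # own observation only
    return parents


dbn = toy_dbn(3)  # one slice beyond t = 2, so that "future" nodes exist
t = 2
xs = {("x", "C", t)} | {("y", "C", s) for s in range(1, t + 1)}
ys = {("x", "V", t)} | {("y", "V", s) for s in range(1, t + 1)}
a_traj = {("x", "A", s) for s in range(1, t + 1)}

print(d_separated(dbn, xs, ys, a_traj))           # whole trajectory of A
print(d_separated(dbn, xs, ys, {("x", "A", t)}))  # current state of A only
print(d_separated(dbn, xs, ys, set()))            # no conditioning
```

In this toy network the separation holds only when conditioning on the whole
trajectory of A up to time t, which matches the role of $x_{1:t}^{A}$ in the
argument. Nodes at the later time slice are not ancestors of any node of
interest, so they are pruned before the search, mirroring the converging-arc
case of the proof.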
Now, (4) is equal to:
$$
\int p(x_{1:t}^{A}, y_{1:t}^{A}) \,
  p(x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D} \mid x_{1:t}^{A}) \,
  p(x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W} \mid x_{1:t}^{A})
  \, dx_{1:t-1}^{A}
$$
Permuting particles over parts $\{x_t^{C}, x_{t-1}^{D}\}$ for a fixed value
of $x_{1:t}^{A}$ cannot change the estimation of the density
$p(x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D} \mid x_{1:t}^{A})$,
because estimations by samples are insensitive to the order of the elements
in the samples. Nor can it affect the estimation of the density
$p(x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W} \mid x_{1:t}^{A})$,
since $\{x_t^{C}, x_{t-1}^{D}, y_{1:t}^{C}, y_{1:t-1}^{D}\}$ and
$\{x_t^{V}, x_{t-1}^{W}, y_{1:t}^{V}, y_{1:t-1}^{W}\}$ are independent
conditionally on $\{x_{1:t}^{A}\}$.
Hence, permuting particles over the swapping set and within admissible
permutations, whose definitions are given in Definition 1, guarantees that
the target distribution is correctly estimated.
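The permutation argument can be illustrated numerically. The sketch below is
a toy setting and not the tracker itself: it assumes unweighted particles
with three scalar substates (a, c, v), where a takes discrete values so that
several particles share the same value of a, and where c and v are drawn
independently given a. The helper names `swap_within_groups` and
`pair_counts` are hypothetical. Shuffling the c-substates among particles
that share the same a leaves the empirical distributions of (a, c) and of
(a, v) unchanged, which are the two conditional factors appearing in the
rewritten form of (4).

```python
import random
from collections import Counter, defaultdict


def swap_within_groups(particles, rng):
    """Shuffle the c-substates among particles sharing the same a-substate."""
    groups = defaultdict(list)
    for idx, (a, _c, _v) in enumerate(particles):
        groups[a].append(idx)
    swapped = list(particles)
    for idxs in groups.values():
        cs = [particles[i][1] for i in idxs]
        rng.shuffle(cs)  # a permutation restricted to one value of a
        for i, c in zip(idxs, cs):
            a, _, v = particles[i]
            swapped[i] = (a, c, v)
    return swapped


def pair_counts(particles, k):
    """Empirical counts of the pair (a, k-th substate) over the particle set."""
    return Counter((p[0], p[k]) for p in particles)


rng = random.Random(0)
particles = []
for _ in range(200):
    a = rng.choice([0, 1])
    c = a + rng.choice([0, 10, 20])  # c depends on a only
    v = a + rng.choice([0, 100])     # v depends on a only
    particles.append((a, c, v))

swapped = swap_within_groups(particles, rng)

# Both conditional factors are estimated by exactly the same samples.
print(pair_counts(particles, 1) == pair_counts(swapped, 1))
print(pair_counts(particles, 2) == pair_counts(swapped, 2))
```

Only the pairing between c and v inside each group is altered, and that
pairing carries no information once a is fixed, because the two substates are
conditionally independent in this toy model.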
Swapping-basedAnnealedParticleFilterwithOcclusionHandlingfor3DHumanBodyTracking
583