ENHANCING MEMORY-BASED PARTICLE FILTER WITH DETECTION-BASED MEMORY ACQUISITION FOR ROBUSTNESS UNDER SEVERE OCCLUSION

Dan Mikami, Kazuhiro Otsuka, Shiro Kumano and Junji Yamato

NTT Communication Science Laboratories, NTT, 3-1 Morinosato-Wakamiya, Atsugi, Kanagawa, 243-0198, Japan

Keywords:

Pose Tracking, Face Pose, Memory-based Prediction, Memory Acquisition.

Abstract:

A novel enhancement of the memory-based particle filter is proposed for visual pose tracking under severe occlusions. The enhancement is the addition of a detection-based memory acquisition mechanism. The memory-based particle filter, M-PF, is a particle filter that predicts prior distributions from the past history of the target state, and it has achieved high robustness against complex dynamics of the tracking target. Such high performance requires sufficient history stored in memory. Conventionally, M-PF conducts online memory acquisition, which assumes simple target dynamics without occlusions in order to guarantee high-quality histories. This requirement narrows the coverage of M-PF in practice. In this paper, we propose a new memory acquisition mechanism for M-PF. The key idea is to use a target detector that can produce an additional prior distribution of the target state. We call it M-PFDMA, for M-PF with detection-based memory acquisition. The detection-based prior distribution predicts possible target positions/poses well even in limited-visibility conditions caused by occlusions. Such better prior distributions contribute to stable estimation of the target state, which is then added to the memorized data. As a result, M-PFDMA can start with no memory entries but soon achieve stable tracking even under severe occlusions. Experiments confirm M-PFDMA’s good performance in such conditions.

1 INTRODUCTION

Visual object tracking has been acknowledged as one

of the most important techniques in computer vision

(Comaniciu et al., 2003), and is required for a wide

range of applications such as automatic surveillance,

man-machine interfaces (Bradski, 1998; Tu et al.,

2007), and communication scene analysis (Otsuka

et al., 2008).

For visual object tracking, Bayesian ﬁlter-based

trackers have been acknowledged as a promising ap-

proach; they represent a uniﬁed probabilistic frame-

work for sequentially estimating the target state from

an observed data stream (Gordon et al., 1993). At

each time step, the Bayesian ﬁlter computes the pos-

terior distribution of the target state by using observa-

tion likelihood and the prior distribution. One imple-

mentation, the particle ﬁlter (Isard and Blake, 1998),

has been widely used for target tracking. It represents

probability distributions of the target state by a set of

samples, called particles. Particle ﬁlter, in short PF,

can potentially handle non-Gaussian, nonlinear dy-

namics/observation processes; this contributes to ro-

bust tracking. However, most particle ﬁlter-based vi-

sual trackers are rather constrained since they employ

linear, Gaussian, and time invariant dynamics for sim-

plicity.

Mikami et al. focused on the issue of sim-

plicity degrading PF robustness in real-world situ-

ations including abrupt movements and occlusions.

To deal with the target’s non-Markov, non-Gaussian,

and time-varying dynamics, they proposed a memory-

based particle ﬁlter, called M-PF, as an extension of

the particle ﬁlter (Mikami et al., 2009). M-PF eases

the Markov assumption of PF and uses past history

of the target’s states to predict the prior distribution

on the basis of the target’s long-term dynamics. M-

PF offers robustness against abrupt object movements

and quick recovery from tracking failure. However,

such high performance can be achieved only if tar-

get history in memory is voluminous and high quality.

The ﬁrst implementation of M-PF in (Mikami et al.,

2009) includes an online memory acquisition period

which requires the capture of simple dynamics without occlusions for assuring stable tracking. Demanding memory acquisition in this manner narrows the coverage of M-PF.

Mikami D., Otsuka K., Kumano S. and Yamato J. ENHANCING MEMORY-BASED PARTICLE FILTER WITH DETECTION-BASED MEMORY ACQUISITION FOR ROBUSTNESS UNDER SEVERE OCCLUSION. DOI: 10.5220/0003808302080215. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 208-215. ISBN: 978-989-8565-04-4. © 2012 SCITEPRESS (Science and Technology Publications, Lda.)

To acquire memory more stably in a wider range of real-world situations, this paper proposes a new memory-acquisition mechanism for the M-PF-based tracker. The key idea is combining M-PF with a target detector. The target detector can find the target position/pose even in cluttered conditions, and the detection result is used for creating an additional prior distribution of target position/pose. This detection-based prior distribution and the original memory-based prior distribution are combined and provided to the posterior distribution estimation step. Such combined prior distribution prediction contributes to more stable estimation of the target state even in limited visibility conditions. This estimated target state is then added to memory, and is used for creating the memory-based prior distribution in future steps. This cycle of detection, combined prior distribution prediction, posterior distribution estimation, and memory accumulation is highly synergistic in terms of boosting M-PF performance in real-world environments. We name it M-PFDMA, for M-PF with detection-based memory acquisition. M-PFDMA has the following advantages: stable initial tracking without memory, quick recovery after occlusion, and wider pose range recoverability.

To verify the effectiveness of M-PFDMA, we im-

plement a facial pose tracker. Facial pose tracking

should yield attributes of the face such as position and

rotation. As the object detector, we use the “joint

probabilistic increment sign correlation face detec-

tor,” in short JPrISC face detector, the multi-view face

detector proposed by Tian et al. (Tian et al., 2010).

The JPrISC face detector can detect faces from frontal

view to near proﬁle view, and can output ten face pose

classes. Facial pose tracking experiments verify that

M-PFDMA can acquire target’s state history even un-

der severe occlusion and achieves accurate tracking

and high recoverability.

In the context of human pose tracking, the idea

of tracking-by-detection has been gaining attention

in recent years as a possible alternative to the tra-

ditional target tracking approach (Murphy-Chutorian

and Trivedi, 2008; Ozuysal et al., 2006; Andriluka

et al., 2010); it reﬂects the rapid progress in ob-

ject detectors. However, the current human facial

pose detectors are inadequate for realizing mature

tracking-by-detection since they are not fast enough

for real-time tracking, not accurate enough for de-

termining precise target position/pose, and not able

to well handle target dynamics. Rather than re-

lying on just the detector, combining the detector

with a tracker has been seen as a reasonable solu-

tion. One example was proposed by (Kobayashi et al.,

2006), (Ba and Odobez, 2008). They combined a

PF-based tracker with a face detector for observa-

tion (Kobayashi et al., 2006) and prior distribution

prediction (Ba and Odobez, 2008). In (Kobayashi

et al., 2006), a multi-pose class face detector provides

multiple choices with regard to the observation func-

tion but the tracking pose resolution is limited to de-

tectable pose classes. In (Ba and Odobez, 2008), a

head position detector yields a uniform prior distribu-

tion over all pose ranges, which limits pose accuracy

in tracking. Their usage of detectors improved PF-

based tracking, but they targeted only a simple en-

vironment with no occlusions. On the contrary, our

target is more practical and so occlusions are consid-

ered. M-PFDMA aims at high recoverability from

occlusions. The main feature of M-PFDMA is inte-

grating the object detector into the M-PF framework;

the object detector contributes to fast and stable acquisition of target state memory. Eventually, the improved memory contents lead to fast and reliable recovery from occlusions. The synergetic effects of M-PFDMA separate it from previous combination methods.

The remainder of this paper is organized as follows: Sect. 2 briefly reviews the memory-based particle filter and its recoverability from tracking failure. Section 3 proposes our new facial pose tracker. Section 4 details the experimental environment and results. Finally, Sect. 5 concludes and discusses our proposal.

2 MEMORY-BASED PARTICLE FILTER AND ITS TRACKING RECOVERABILITY

This section overviews the memory-based particle ﬁl-

ter and addresses the recovery problem from tracking

failure, which has, up to now, been an unsolved prob-

lem in the ﬁeld of tracking.

2.1 Memory-based Particle Filter

M-PF (Mikami et al., 2009) realizes robust target tracking without explicit modeling of the target’s dynamics even when the target moves quickly. Figure 1 outlines M-PF. M-PF keeps the temporal sequence of past state estimates $\hat{x}_{1:T} = \{\hat{x}_1, \cdots, \hat{x}_T\}$ in memory. Here, $\hat{x}_{1:T}$ denotes a sequence of state estimates from time 1 to time T, and $\hat{x}_t$ denotes a pose estimate at time t. M-PF assumes that the parts of the history that follow similar past states provide good estimates of the near future. M-PF introduced the Temporal Recurrent Probability (TRP), which is a probability distribution defined in the temporal domain that indicates the possibility that a past state will reappear in the future. To predict the prior distribution, M-PF starts with TRP modeling. It then conducts temporal sampling on the basis of TRP. The sampled histories are denoted by blue dots in Fig. 1. It retrieves the corresponding past state estimates for each sampled time step, denoted by pink dots in Fig. 1. After that, considering the uncertainty in the state estimates, each referred past state is convolved with a kernel distribution (light green distributions in Fig. 1), and the results are mixed together to generate the prior distribution (green distribution in Fig. 1). Finally, a set of particles is generated according to the prior distribution (blue dots in the right part of Fig. 1). The M-PF-based face pose tracker in (Mikami et al., 2009) estimates the position and rotation at each time step. M-PF uses the same observation process as the traditional PF, which uses a single template built at initialization. This yields the 50 degree face rotation limit noted in (Mikami et al., 2009).

Figure 1: M-PF employs past state sequences to predict a future state. First, it calculates the reoccurrence possibility of past state estimates (TRP). Past time steps are then sampled on the basis of TRP. Past state estimates corresponding to the sampled time steps are combined to predict the prior distribution. M-PF enables the implicit modeling of complex dynamics. (Diagram labels: 1. Modeling TRP [Temporal Recurrent Probability]; 2. Temporal sampling; 3. Sampling states; Pose history; Pose space; Prior dist.; Now.)
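The three steps of memory-based prior prediction (TRP modeling, temporal sampling, and state sampling) can be sketched as follows. This is only an illustrative sketch, not the implementation of (Mikami et al., 2009): the function name `memory_based_prior`, its parameters, and in particular the Gaussian similarity used as the TRP are assumptions, since this section does not specify the functional form of the TRP.

```python
import math
import random


def memory_based_prior(history, n_particles, delta=1, trp_bandwidth=5.0,
                       kernel_std=1.0, rng=None):
    """Draw particles from a memory-based prior (illustrative sketch).

    history: list of past state estimates, each a tuple of floats.
    delta:   prediction offset in frames (must be smaller than len(history)).
    """
    rng = random.Random() if rng is None else rng
    latest = history[-1]

    # 1. Modeling TRP (assumed form): past time steps whose state resembles
    #    the latest estimate receive a higher weight. Only time steps that
    #    have a recorded state `delta` frames later are eligible.
    candidates = range(len(history) - delta)
    trp = [math.exp(-0.5 * (math.dist(history[i], latest) / trp_bandwidth) ** 2)
           for i in candidates]

    # 2. Temporal sampling: draw past time steps in proportion to the TRP.
    sampled = rng.choices(candidates, weights=trp, k=n_particles)

    # 3. Sampling states: take the state that followed each sampled time step
    #    and perturb it with a Gaussian kernel to reflect its uncertainty.
    return [tuple(rng.gauss(c, kernel_std) for c in history[i + delta])
            for i in sampled]
```

With a history that alternates between two poses, the particles concentrate on the pose that followed earlier occurrences of the current one, which is the behavior the recurrence assumption is meant to capture.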

2.2 Recovery from Tracking Failure

Conventional visual trackers track a target under the assumptions of simple dynamics and excellent visibility. Severe occlusions remain a challenging problem, i.e., how can a tracker rediscover a target lost under severe occlusion? This occlusion recovery problem can be viewed from two aspects: quickness of recovery and recoverable pose range.

The quickness of recovery indicates how rapidly a tracker can rediscover a target once the target reappears after being lost due to occlusion. The recoverable pose range indicates the pose range within which the tracker can rediscover the face; it matters because we must expect the target pose to change significantly during an occlusion. Conventional methods can be mapped as in Fig. 2 according to these aspects.

Figure 2: Properties of facial pose trackers with regard to recovery speed and recoverable pose range; PF is only able to rediscover the target if it takes a pose similar to the pose prior to tracking loss. By integrating a detector, PF+Detector enables recovery if the target can be detected. M-PF enables recovery if the target takes a stored pose. M-PF-based recovery can find the target faster than the detector. M-PFDMA supports memory-based quick recovery and detection-based wide pose range recovery. (Axes: recoverable pose range, toward wide; quickness of recovery, toward fast. Plotted methods: Detector, M-PF, M-PFDMA (proposed).)

The conventional PF-based tracker tries to rediscover a lost target by simply broadening the prior distribution frame by frame according to random walk dynamics. It may be able to rediscover a lost target if the occlusion period is short and the pose change is small. However, as time passes and the pose changes significantly, the rediscovery probability falls dramatically.
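The effect of broadening a random-walk prior can be illustrated with a small calculation. The sketch below assumes a one-dimensional pose and a Gaussian random walk with a fixed per-frame standard deviation; the function names, the tolerance, and all parameter values are hypothetical and chosen only for illustration. It computes how much prior probability mass lies near the true pose after the target has been lost for a given number of frames.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def rediscovery_mass(displacement, frames_occluded, step_std=1.0, tolerance=2.0):
    """Prior mass within +/- tolerance of the true pose (illustrative).

    The prior is centered on the last tracked pose (taken as 0) and its
    standard deviation grows as step_std * sqrt(frames_occluded), which is
    what a Gaussian random walk produces. frames_occluded must be >= 1.
    """
    std = step_std * math.sqrt(frames_occluded)
    low = displacement - tolerance
    high = displacement + tolerance
    return normal_cdf(high, 0.0, std) - normal_cdf(low, 0.0, std)
```

Under these assumptions, a longer occlusion spreads the particles more thinly even when the pose has not changed, and a large pose change leaves almost no particles near the target.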

Original M-PF (with simple online memory ac-

quisition) can rediscover the lost target if the target

takes a pose that is stored in memory by memory-

based prior distribution prediction. Such memory-

based rediscovery is faster than detection-based re-

discovery or PF-based rediscovery. This is because

memory-based prior distribution can well predict pos-

sible poses/positions of the target after occlusion.

Therefore, M-PF provides more rapid recovery and

wider recoverable pose range.

Combinations of particle filtering and a detector, denoted by PF+Detector in Fig. 2, are able to rediscover the lost target if the target takes detectable poses.

Though the detectable poses and the required time for

detection depend on the detector’s performance, gen-

erally speaking, detectors have much higher computa-

tional costs than trackers. To detect targets that take a

greater variety of poses, computational costs become

even higher. This also means that the recovery speed

tends to be rather slow.

M-PFDMA aims at achieving faster recovery

in wide pose ranges by integrating detection-based

memory acquisition into the memory-based particle

ﬁlter. Detection-based prior distribution helps re-

discovery of targets that take previously unobserved

poses. Rediscovered and tracked positions/poses are

stored in history. Acquired history can be used

to improve memory-based prior distribution predic-

tion. This synergetic combination of detection-based

memory acquisition and memory-based prior distri-

bution prediction enables faster recovery in wide pose

ranges.

3 MEMORY-BASED PARTICLE FILTER WITH DETECTION-BASED MEMORY ACQUISITION: M-PFDMA

This section proposes an enhancement of M-PF for

object tracking called M-PFDMA. It stands for M-PF

with detection-based memory acquisition. It achieves

quick recovery in wide pose range due to its synergis-

tic combination of an object detector and tracker.

As reviewed in Sect. 2.1, the basic assumption of

M-PF was that the target repeats similar movements

again and again. On the basis of this assumption,

M-PF introduced the temporal recurrent probability

(TRP), which indicated the tendency of past simi-

lar states reappearing in the future. M-PF replaced

the simple dynamics model employed in conventional

particle ﬁlters, such as the random walk model, by

the temporal recurrent probability, for predicting prior

distribution, called memory-based prior distribution.

M-PFDMA yields detection-based prior distribution

in addition to the memory-based prior distribution.

The detection-based prior distribution is folded into

memory-based prior distribution. This integration of

an effective detector into a uniﬁed M-PF framework

is the key contribution of M-PFDMA.

The remainder of this section ﬁrst overviews M-

PFDMA, and then, describes the prior distribution

prediction formulation. Finally, we describe its im-

plementation in a facial pose tracker.

3.1 System Overview

M-PFDMA integrates an object detector into the M-PF framework. Figure 3 illustrates the block diagram of M-PFDMA; the differences from M-PF are hatched. M-PFDMA has four main components: initialization, prior distribution prediction, posterior distribution estimation, and tracking result estimation. In the initialization step, the tracker detects a target and makes a target model. In the prior distribution prediction step, it calculates the prior distribution on the basis of two clues. One is memory-based prior distribution prediction, which is described in Sect. 2.1. The other is detection-based prior distribution prediction; a detection-based probability distribution is generated on the basis of the detection results. The memory-based and detection-based prior distributions are then combined, which is described in Sect. 3.2 in more detail. The posterior distribution estimation step calculates the observation likelihood, and then, by using the likelihood and the prior distribution, calculates the posterior distribution. Finally, a pose estimate is obtained by weighted averaging and stored in memory. The steps from determining the prior distribution to pose estimation are repeated in each frame.

Figure 3: Block diagram of our facial pose tracker. It has four main components: initialization, prior distribution prediction, posterior distribution estimation, and tracking result estimation. The key differences of M-PFDMA from M-PF are hatched, i.e., the integration of detection-based prior distribution prediction into memory-based prior distribution prediction. It enables rediscovery of a lost target even if, during occlusions, the target takes a position/pose which has not been stored in memory. The more positions/poses are added to the memory, the better the memory-based prior distribution prediction becomes. The synergetic effect between detection and tracking is the key contribution of the proposed M-PFDMA. (Blocks: Camera; Initialization; Object detection; Detection-based prior dist. prediction; Memory-based prior dist. prediction; Combined prior estimation; Posterior dist. estimation; Tracking result estimation; Past history of tracking results.)
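The per-frame steps described above (posterior estimation from the likelihood and the prior, weighted averaging, and storage in memory) can be sketched as below. This is a minimal sketch for a scalar state, not the tracker's actual code: `track_frame` and its arguments are hypothetical names, and returning `None` when every particle has zero likelihood is an assumption made here to keep the example well defined.

```python
def track_frame(prior_particles, likelihood, memory):
    """One tracking update for a scalar state (illustrative sketch).

    prior_particles: list of floats drawn from the prior distribution.
    likelihood:      function mapping a particle to a non-negative weight.
    memory:          list of past estimates; the new estimate is appended.
    """
    # Posterior estimation: weight each prior particle by its
    # observation likelihood.
    weights = [likelihood(p) for p in prior_particles]
    total = sum(weights)
    if total <= 0.0:
        # Assumption: no particle explains the observation, so no estimate
        # is produced and nothing is stored in this frame.
        return None

    # Tracking result estimation: weighted average of the particles.
    estimate = sum(w * p for w, p in zip(weights, prior_particles)) / total

    # Memory accumulation: the estimate becomes part of the history used
    # by memory-based prior prediction in later frames.
    memory.append(estimate)
    return estimate
```

Calling this function once per frame, with particles drawn from the combined prior, reproduces the cycle of prediction, estimation, and memory accumulation at the level of detail given in this section.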

3.2 Formulation of Prior Distribution Prediction of M-PFDMA

M-PFDMA’s key extension from M-PF resides in its

prior distribution prediction parts. Unlike M-PF, M-

PFDMA employs an object detector and combines

memory-based prior distribution and detection-based

prior distribution. This section formulates the prior

distribution of M-PFDMA.

Bayesian filters, including the particle filter, calculate the prior distribution by integrating the product of the motion dynamics of the target state and the posterior distribution of the past states, as in

$$p(x_{t+1} \mid Z_{1:t}) = \int p(x_{t+1} \mid x_{1:t}) \cdot p(x_{1:t} \mid Z_{1:t}) \, dx_{1:t}, \tag{1}$$

where $x_t$ denotes the state vector indicating position and rotation, $x_{1:t} = \{x_1, \cdots, x_t\}$ denotes the sequence of state vectors from time 1 to t, and $Z_{1:t} = \{Z_1, \cdots, Z_t\}$ denotes the sequence of observations from time 1 to t.

As the dynamics model, conventional particle filters assume a short-term Markov model as in

$$p(x_{t+1} \mid x_{1:t}) \approx p(x_{t+1} \mid x_t). \tag{2}$$

The memory-based particle filter assumes the tendency of repeating past positions/poses, and introduced the temporal recurrent probability $\Phi(\cdot)$. It replaced the dynamics model with the temporal recurrent probability, giving

$$p(x_{t+\Delta t} \mid \hat{x}_{1:t}, \Delta t) = \sum_{\tau=1}^{t} \Phi(t \mid \hat{x}_{1:t}, \Delta\tau) \cdot K(x_{t+\Delta\tau} \mid \hat{x}_{\tau}),$$

where $\Delta t$ denotes the time offset between the current time and the prediction target and $\hat{x}$ denotes a point estimate stored in memory. $K(\cdot)$ is the kernel distribution that represents uncertainty in the stored state estimate.

M-PFDMA replaces the memory-based prior distribution by a combination of the memory-based prior distribution and a detection-based prior distribution calculated from the object detection results $\tilde{x}^{(j)}(z_{t+\Delta t})$ as

$$p(x_{t+\Delta t} \mid \hat{x}_{1:t}, \Delta t, z_{t+\Delta t}) = \alpha \sum_{j=1}^{N} q(x \mid \tilde{x}^{(j)}(z_{t+\Delta t})) + (1-\alpha) \sum_{\tau=1}^{t} \Phi(t \mid \hat{x}_{1:t}, \Delta\tau) \cdot K(x_{t+\Delta\tau} \mid \hat{x}_{\tau}), \tag{3}$$

where $\alpha$ denotes the mixing weight between the detection-based prior distribution $q(\cdot)$ (first term of (3)) and the memory-based prior distribution (second term of (3)), and $N$ denotes the number of detected objects. The detection result $\tilde{x}^{(j)}(z_{t+\Delta t})$ includes the estimated position and pose, and $q(x \mid \tilde{x}^{(j)}(z_{t+\Delta t}))$ denotes a Gaussian distribution whose mean is the detected position/pose. In this paper, a static predefined value $\alpha = 0.15$, which is independent of the number of detected objects, is employed for the mixing weight.

The combined prior distribution realizes high re-

coverability and accuracy, which yields quick and sta-

ble memory acquisition.
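Drawing particles from the combined prior can be sketched as sampling from a two-component mixture. The sketch below uses a scalar state; the function name `sample_combined_prior`, the `memory_sampler` callback, and `detection_std` are hypothetical. Two further assumptions are made here that the text does not state: the detections share the weight α equally, and when nothing is detected all particles are drawn from the memory-based component.

```python
import random


def sample_combined_prior(detections, memory_sampler, n_particles,
                          alpha=0.15, detection_std=1.0, rng=None):
    """Sample scalar particles from a detection/memory mixture (sketch).

    detections:     list of detected states (may be empty).
    memory_sampler: function taking an RNG and returning one particle
                    drawn from the memory-based prior.
    alpha:          mixing weight of the detection-based component.
    """
    rng = random.Random() if rng is None else rng
    particles = []
    for _ in range(n_particles):
        if detections and rng.random() < alpha:
            # Detection-based component: Gaussian centered on one of the
            # detection results (assumed to be chosen uniformly).
            center = rng.choice(detections)
            particles.append(rng.gauss(center, detection_std))
        else:
            # Memory-based component, also used as the fallback when
            # the detector returns nothing (an assumption of this sketch).
            particles.append(memory_sampler(rng))
    return particles
```

With α = 0.15, roughly 15% of the particles are placed around detections, which is enough to seed recovery at poses the memory does not yet cover while leaving most particles to the memory-based prediction.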

3.3 Implementation in a Facial Pose Tracker

We implement a facial pose tracker on the basis of M-PFDMA. As the detection method, the multi-view face detector proposed by Tian et al. (Tian et al., 2010), called the JPrISC face detector, is employed. The multi-view face detector is able to detect faces and to output ten pose classes including frontal view and near profile view. It achieved relatively fast detection without degrading detection accuracy by using a calculation sequence determined by entropy.

(Footnote: Though we used the JPrISC as the face pose detector, it is not specialized in face pose. By collecting training images, it can be applied for detecting other objects.)

4 EXPERIMENT

To confirm M-PFDMA’s performance, we focus on severe occlusion cases because stable tracking is already possible with conventional methods if the target is not occluded, even if it exhibits complex dynamics.

4.1 Experimental Settings

The video capturing environment was as follows. We used Point Grey Research’s FLEA, a digital color camera, to capture 1024 x 768 pixel images at 30 frames per second. Note that the tracking process uses only grayscale images converted from the color images. The CPU of the PC used was an Intel Core 2 Extreme 3.0 GHz (quad core) and the GPU was an NVIDIA GeForce GTX480. All experiments used 2000 particles.

Our tracker was implemented on the basis of STC-

Tracker (Lozano and Otsuka, 2008), which acceler-

ates particle ﬁltering by GPU implementation. It can

run at 30 fps, and our M-PFDMA-based tracker also

can run at 30 fps.

4.2 Typical Example of Proposed Facial Pose Tracker in Action

We compare M-PFDMA to three other methods. The first is the memory-based particle filter (M-PF) (Mikami et al., 2009). The second is a combination of particle filtering and the JPrISC face detector (PF+Detector). Note that the JPrISC face detector can output ten pose classes in addition to position. Therefore, PF+Detector is expected to show higher performance than the similar method in (Ba and Odobez, 2008), especially in terms of speed of recovery. The third is FaceAPI (Seeingmachines), which is known as the best commercially available face tracker. FaceAPI can detect a face in near frontal view, and can then sequentially estimate the position and pose of the face. Note that because FaceAPI is a commercial facial pose tracker, its detailed algorithm is not disclosed.

We prepared two videos with severe occlusions. In the first video, objects cross horizontally and vertically at the camera’s centerline as in Fig. 8, and cause occlusions. Tracking starts in the top-right area, and the subject moves up-down and left-right while changing his pose. The second video simulates a video conference situation; the subject makes a presentation in front of a camera, interacting with participants on the other side.

Figure 4: Tracking results of M-PF, PF+Detector, and M-PFDMA (proposed). White mesh denotes estimated position/pose. While tracking is unstable, the mesh turns gray. The left column shows M-PF’s output, the middle column shows PF+Detector’s output, and the right column shows M-PFDMA’s output. In each column, figures are listed in time order.

Figure 4 shows snapshots of the tracking behavior of M-PF, PF+Detector, and M-PFDMA from the first video. Only the upper halves of the images are shown. In Fig. 4, the left, middle, and right columns show the results of M-PF, PF+Detector, and M-PFDMA, respectively. In each column, figures are listed in time order. From the first row to the second row, the target face moved from right to left; from the second row to the third row, it moved from left to right. The second row shows that M-PFDMA and PF+Detector successfully detected the target face while M-PF did not, because the target’s pose had not been stored. This is one example of improvement in recoverable pose range, and the recovery confirms the effectiveness of the detector. However, these recoveries were achieved by the detector, so they took a rather long time. The third row shows that M-PF and M-PFDMA found the lost target, while PF+Detector took much longer to rediscover it. In this situation, the past state history of the right side had already been stored, so the memory-based methods quickly rediscovered the lost target.

FaceAPI is able to detect a face in near frontal view, so it successfully rediscovered the tracking target after occlusions if the target was in near frontal view and was observed without blur. While a target is moving, the observed image becomes blurred. Therefore, FaceAPI failed to rediscover the lost target while it was moving, even when it was not occluded. In contrast, M-PFDMA quickly rediscovered the lost target under blurred observation. Thus, M-PFDMA outperformed the other trackers, including the PF-based trackers and a commercial non-PF-based tracker.

Figure 5 shows snapshots of the tracking behav-

ior of M-PFDMA from the second video. The snap-

shots are listed in time order. The second video in-

cludes self-occlusion, e.g., turning back while moving

left to right (Fig.5(b)-(e)) and right to left (Fig. 5(g)-

(h)). It also includes scale changes (Fig.5(e)-(f)) and

non-rigid deformations (Fig.5(g)), e.g., facial expres-

sion changes. During natural behaviors as depicted in

Fig.5, our M-PFDMA successfully estimates the po-

sition and rotation of the target while the target is not

occluded.

The above comparisons veriﬁed that M-PFDMA

supports both memory-based quick recovery and

detection-based recovery and so covers the cases in

which the target deviates from stored poses.

4.3 Quantitative Evaluations

For quantitative evaluations, the same videos were

used. First, tracking success ratio was examined.

The tracking success ratio is the ratio of frames in

which the tracker estimated pose and position cor-

rectly as conﬁrmed by manual observation. The video

includes numerous occlusions. Therefore, recovery performance, i.e., speed of recovery and recoverable pose range, directly impacts the tracking success ratio. The comparison targets are M-PF, PF+Detector, and FaceAPI.
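The tracking success ratio defined above is simple arithmetic, sketched below. As noted with Table 1, frames that cannot be tracked due to occlusion are excluded from the count. The function name and the per-frame boolean lists are hypothetical; in the experiments the correctness labels come from manual observation.

```python
def tracking_success_ratio(correct, occluded):
    """Fraction of trackable frames whose estimate was judged correct.

    correct:  list of booleans, True where pose and position were correct.
    occluded: list of booleans, True where the target could not be tracked
              because of occlusion; these frames are excluded.
    """
    evaluated = [c for c, o in zip(correct, occluded) if not o]
    if not evaluated:
        raise ValueError("no trackable frames to evaluate")
    return sum(evaluated) / len(evaluated)
```

For example, with four frames of which one is occluded and two of the remaining three are correct, the ratio is 2/3.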

Figure 5: Tracking results for M-PFDMA (proposed), panels (a)-(h). White mesh denotes estimated position/pose. While tracking is unstable, the mesh turns gray. (a) Initialization. (b)-(e) The subject moves from right to left while rotating; (e)-(g) the subject gets close to and backs away from the camera; (g)-(h) the subject moves from left to right while rotating. During such natural behavior of the subject, the proposed M-PFDMA occasionally missed the target. However, it rediscovered the target soon after occlusions.

Figure 6: Tracking results of M-PF, PF+Detector, FaceAPI, and M-PFDMA are shown. M-PFDMA achieved faster recovery of the lost target than the others. In frame 225, PF+Detector and M-PFDMA successfully estimated the position/pose of the target, whereas M-PF did not because no history close to that position/pose had been stored (left column). In frame 400, M-PF and M-PFDMA successfully estimated the position/pose of the target, but PF+Detector did not because detection requires a large computation time (middle column). In frame 655, only M-PFDMA successfully estimated the position/pose of the target (right column).

The results, shown in Table 1, confirm that M-PFDMA successfully estimated the target’s position and rotation in 81.5%/90.1% (video 1/video 2) of frames, whereas M-PF, PF+Detector, and FaceAPI achieved only 37.5%/36.9%, 52.4%/62.2%, and 64.2%/53.7%, respectively. Figure 6 shows a part of the tracking results for video 1 in more detail. In Fig. 6, the horizontal axis denotes the frame number. Each line denotes a tracking result. The discontinuities indicate the frames in which the target position/pose was not correctly estimated. Figure 7 shows tracking results for video 2. As shown in Figs. 6 and 7, M-PFDMA combined the frames correctly estimated by PF+Detector and M-PF while avoiding their deficiencies.

Figure 7: Tracking results of M-PF, PF+Detector, FaceAPI, and M-PFDMA are shown. M-PFDMA achieved faster recovery of the lost target than the others. Compared with Fig. 6, a similar property is observed.

Figure 8: Stored history by each method; (a) M-PF and (b) M-PFDMA (proposed method). Positions of stored history are shown by yellow lines. M-PF obtained history only in the top-right area, where the target face was initialized. In contrast, M-PFDMA obtained history in a wider area.

Next, the memory acquisition results of the previous M-PF and the proposed M-PFDMA from video 1 are shown in Fig. 8. Yellow lines denote the positions of the history stored in memory. So, the target’s movements along the yellow lines are expected to be tracked stably on the basis of memory-based prior distribution prediction. The left figure shows the output of M-PF, and the right figure shows that of M-PFDMA. As shown in Fig. 8, the history stored by M-PF covered only the top-right area, where tracking started. This means that the tracker failed to rediscover the target when the target moved to other areas after occlusions, due to large changes in position/pose during the occlusions. In contrast, the stored memory of M-PFDMA covered the entire field of view. The numbers of stored memories for this sequence by M-PF and M-PFDMA are 455 and 808, respectively. This confirms the memory acquisition performance of M-PFDMA under severe occlusion.

Table 1: Comparison of tracking success ratios among M-PF, PF+Detector, FaceAPI, and M-PFDMA (proposed). When we calculated the tracking success ratio, the frames that could not be tracked due to occlusion were excluded.

                         Tracking success ratio
Method                   video 1     video 2
M-PF                     37.5%       36.9%
PF+Detector              52.4%       62.2%
FaceAPI                  64.2%       53.7%
M-PFDMA (proposed)       81.5%       90.1%

5 CONCLUSIONS

A memory-based particle ﬁlter with detection-based

memory acquisition, in short M-PFDMA, was pro-

posed for vision-based object tracking. M-PFDMA

offers robust memory acquisition under severe oc-

clusion since it creates a synergistic combination of

detection-based memory acquisition and the memory-

based approach. M-PFDMA was shown to achieve

high accuracy and quick recovery in real-world sit-

uations. We veriﬁed its effectiveness in facial pose

tracking experiments.

Future work includes memory management. M-PFDMA stores the correctly estimated target state in memory. The correctness is automatically judged by using the maximum likelihood among particles. Though this works well in most cases, the quality of stored data is very important for memory-based prior distribution prediction. Therefore, we will consider more precise ways of judging tracking correctness. Future work also includes automatic determination of the mixing weight α of the detection-based and memory-based prior distributions. The results in this paper employed a static α. However, the ideal α varies with conditions such as the current tracking stability. We would like to reveal the factors affecting α, and then tackle automatic setting of α according to those factors.
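The memory acquisition rule mentioned above, in which correctness is judged from the maximum likelihood among particles, might look like the following sketch. Comparing the maximum likelihood against a fixed `threshold` is an assumption made here for illustration; the paper does not state how the maximum likelihood is turned into a decision, and `maybe_store` is a hypothetical name.

```python
def maybe_store(estimate, particle_likelihoods, memory, threshold):
    """Store the estimate only when tracking looks reliable (sketch).

    particle_likelihoods: likelihood values of the current particles.
    threshold:            hypothetical minimum for the best likelihood.
    Returns True if the estimate was appended to memory.
    """
    # Judge correctness from the best-matching particle: a low maximum
    # likelihood suggests the target is occluded or lost.
    if particle_likelihoods and max(particle_likelihoods) >= threshold:
        memory.append(estimate)
        return True
    return False
```

A gate of this kind keeps estimates made during occlusion out of the history, which matters because every stored entry later influences the memory-based prior.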

REFERENCES

Andriluka, M., Roth, S., and Schiele, B. (2010). Monocu-

lar 3D pose estimation and tracking by detection. In

CVPR 2010, pages 623–630.

Ba, S. and Odobez, J.-M. (2008). Probabilistic head pose

tracking evaluation in single and multiple camera.

Multimodal Technologies for Perception of Humans,

4625/2008:276–286.

Bradski, G. R. (1998). Computer vision face tracking for

use in a perceptual user interface. In Proc. IEEE Work-

shop Applications of Computer Vision, pages 214–

219.

Comaniciu, D., Ramesh, V., and Meer, P. (2003). Kernel-

based object tracking. IEEE Trans. PAMI, 25:564–

577.

Gordon, N., Salmond, D., and Smith, A. F. M. (1993). Novel approach to non-linear and non-Gaussian Bayesian state estimation. IEE Proc. F: Communications, Radar and Signal Processing, 140(2):107–113.

Isard, M. and Blake, A. (1998). Condensation - condi-

tional density propagation for visual tracking. IJCV,

29(1):5–28.

Kobayashi, Y., Sugimura, D., Sato, Y., Hirasawa, K.,

Suzuki, N., Kage, H., and Sugimoto, A. (2006). 3D

head tracking using the particle ﬁlter with cascaded

classiﬁers. In BMVC.

Lozano, O. M. and Otsuka, K. (2008). Real-time visual

tracker by stream processing. Journal of VLSI Signal

Processing Systems.

Mikami, D., Otsuka, K., and Yamato, J. (2009). Memory-

based particle ﬁlter for face pose tracking robust under

complex dynamics. In Proc. CVPR, pages 999–1006.

Murphy-Chutorian, E. and Trivedi, M. M. (2008). Head

pose estimation in computer vision: A survey. IEEE

Trans. PAMI.

Otsuka, K., Araki, S., Ishizuka, K., Fujimoto, M., Hein-

rich, M., and Yamato, J. (2008). A realtime multi-

modal system for analyzing group meeting by com-

bining face pose tracking and speaker diarization. In

Proc. ACM ICMI, pages 257–264.

Ozuysal, M., Lepetit, V., Fleuret, F., and Fua, P. (2006).

Feature harvesting for tracking-by-detection. In LNCS

3953(ECCV2006), pages 592–605.

Seeingmachines. FaceAPI. http://www.seeingmachines.com/product/faceapi/.

Tian, L., Ando, S., Suzuki, A., and Koike, H. (2010). A

probabilistic approach for fast and robust multi-view

face detection using compact local patterns. In Proc.

the IIEEJ Image Electronics and Visual Computing

Workshop.

Tu, J., Tao, H., and Huang, T. (2007). Face as mouse through visual face tracking. CVIU, 108(1-2):35–40.
