IDENTIFICATION AND RECONSTRUCTION OF COMPLETE GAIT CYCLES FOR PERSON IDENTIFICATION IN CROWDED SCENES

Martin Hofmann, Daniel Wolf and Gerhard Rigoll

Institute for Human-Machine Communication, Technische Universität München

Arcisstr. 21, Munich, Germany

Keywords:

Gait recognition, Person identification, Occlusions, Gait cycle reconstruction, Graph cuts.

Abstract:

This paper addresses the problem of gait recognition in the presence of occlusions. Recognition of people by their gait has been an active research area, and many successful algorithms have been presented. However, to this point none of these methods addresses the problem of occlusion, and most current algorithms need a full gait cycle for recognition. In this paper we present a scheme for the reconstruction of full gait cycles, which can be used as a preprocessing step for any state-of-the-art gait recognition method. We test this scheme on the TUM-IITKGP gait recognition database and show a significant performance gain in the case of occlusions.

1 INTRODUCTION

Person identification using gait information has become an established field of research. To identify people, current applications successfully use physiological features such as face, iris and fingerprint. However, it is also possible to identify people using behavior-based features such as voice, dialect, signature and gait. The main advantage of gait over physiological features is that people can be identified from large distances and without their direct cooperation. For example, a person's gait signature can be extracted from low-resolution images in which the face is not even visible. Moreover, no direct interaction with a sensing device is necessary, which allows for undisclosed identification. Gait recognition thus has great potential in video surveillance, tracking and monitoring.

A major challenge for gait-based recognition is that almost all current approaches (Han and Bhanu, 2006; Lee and Grimson, 2001; Wang et al., 2003) need a video sequence in which a complete gait cycle of the walking person is visible. A complete gait cycle is a sequence that starts with one foot forward and ends with the same foot forward.

In most databases this is not a problem, because the databases are constructed to always show complete gait cycles. In real-world applications, however, fully visible gait cycles cannot always be guaranteed. Consider, for example, an airport, a train station or other crowded places. In these scenarios, gait cycles can be corrupted by occlusions from other pedestrians, and current approaches fail.

To overcome these limitations, we propose a preprocessing stage which performs motion segmentation on the input video and reconstructs synthetic complete gait cycles from the partial information available in the corrupted and occluded input sequences.

Our method consists of two separate parts. In the first part, all walking people are detected, tracked and accurately segmented; this part can be thought of as an application of motion segmentation. In the second part, once people are segmented in each frame, the gait cycle of each person is analyzed and complete gait cycles are reconstructed.

We test our method on the TUM-IITKGP gait recognition database (Hofmann et al., 2011). This database features sequences in which each person is completely visible and other sequences in which each person is occluded by other pedestrians. Using the two baseline algorithms, we show that our preprocessing greatly increases recognition rates in the case of occlusions.

In Sections 2 and 3 we present the two processing parts. We show results in Section 4 and conclude in Section 5.

Hofmann M., Wolf D. and Rigoll G..

IDENTIFICATION AND RECONSTRUCTION OF COMPLETE GAIT CYCLES FOR PERSON IDENTIFICATION IN CROWDED SCENES.

DOI: 10.5220/0003329305940597

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2011), pages 594-597

ISBN: 978-989-8425-47-8

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

2 PERSON TRACKING AND SEGMENTATION

In this section we present our motion segmentation

method, which is used to detect, track and accurately

segment a varying number of people in the input

videos.

We use a graph-based image segmentation technique, in which an image is represented as an undirected graph. Nodes correspond to pixels, and a cut through the connecting edges reveals the segmentation. A similar approach to motion segmentation is, for example, that of Bugeau and Pérez. Our approach differs in that we incorporate an explicit counting of people using mean shift.

We define $P$ as the set of all pixels in a frame and $L$ as the set of possible labels. We denote by $l(p) \in L$ the label of a specific pixel $p \in P$ and by $l(P) = \{l(p) \mid p \in P\}$ the set of all label assignments. $N$ denotes the standard 4-connected neighborhood and consists of the corresponding pixel pairs $(p, q)$. With these definitions, the energy becomes:

$$C(l(P)) = \sum_{p \in P} C_{\mathrm{Data}}(l(p)) + \gamma \cdot \sum_{(p,q) \in N} C_{\mathrm{Smooth}}(l(p), l(q)) + \delta \cdot C_{\mathrm{Labels}}(l(P)) \tag{1}$$

The data term describes the cost of assigning labels to individual nodes. The smoothness term ensures smoothness of the resulting objects, and the label term discourages the use of too many (small) objects. The data and smoothness terms are explained in detail below; the label term $C_{\mathrm{Labels}} = |L'|$ is the number of labels used in the final assignment.
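To make the structure of Equation (1) concrete, the following minimal sketch evaluates the energy of a given labeling. The callables `data_cost` and `smooth_cost` are hypothetical stand-ins for the data and smoothness terms, and the list-of-rows image layout is an assumption made for this example only.

```python
def total_energy(labels, data_cost, smooth_cost, gamma, delta):
    """Evaluate the energy of a label image given as a list of rows.

    data_cost(y, x, label) and smooth_cost(label_p, label_q, p, q) are
    hypothetical callables standing in for the data and smoothness terms.
    """
    height, width = len(labels), len(labels[0])
    data = sum(data_cost(y, x, labels[y][x])
               for y in range(height) for x in range(width))
    smooth = 0.0
    for y in range(height):
        for x in range(width):
            # 4-connected neighbourhood: visit each pair once (right, down).
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    smooth += smooth_cost(labels[y][x], labels[ny][nx],
                                          (y, x), (ny, nx))
    # Label term: number of distinct labels used in the assignment.
    used_labels = {label for row in labels for label in row}
    return data + gamma * smooth + delta * len(used_labels)
```

Such a function is only useful for checking candidate labelings; the actual minimization is performed with graph cuts as described in Section 2.3.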

2.1 Data Term

The data term $C_{\mathrm{Data}}(l(p))$ describes the cost associated with assigning label $l(p)$ to pixel $p$. In our work, the data term consists of a term defined by background subtraction, $C_{\mathrm{BG}}$, and a term defined by optical flow, $C_{\mathrm{OF}}$:

$$C_{\mathrm{Data}}(l(p)) = \alpha \cdot C_{\mathrm{BG}}(l(p)) + C_{\mathrm{OF}}(l(p)) \tag{2}$$

Figure 1: Predicting label assignment using optical flow.

For the background model, we use the first frame of the sequence (which is always empty in the database used). We use the YCbCr color space and take the absolute difference in intensity $I(p)$ as the measure. We perform efficient shadow suppression by setting $I(p) = 0$ (thus background) for pixels whose Cr and Cb values are similar to the background model, even if their intensity differs significantly. Let $l_{\mathrm{BG}}$ be the label of the background and $L_{\mathrm{FG}}$ the set of all other labels, which represent people. Then the cost of labeling pixel $p$ with label $l$ becomes

$$C_{\mathrm{BG}}(l(p)) = \begin{cases} I(p) & \text{if } l = l_{\mathrm{BG}} \\ 1 - I(p) & \text{if } l \in L_{\mathrm{FG}} \end{cases} \tag{3}$$
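The background cost of Equation (3), including the shadow suppression, might be computed per pixel as in the sketch below. It assumes YCbCr values scaled to [0, 1]; the chroma tolerance `chroma_tol` is a hypothetical parameter, since no threshold for "similar" Cb and Cr values is stated here.

```python
def background_costs(pixel, background, chroma_tol=0.05):
    """Return (cost_background, cost_person) for one YCbCr pixel.

    pixel and background are (Y, Cb, Cr) tuples scaled to [0, 1].
    chroma_tol is a hypothetical threshold chosen for illustration.
    """
    y, cb, cr = pixel
    bg_y, bg_cb, bg_cr = background
    intensity = abs(y - bg_y)  # I(p): absolute intensity difference
    # Shadow suppression: if the chroma matches the background model, the
    # pixel is treated as background even when its intensity differs.
    if abs(cb - bg_cb) <= chroma_tol and abs(cr - bg_cr) <= chroma_tol:
        intensity = 0.0
    return intensity, 1.0 - intensity
```

A darker pixel with unchanged chroma (a typical shadow) then yields zero cost for the background label, whereas a pixel with changed chroma keeps its intensity difference as background cost.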

The background term $C_{\mathrm{BG}}$ described above is able to separate moving objects from the background, but it naturally cannot distinguish between multiple walking people. We therefore additionally incorporate an optical flow term $C_{\mathrm{OF}}$, which (1) separates objects moving in different directions and at the same time (2) ensures consistent tracking of objects (i.e. it preserves each object's identity).

We first apply mean shift clustering to the output of optical flow (Farnebäck, 2002). This not only finds the number of objects in the frame, but also gives a rough estimate of where the respective objects can be found. More specifically, we use a three-dimensional mean shift with dimensions $x$, $y$ and $\varphi$, where $\varphi$ corresponds to the direction of the flow at pixel $p$. We set the size of the mean shift window to approximately the expected width and height of a person (80 × 200 pixels) and to 80° for the flow direction.
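As a rough illustration of this clustering step, the following sketch runs a flat-kernel mean shift on (x, y, φ) samples in plain Python. The box-shaped window, the reading of the window size as per-dimension half-widths, the iteration limit and the function name `mean_shift_clusters` are all assumptions made for this example; the wrap-around of the angle φ is ignored.

```python
def mean_shift_clusters(points, half_window, iterations=20):
    """Cluster (x, y, phi) samples with a flat-kernel mean shift.

    half_window gives the per-dimension half-size of the box window
    (an assumed reading of the window size). Returns (centers, labels).
    """
    def inside(center, point):
        return all(abs(c - v) <= h
                   for c, v, h in zip(center, point, half_window))

    centers, labels = [], []
    for start in points:
        current = tuple(float(v) for v in start)
        for _ in range(iterations):
            near = [q for q in points if inside(current, q)]
            if not near:
                break
            shifted = tuple(sum(dim) / len(near) for dim in zip(*near))
            if shifted == current:
                break  # converged to a mode
            current = shifted
        # Merge modes that fall into the window of an existing center.
        for index, center in enumerate(centers):
            if inside(center, current):
                labels.append(index)
                break
        else:
            centers.append(current)
            labels.append(len(centers) - 1)
    return centers, labels
```

The number of returned centers plays the role of the object count, and the per-sample labels give the rough object locations. With half-widths of (40, 100, 40), the window spans 80 × 200 pixels and 80° in total.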

Because the mean shift clustering is independent of previous frames, its labels are arbitrary and have to be matched to the labeling of the energy formulation. This is illustrated in Figure 1. First, a predicted labeling is calculated from the previous frame using the optical flow estimate $v(p)$ at each pixel: $l(p + v(p)) = l(p)$. Then each mean shift cluster is assigned the label which best fits the predicted labeling.
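The two matching steps can be sketched as follows. Rounding flow vectors to whole pixels and using a majority vote as the criterion for the "best fit" are assumptions of this example, as are the dictionary-based data layout and the function names.

```python
from collections import Counter


def predict_labels(prev_labels, flow):
    """Warp the previous labeling forward: l(p + v(p)) = l(p).

    prev_labels maps (x, y) -> label; flow maps (x, y) -> (vx, vy).
    Flow vectors are rounded to the nearest pixel (an assumption).
    """
    predicted = {}
    for (x, y), label in prev_labels.items():
        vx, vy = flow.get((x, y), (0, 0))
        predicted[(x + round(vx), y + round(vy))] = label
    return predicted


def match_clusters(clusters, predicted):
    """Give each mean shift cluster the most frequent predicted label.

    clusters maps an arbitrary cluster id -> list of (x, y) pixels.
    Clusters without any predicted pixel map to None, e.g. when a
    person newly enters the scene.
    """
    assignment = {}
    for cluster_id, pixels in clusters.items():
        votes = Counter(predicted[p] for p in pixels if p in predicted)
        assignment[cluster_id] = votes.most_common(1)[0][0] if votes else None
    return assignment
```

A cluster that receives `None` would need a fresh label; how new people are initialized is outside the scope of this sketch.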

Let $P_l$ be the set of pixels in the mean shift cluster which best fits the predicted pixels of label $l$. Then the contribution to the data term is defined as

$$C_{\mathrm{OF}}(l(p)) = \begin{cases} -\beta & \text{if } p \in P_{l(p)} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Thus, assigning a given label is encouraged by β,

if a corresponding object has been found by the mean

shift clustering.

2.2 Smoothness Term

The smoothness term models the similarity of the color of adjacent pixels $p$ and $q$. We set it as follows:

$$C_{\mathrm{Smooth}}(l(p), l(q)) = \begin{cases} 0 & \text{if } l(p) = l(q) \\ e^{-\lVert c(p) - c(q) \rVert} & \text{if } l(p) \neq l(q) \end{cases} \tag{5}$$

Here, $c(p)$ is the color vector of pixel $p$.

2.3 Energy Minimization

To minimize the energy in Equation (1), we use the α-expansion algorithm (Boykov et al., 2001; Kolmogorov and Zabih, 2004), extended with label costs (Delong et al., 2010). We thus seek the optimal labeling

$$\hat{l}(P) = \arg\min_{l(P)} C(l(P)).$$

3 GAIT RECONSTRUCTION

The second part of our method detects occlusions and compensates for them. The main idea is to replace silhouettes which are completely or partly occluded by silhouettes from other frames which are not occluded. To this end, three steps are necessary: first, all occluded silhouettes are found; then the gait period is calculated; and lastly, so-called "reconstruction frames" are searched for and used to replace the corrupted silhouettes.

3.1 Detection of Occlusions

A good feature for detecting occluded frames is the number of pixels in the silhouette. Figure 2 shows the number of pixels in a tracked silhouette over time. It can be seen that the frames between 145 and 150 are definitely occluded. To find the precise range of occluded frames, we first calculate the median $M$ of the number of pixels. Silhouettes with fewer than a fixed fraction of $M$ pixels are considered definitely occluded. From these silhouettes we search backward and forward until the number of pixels rises above $0.95 \cdot M$; all frames in this range are also considered part of the occluded range.
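The two-threshold procedure can be sketched as follows. The lower factor `low_factor` has to be supplied by the caller, since its value is not given here; the function name and the list-based input are likewise assumptions of this example.

```python
from statistics import median


def occluded_frames(pixel_counts, low_factor, high_factor=0.95):
    """Return the sorted indices of frames considered occluded.

    low_factor is the "definitely occluded" fraction of the median; its
    value is an assumption to be supplied by the caller.
    """
    m = median(pixel_counts)
    occluded = set()
    for seed, count in enumerate(pixel_counts):
        if count >= low_factor * m:
            continue
        occluded.add(seed)
        # Grow the range in both directions until the silhouette size
        # rises above high_factor times the median again.
        for step in (-1, 1):
            i = seed + step
            while 0 <= i < len(pixel_counts) and pixel_counts[i] <= high_factor * m:
                occluded.add(i)
                i += step
    return sorted(occluded)
```

For instance, with the counts `[100, 100, 100, 90, 50, 40, 92, 100, 100, 100]` and a hypothetical `low_factor` of 0.6, frames 4 and 5 act as seeds and the range grows to frames 3 through 6.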

3.2 Measuring the Gait Period

Figure 2: Number of pixels in a silhouette. (Plot of the number of pixels against the frame number for frames 100 to 200, together with the median.)

For the last step of finding reconstruction frames, it is necessary to have a good estimate of the gait period $T$. To find the gait period, the lower half of the silhouette in each frame is correlated with the lower halves of the silhouettes in all other frames. The difference in frame number to the frame with the best match is recorded in a list. The median frame difference in this list yields the gait period. This relatively simple method performs very reliably and correctly finds the gait period for all sequences in the database.
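A minimal sketch of this period estimate is given below. It assumes that each lower-half silhouette is available as a flattened list of pixel values. The parameter `min_gap`, which excludes directly neighbouring (and therefore nearly identical) frames from the search, and the preference for the nearest frame among equally good matches are assumptions of this example and are not part of the description above.

```python
from math import sqrt
from statistics import median


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def gait_period(lower_halves, min_gap=2):
    """Estimate the gait period in frames.

    lower_halves holds one flattened lower-half silhouette per frame.
    min_gap is a hypothetical guard against matching adjacent frames.
    """
    differences = []
    for i, ref in enumerate(lower_halves):
        candidates = [j for j in range(len(lower_halves))
                      if abs(j - i) >= min_gap]
        if not candidates:
            continue
        # Best correlation wins; ties are broken in favour of the
        # temporally closest frame.
        best = max(candidates,
                   key=lambda j: (normalized_correlation(ref, lower_halves[j]),
                                  -abs(j - i)))
        differences.append(abs(best - i))
    return median(differences)
```

Because the result is a median, it may be a non-integer value for an even number of frames and may need rounding before it is used as a frame offset.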

3.3 Finding Reconstruction Frames

The goal is to reconstruct the silhouettes which are corrupted by occlusions. The procedure is best described using Figure 3. Here, frames 48-50 (red) are the occluded frames which are to be reconstructed. We seek silhouettes from other frames which are similar to the occluded ones. This is possible because the sequence can be assumed to consist of multiple gait cycles which are very similar to each other, so the corrupted frames can be replaced by corresponding frames from another gait cycle. Of course, the corrupted frames themselves cannot be used to find corresponding frames. We therefore take the last non-corrupted frame (frame 47 in Figure 3) and search for its best match using normalized correlation. It is important to note that this search must not be performed on all other available frames, but only on those within a search region one or more gait periods ahead or in the past. This is necessary to avoid incorrect matches, for example to a silhouette in which the opposite foot is extended. The possible search ranges, expressed as frame offsets relative to the last non-corrupted frame, are thus $S_k = [kT - \Delta, kT + \Delta]$, $\forall k \neq 0$, where $\Delta$ is the half-width of the search window. Once the best matching frame is found, the corresponding frames are copied to replace the corrupted frames.
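The search over the windows one or more gait periods away can be sketched as follows. The sketch assumes an integer gait period, a single contiguous run of occluded frames that does not start at the first frame, and a caller-supplied `similarity` function such as a normalized correlation; requiring that all replacement frames exist and are themselves not occluded is an additional assumption of this example.

```python
def find_reconstruction(frames, occluded, period, delta, similarity):
    """Map each occluded frame index to the index of its replacement.

    frames: per-frame silhouettes; occluded: sorted consecutive indices;
    period and delta: integers; similarity: higher means more similar.
    """
    anchor = occluded[0] - 1  # last non-occluded frame before the gap
    blocked = set(occluded)
    best_offset, best_score = None, float("-inf")
    max_k = len(frames) // period + 1
    for k in range(-max_k, max_k + 1):
        if k == 0:
            continue  # the anchor's own neighbourhood is excluded
        for offset in range(k * period - delta, k * period + delta + 1):
            targets = [i + offset for i in [anchor] + list(occluded)]
            # The matched frame and all replacement frames must exist
            # and must not be occluded themselves.
            if any(t < 0 or t >= len(frames) or t in blocked
                   for t in targets):
                continue
            score = similarity(frames[anchor], frames[anchor + offset])
            if score > best_score:
                best_offset, best_score = offset, score
    if best_offset is None:
        return {}
    return {i: i + best_offset for i in occluded}
```

The returned mapping states which frame to copy for each occluded frame; an empty mapping signals that no valid search window lies inside the sequence.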

Figure 3: Search area for finding the reconstruction frames. (Diagram labels: regular frame, last non-occluded frame, occluded frames, and search windows one gait period back and one gait period forward.)

4 RESULTS

To our knowledge, not much work has focused on gait recognition in the presence of occlusions, although it is a very important issue on the way towards a working real-world gait recognition system. There are many gait recognition databases, e.g. HumanID, UCSD, CMU Mobo, Soton, CASIA and others (Hofmann et al., 2011). We use the TUM-IITKGP gait recognition database (Hofmann et al., 2011) because it focuses on gait recognition with occlusions. This database features 35 individuals, each recorded in six different configurations (regular, hands-in-pocket, backpack, gown, dynamic occlusions, static occlusions). In this paper we use the first configuration for training and focus on the last two configurations for recognition. For all our experiments we set [α = 200, β = 80, γ = 30, δ = 40.000].

We compare our results to the two baseline algorithms described in (Hofmann et al., 2011). Evaluation results are shown in Table 1. The first baseline algorithm (based on color histograms) is clearly outperformed once our preprocessing is added. The second baseline algorithm (based on the Gait Energy Image) cannot work at all without the preprocessing we propose in this paper.

Table 1: Top 1 recognition rates for Baseline 1 (Color Histogram) and Baseline 2 (Gait Energy Image).

                     dynamic occlusions   static occlusions
Baseline 1           43.7%                70.0%
Baseline 1 + ours    84.3%                87.5%
Baseline 2           -                    -
Baseline 2 + ours    67.2%                72.7%

Qualitative results of our proposed reconstruction

method are shown in Figure 4. It can be seen that the

corrupted sequence is nicely reconstructed.

5 CONCLUSIONS

In this paper we have presented a preprocessing stage that reconstructs complete gait cycles such that gait recognition is possible in spite of occlusions. In principle, this preprocessing can be applied to any gait recognition algorithm. Our experiments show that such preprocessing is in fact beneficial in the case of occlusions.

(a) Sequence with occlusions

(b) Reconstructed sequence

Figure 4: Example of reconstruction (b) of gait cycle which

was originally (a) corrupted by occlusions.

REFERENCES

Boykov, Y., Veksler, O., and Zabih, R. (2001). Efficient approximate energy minimization via graph cuts. IEEE TPAMI, 20(12):1222-1239.

Bugeau, A. and Pérez, P. Track and cut: Simultaneous tracking and segmentation of multiple objects with graph cuts.

Delong, A., Osokin, A., Isack, H. N., and Boykov, Y. (2010). Fast approximate energy minimization with label costs. CVPR.

Farnebäck, G. (2002). Polynomial Expansion for Orientation and Motion Estimation. PhD thesis, Linköping University, Sweden.

Han, J. and Bhanu, B. (2006). Individual recognition using gait energy image. IEEE TPAMI, Volume 28, Number

Hofmann, M., Sural, S., and Rigoll, G. (2011). Gait recognition in the presence of occlusion: A new dataset and baseline algorithms. In International Conferences on Computer Graphics, Visualization and Computer Vision (WSCG).

Kolmogorov, V. and Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE TPAMI, 26(2):147-159.

Lee, L. and Grimson, W. E. L. (2001). Gait analysis for recognition and classification. MIT Artificial Intelligence Lab, Cambridge.

Wang, L., Tan, T., Ning, H., and Hu, W. (2003). Silhouette analysis-based gait recognition for human identification. IEEE TPAMI, Vol. 25, No. 10.
