A GENDER RECOGNITION EXPERIMENT ON THE CASIA GAIT DATABASE DEALING WITH ITS IMBALANCED NATURE

Raúl Martín-Félez, Ramón A. Mollineda and J. Salvador Sánchez

Institute of New Imaging Technologies (INIT) and Dept. Llenguatges i Sistemes Informàtics

Universitat Jaume I. Av. Sos Baynat s/n, 12071, Castelló de la Plana, Spain

Keywords:

Gender recognition, Gait analysis, Class imbalance problem, Human silhouette, Appearance-based method.

Abstract:

The CASIA Gait Database is one of the most widely used benchmarks for gait analysis among the few non-small-size datasets available. It is composed of gait sequences of 124 subjects, which are unequally distributed: 31 women and 93 men. This imbalance may correspond to real contexts in which men are in the majority, for example, a sports stadium or a factory. Learning from imbalanced scenarios usually requires suitable methodologies and performance metrics capable of managing and explaining biased results. Nevertheless, most of the reported experiments using the CASIA Gait Database for gender recognition limit their analysis to global results obtained from reduced subsets, thus avoiding the original setting. This paper uses a methodology to gain an insight into the discriminative capacity of the whole CASIA Gait Database for gender recognition under its imbalanced condition. The classification results are expected to be more reliable than those reported in previous papers.

1 INTRODUCTION

The perception of gender strongly influences social interactions. Humans are very accurate at recognizing gender from a face, a voice or the manner in which an individual walks (gait). Unlike a voice or a face, however, gait can be perceived at a greater distance. This property has stirred up the interest of the computer vision community in creating gait-based gender recognition systems, and in recent years the topic has become an active research area in the computer vision field (Yu et al., 2009; Li et al., 2008; Huang and Wang, 2007). A number of applications can benefit from such systems, for example, demographic analysis of a population, access control and biometric systems.

Apart from being capturable at a distance, gait has additional advantages over other biometric features: it is non-contact and non-invasive and, in general, does not require the subject's cooperation. However, important drawbacks make the implementation of a gait-based gender classification system a hard challenge. For instance, gait analysis is very sensitive to deficient or incomplete segmentation of the subject's silhouette, to variations in clothing and/or footwear, to distortions in the gait pattern produced by carrying objects or by changes of mood, to walking speed, and so forth.

These sources of complexity have contributed to the lack of public databases with a moderate or large number of sufficiently diverse gait samples, and also to the limited usefulness of the research done so far. Some of the few non-small-size datasets available for benchmark purposes are listed in Table 1. All of them are unequally distributed in terms of the number of men and women, and they take into account some covariates that affect the manner of walking (viewpoint, footwear, clothing, walking surface, carrying conditions, etc.).

In this work, the CASIA Gait Database (CASIA, 2005) is studied because of its availability and completeness. Some works (Huang and Wang, 2007; Yu et al., 2009) have used this database for gender recognition tasks. However, their experiments were formulated on small subsets with an equal number of subjects per class, giving results that depend heavily on the singularities of those subsets. In addition, they measured classification performance in terms of global accuracy, ignoring individual class error rates and possible biased behaviour of the classifiers. Such practices make it impossible to evaluate the potential of the CASIA Gait Database for gender recognition under its true distribution and number of samples. This paper proposes a methodology to gain an insight into the discriminative capacity of the whole CASIA Gait Database for gender recognition. The classification model consists of an ensemble of classifiers that suitably deals with the imbalance of the training data. The classification results, in terms of suitable performance measures, are expected to be more reliable than those reported in previous papers.

Martín Félez R., A. Mollineda R. and Salvador Sánchez J. (2010).

A GENDER RECOGNITION EXPERIMENT ON THE CASIA GAIT DATABASE DEALING WITH ITS IMBALANCED NATURE.

In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 439-444

DOI: 10.5220/0002849204390444

SciTePress

Table 1: Non-small-size gait databases.

Name | #Subjects | #Men | #Women | #Sequences
USF HumanID Gait Database (Sarkar et al., 2005) | 122 | 85 | 37 | 1870
Soton Gait Large Database (Shutler et al., 2002) | 100 | 84 | 16 | 2128
CASIA Gait Database (CASIA, 2005) - Dataset B | 124 | 93 | 31 | 13640

2 PREVIOUS WORK

There is a lot of research on gait-based identification, but only a few recent works use gait for gender recognition. There are two different approaches to describing gait: i) dynamic features from the subjects' movements (Davis and Gao, 2004; Yoo et al., 2005), and ii) static attributes from the subject's appearance (Lee and Grimson, 2002; Huang and Wang, 2007; Yu et al., 2009), which implicitly contain information about his/her movements. The works most closely related to this paper follow the latter approach and are described below.

In (Lee and Grimson, 2002), static features that describe the silhouette appearance of a walking person are used for person identification and gender recognition. A segmentation process was applied to video frames in order to extract human silhouettes, which were then normalized in size and location. To represent appearance, each silhouette was divided into seven regions, each fitted with an ellipse. To represent movement (changes in silhouette pose across frames), some parameters of the ellipses that model the same region were averaged across all the frames of a sequence, resulting in a set of 57 attributes per sequence. Classification experiments on the MIT Gait Database (MIT, 2001) led to an accuracy close to 80%.

A closely related work was presented in (Huang and Wang, 2007), where the methodology of (Lee and Grimson, 2002) was applied to part of the CASIA Gait Database. A classification accuracy of 85% was obtained by averaging 200 runs with different pairs of training and test sets. From the 124 subjects available (93 men and 31 women), 25 women and 25 men were randomly selected for each training set, while another 5 women and 5 men were chosen for the corresponding test set. In addition, this work proposes an information fusion experiment in which decisions were based on three different viewpoints: front, back and side. The gender recognition rate of the fusion scheme was 89.5%, higher than the results obtained from the individual views.
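The evaluation protocol described for (Huang and Wang, 2007) can be sketched as repeated balanced sampling of subject IDs. This is a minimal illustration under assumptions: the subject IDs are hypothetical placeholders, and the classifier trained and tested on each split is omitted. It also makes the concern raised in the Introduction concrete: each split uses 30 of the 31 women but only 30 of the 93 men, so 63 men are left out of every split.

```python
import random


def balanced_split(women, men, n_train=25, n_test=5, rng=None):
    """Draw one balanced train/test split of subject IDs.

    n_train women + n_train men go to training and n_test of each to testing,
    with no subject shared between the two sets.
    """
    rng = rng or random.Random()
    w = rng.sample(women, n_train + n_test)
    m = rng.sample(men, n_train + n_test)
    return w[:n_train] + m[:n_train], w[n_train:] + m[n_train:]


women = [f"W{i:02d}" for i in range(31)]  # hypothetical subject IDs
men = [f"M{i:02d}" for i in range(93)]
rng = random.Random(0)
splits = [balanced_split(women, men, rng=rng) for _ in range(200)]  # 200 runs
train, test = splits[0]
print(len(train), len(test))  # 50 10
```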

Another recent study (Yu et al., 2009) proposed a different appearance-based method for gait-based gender recognition, tested on the CASIA Gait Database. Given a sequence of gait silhouettes, a Gait Energy Image (GEI) is created by averaging them. The GEI is divided into five regions (head/hairstyle, chest, back, waist/buttocks and legs), which are weighted according to a previous psychological study. The experiments involved a single subset composed of 31 women and 31 randomly selected men, used to train a Support Vector Machine with a linear kernel. The best classification result was an accuracy of 95.97%. However, the use of only one subset raises doubts about the reliability of this result, because it depends on the singularities of that subset.
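For reference, a GEI is usually defined as the pixel-wise mean of the N aligned binary silhouettes B_t of a sequence, G(x, y) = (1/N) Σ_t B_t(x, y). The minimal sketch below assumes the silhouettes are already size-normalized and aligned, and stores each frame as a list of 0/1 rows; the five-region weighting of (Yu et al., 2009) is not reproduced.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes pixel by pixel into a GEI.

    silhouettes: list of frames, each a list of rows of 0/1 pixels, all of the
    same size. Returns rows of floats in [0, 1].
    """
    n = len(silhouettes)
    if n == 0:
        raise ValueError("empty sequence")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    gei = [[0.0] * width for _ in range(height)]
    for frame in silhouettes:
        for y in range(height):
            for x in range(width):
                gei[y][x] += frame[y][x]
    return [[v / n for v in row] for row in gei]


frames = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
]
print(gait_energy_image(frames))  # [[0.0, 1.0], [0.5, 1.0]]
```

Pixels that are foreground in every frame get 1.0, while pixels swept by moving limbs get intermediate values, which is how a single GEI summarizes both shape and motion.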

3 METHODOLOGY

This paper proposes a methodology to gain an insight into the discriminative capacity of the CASIA Gait Database for gender recognition, considering all of its samples (31 women and 93 men). The experimental design involves all the samples, in contrast to some previous works (Huang and Wang, 2007; Yu et al., 2009), where only reduced subsets with an equal number of samples per gender were used.

The methodology rests on four main components:

• Feature extraction: as in (Lee and Grimson, 2002), the average values, across all frames of a gait sequence, of some parameters of seven ellipses fitted to silhouette regions are used.

• Performance measures: this work uses well-known unbiased measures to evaluate classification effectiveness in imbalanced contexts.

• Classification model: an ensemble of classifiers is proposed to manage the data imbalance.

• Evaluation of the classifier error: it is estimated by a 10-fold cross-validation repeated 10 times.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications


The next subsections provide details of each of the four items introduced above.

3.1 Feature Extraction

For feature extraction, the ellipse-fitting method presented in (Lee and Grimson, 2002) was used because of the preliminary nature of this paper and the simplicity of the method. In addition, this method is also referenced in the two works on which this paper is mainly based (Huang and Wang, 2007; Yu et al., 2009). As introduced in the previous section, the process proposed in (Lee and Grimson, 2002) includes the following steps:

Foreground Segmentation. Each gait sample of the CASIA Gait Database includes the gait video sequence and the corresponding set of frames with the foreground segmented from the background. These frames, in which the silhouettes are highlighted, are used directly so that this proposal is more suitable as a benchmark for future comparisons.

Silhouette Extraction. The bounding box that encloses all the silhouette pixels is located, and the resulting reduced image is extracted.

Silhouette Regionalization. The silhouette is divided into seven regions with fixed proportions: head, chest, back, front thigh, rear thigh, front calf/foot and rear calf/foot.

Ellipse Fitting. The shape of the foreground pixels of each region is fitted with an ellipse. For details, see Figure 1.

Feature Extraction. Four features per ellipse are extracted: the x- and y-coordinates of the centroid, the orientation of the major axis (α) and the aspect ratio (major axis / minor axis). An extra global feature, the ratio of the y-coordinate of the silhouette centroid to the silhouette height, is also considered.

Gait Representation. To represent the gait video sample (changes in silhouette pose across frames), the mean and the standard deviation of the four parameters of each ellipse are computed across all the frames of the sequence. The eight resulting features of each of the seven ellipses are concatenated, along with the mean of the extra global feature, to build a 57-dimensional vector (7 × 8 + 1 = 57).
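The steps above can be sketched in pure Python as follows. This is an illustrative sketch, not the implementation of (Lee and Grimson, 2002): it assumes binary masks stored as lists of rows, a subject walking to the right (so the front half lies at or beyond the centroid column), hypothetical region proportions (the text only says they are fixed) and centroid coordinates normalized by the silhouette height. Each ellipse is obtained from the second-order moments of the region's pixels, a common way of fitting an ellipse to a pixel set.

```python
import math
import statistics

# Hypothetical layout: (name, top, bottom, side). top/bottom are fractions of
# the silhouette height; side picks the half split at the centroid column.
REGIONS = [
    ("head", 0.00, 0.20, None),
    ("chest", 0.20, 0.50, "front"),
    ("back", 0.20, 0.50, "rear"),
    ("front thigh", 0.50, 0.75, "front"),
    ("rear thigh", 0.50, 0.75, "rear"),
    ("front calf/foot", 0.75, 1.00, "front"),
    ("rear calf/foot", 0.75, 1.00, "rear"),
]


def crop(mask):
    """Silhouette extraction: cut the mask to the bounding box of its 1-pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return [row[cols[0]:cols[-1] + 1] for row in mask[rows[0]:rows[-1] + 1]]


def fit_ellipse(points, height):
    """Moment-based ellipse: centroid (x, y), major-axis angle, major/minor ratio."""
    if not points:  # empty region: hypothetical fallback, not from the paper
        return (0.0, 0.0, 0.0, 0.0)
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    alpha = 0.5 * math.atan2(2 * sxy, sxx - syy)
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = (sxx + syy) / 2 + root
    lam_minor = max((sxx + syy) / 2 - root, 1e-6)  # guard against 1-pixel-wide regions
    return (cx / height, cy / height, alpha, math.sqrt(lam_major / lam_minor))


def frame_features(mask):
    """Seven ellipses plus the global centroid-height ratio for one frame."""
    sil = crop(mask)
    h = len(sil)
    pts = [(x, y) for y, row in enumerate(sil) for x, v in enumerate(row) if v]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    ellipses = []
    for _, top, bottom, side in REGIONS:
        region = [
            (x, y) for x, y in pts
            if top * h <= y < bottom * h
            # Assumes the subject walks to the right, so "front" means x >= cx.
            and (side is None or (side == "front") == (x >= cx))
        ]
        ellipses.append(fit_ellipse(region, h))
    return ellipses, cy / h


def gait_vector(masks):
    """Mean and std of the 4 parameters of each ellipse, plus the mean global feature."""
    per_frame = [frame_features(m) for m in masks]
    vector = []
    for r in range(len(REGIONS)):
        series = list(zip(*(ellipses[r] for ellipses, _ in per_frame)))  # 4 series
        vector += [statistics.mean(s) for s in series]
        vector += [statistics.pstdev(s) for s in series]
    vector.append(statistics.mean(g for _, g in per_frame))
    return vector  # 7 * 8 + 1 = 57 values


# Toy usage: a 16x5 block of foreground pixels inside a 20x10 frame.
mask = [[1 if 2 <= y <= 17 and 3 <= x <= 7 else 0 for x in range(10)] for y in range(20)]
v = gait_vector([mask, mask])
print(len(v), v[-1])  # 57 0.46875
```

With two identical frames, every standard deviation is zero, and the last entry is the centroid-height ratio of the toy block (7.5 / 16).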

3.2 Performance Measures for Imbalanced Data Sets

A typical metric for measuring the effectiveness of a learning process is the accuracy of the resulting classifier over a test or validation set. For a two-class problem, this index can be easily computed from a 2 × 2 confusion matrix defined by the True Positive (TP) and True Negative (TN) cases, which are the numbers of positive and negative samples correctly classified, respectively, and the False Positive (FP) and False Negative (FN) cases, which are the numbers of negative and positive samples incorrectly classified, respectively. Accuracy is formulated as

$$\mathrm{Acc} = \frac{TP + TN}{TP + FN + TN + FP}.$$

However, empirical evidence shows that this measure can be strongly biased with respect to class imbalance (Provost and Fawcett, 1997). This shortcoming has motivated the search for measures suitable for imbalanced contexts, for example: (i) the True Positive rate, $\mathrm{TPr} = TP/(TP + FN)$; (ii) the True Negative rate, $\mathrm{TNr} = TN/(TN + FP)$; (iii) the geometric mean, $\mathrm{Gmean} = \sqrt{\mathrm{TPr} \cdot \mathrm{TNr}}$, which favours models in which both per-class accuracies are high and balanced; and (iv) the Area Under the ROC Curve (AUC), which for a single classification result can be computed as $\mathrm{AUC} = (\mathrm{TPr} + \mathrm{TNr})/2$.
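The measures above can be computed from the four confusion-matrix counts as in the following sketch (the function name is our own). The usage example is illustrative, not a result from this paper: it applies the class sizes used in Section 4 (186 women's and 552 men's sequences, women as the positive class) to a degenerate classifier that labels every sequence as a man. Accuracy still reaches about 74.8%, while TPr and Gmean drop to zero and AUC to 0.5, which is exactly the bias described above.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Accuracy, TPr, TNr, Gmean and single-point AUC from a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)  # recognition rate of the positive class
    tnr = tn / (tn + fp)  # recognition rate of the negative class
    return {
        "Acc": (tp + tn) / (tp + fn + tn + fp),
        "TPr": tpr,
        "TNr": tnr,
        "Gmean": math.sqrt(tpr * tnr),
        "AUC": (tpr + tnr) / 2,
    }


# Degenerate classifier: every one of the 738 sequences is labelled "man".
print(imbalance_metrics(tp=0, fn=186, tn=552, fp=0))
# Acc ≈ 0.748, TPr = 0.0, TNr = 1.0, Gmean = 0.0, AUC = 0.5
```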

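In this paper, TPr, TNr, Gmean and AUC are computed along with Accuracy to describe the classifier performance on each class, not only globally. Women, the minority class, are taken as the positive class, so TPr and TNr are the recognition rates of women and men, respectively.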

3.3 Classiﬁcation Model

The classification model consists of an ensemble of classifiers that can suitably deal with the imbalance of the training data (Kang and Cho, 2006).

Given an imbalanced two-class training set, a number of balanced subsets equal to the number of base classifiers in the ensemble is generated. Each subset contains all the samples of the minority class and as many randomly selected samples of the majority class as are needed to obtain a balanced subset. The ensemble combines, by majority voting, the individual decisions of the base classifiers, each trained with its corresponding balanced subset. For details, see Figure 2.
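The under-sampling ensemble can be sketched as below. The paper's base classifiers are SVMs; to keep the sketch self-contained, it swaps in a simple nearest-centroid learner, which is an assumption and not the model used here. The helper names (NearestCentroid, fit_undersampled_ensemble, predict_majority) are hypothetical, and majority-class samples are drawn without replacement within each subset.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner; the paper uses SVMs instead."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


def fit_undersampled_ensemble(X, y, minority, n_members=25, make_learner=NearestCentroid, seed=0):
    """Train one learner per balanced subset: all minority samples plus an
    equally sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    members = []
    for _ in range(n_members):
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        members.append(make_learner().fit([X[i] for i in idx], [y[i] for i in idx]))
    return members


def predict_majority(members, x):
    """Majority vote; with an odd number of members and two classes there are no ties."""
    return Counter(m.predict_one(x) for m in members).most_common(1)[0][0]


# Toy usage: 20 minority ("W") points near (0, 0), 60 majority ("M") near (5, 5).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(60)]
y = ["W"] * 20 + ["M"] * 60
ensemble = fit_undersampled_ensemble(X, y, minority="W")
print(predict_majority(ensemble, [0.2, -0.1]), predict_majority(ensemble, [5.1, 4.8]))  # W M
```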

Figure 1: Feature extraction process. [Diagram: video frame → foreground segmentation → silhouette extraction → silhouette regionalization → ellipse fitting, each ellipse described by its centroid (x, y) and axis → feature extraction.]

Figure 2: Generation of an ensemble of SVM classifiers. [Diagram: the imbalanced training set of women's and men's gait sequences yields several balanced training sets, each combining the women's gait sequences with a subset of the men's gait sequences; each balanced set trains one SVM classifier of the ensemble.]

In a gait-based gender recognition task, where each person is usually represented by several sequences of gait frames, this subset generation can be performed in two ways. The first is to balance the subset at person level: the same number of women and men are randomly selected, and all their sequences are joined to form a new subset. It is worth noting that this subset may not be exactly balanced with respect to the number of sequences of each gender. The alternative is to balance at sequence level, that is, to randomly select an equal number of sequences from each gender. Under this approach, the number of different subjects represented in the subset by at least one sequence is, in general, much greater than with the person-level strategy. In this paper, both balancing strategies are implemented.
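The two balancing strategies can be contrasted with the following sketch, which draws the men's part of a single balanced subset. The tuple format and the function name are hypothetical, and for simplicity the example samples from the full collection described in Section 4 rather than from a training fold.

```python
import random


def men_subset(sequences, level, rng):
    """Choose the men's sequences that join all the women's sequences.

    sequences: list of (subject_id, gender, seq_no) tuples, gender "F" or "M"
    (a hypothetical format). level "person": draw as many men as there are
    women and keep all their sequences. level "sequence": draw as many men's
    sequences as there are women's sequences.
    """
    women = [s for s in sequences if s[1] == "F"]
    men = [s for s in sequences if s[1] == "M"]
    if level == "person":
        n_women = len({s[0] for s in women})
        chosen = set(rng.sample(sorted({s[0] for s in men}), n_women))
        return [s for s in men if s[0] in chosen]
    if level == "sequence":
        return rng.sample(men, len(women))
    raise ValueError(f"unknown level: {level}")


# Toy usage with the class sizes of Section 4: 31 women and 92 men, 6 sequences each.
data = [(f"W{i}", "F", k) for i in range(31) for k in range(6)]
data += [(f"M{i}", "M", k) for i in range(92) for k in range(6)]
rng = random.Random(0)
by_person = men_subset(data, "person", rng)
by_sequence = men_subset(data, "sequence", rng)
print(len(by_person), len({s[0] for s in by_person}))      # 186 31
print(len(by_sequence), len({s[0] for s in by_sequence}))  # 186, with far more than 31 distinct men
```

Both subsets contain 186 men's sequences here because every man contributes 6 sequences; within a training fold, men can have fewer than 6 sequences, which is why a person-level subset may not be exactly balanced by sequences.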

3.4 Error Evaluation Scheme

A 10-fold cross-validation scheme, repeated 10 times, was used to estimate the recognition rates. A stratified division method produced pairs of training and test partitions whose per-class sample distributions are similar to those of the original dataset. Each imbalanced training partition was used to train an ensemble of classifiers as described in Section 3.3, which then classified the samples of the corresponding test partition. Algorithm 1 provides details of this process.
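The repeated stratified cross-validation can be sketched as follows. It stratifies individual sequences, as the text describes, and does not force all sequences of one subject into the same fold, a detail the text does not specify; the function name and seed handling are our own assumptions.

```python
import random


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train, test) index lists: `repeats` rounds of stratified k-fold CV.

    Each class is shuffled and dealt round-robin over the k folds, so every
    fold keeps roughly the original class proportions.
    """
    rng = random.Random(seed)
    n = len(labels)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0  # continue dealing where the previous class stopped
        for label in sorted(set(labels)):
            idx = [i for i, t in enumerate(labels) if t == label]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[(offset + pos) % k].append(i)
            offset += len(idx)
        for f in range(k):
            test = sorted(folds[f])
            in_test = set(test)
            train = [i for i in range(n) if i not in in_test]
            yield train, test


labels = ["F"] * 186 + ["M"] * 552  # class sizes from Section 4
splits = list(repeated_stratified_kfold(labels))
train, test = splits[0]
print(len(splits), len(test), sum(labels[i] == "F" for i in test))  # 100 74 19
```

Each of the 100 test folds holds about 74 sequences, of which 18 or 19 are women's, close to the 25% share of women in the whole collection.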

4 EXPERIMENTAL RESULTS

The aim of the experiments is to find out the actual capacity of the CASIA Gait Database for gender recognition from side-view gait sequences. Of the 124 people available (31 women and 93 men), the gait sequences of the subject identified as 005 (a man) were discarded because of their low number of frames with foreground information and their intractable noise. From the remaining 123 subjects, with 6 side-view sequences per individual, a collection of 738 sequences was created, with 186 samples from women and 552 from men. Each sample was represented by a 57-dimensional vector, as explained in Section 3.1.

Classification results are estimated by a 10-fold cross-validation scheme repeated 10 times, which involves an ensemble of 25 Support Vector Machines (SVMs) to manage the imbalance of the training data. The number of classifiers was set to 25 for two reasons: i) 25 is an odd number, which avoids ties in the two-class majority vote, and ii) there is empirical evidence that, in general, using more than about 25 base classifiers does not significantly improve ensemble accuracy (Bauer and Kohavi, 1999).

Three different experiments were designed.

Baseline. The imbalance is ignored and thus not treated. The 10-fold cross-validation uses a single SVM (not an ensemble) trained on the imbalanced training partitions (see Sections 3.3 and 3.4).


Algorithm 1. Training/Classification algorithm.

for all training folds Tra_i (i ∈ {1, ..., 10}) do
    for all classifiers C_ij in the ensemble (j ∈ {1, ..., 25}) do
        womenSubset_ij ← all sequences of all women from Tra_i
        if balancing at person level then
            menSubset_ij ← all sequences of a number of randomly selected men, equal to the number of women, from Tra_i
        else if balancing at sequence level then
            menSubset_ij ← a number of randomly selected men's sequences, equal to the number of women's sequences, from Tra_i
        end if
        Balanced_ij ← womenSubset_ij ∪ menSubset_ij
        Train C_ij with Balanced_ij
    end for
    for all samples sample_k in Test_i (k ∈ {1, ..., |Test_i|}) do
        for all trained classifiers C_ij in the ensemble do
            predLabel_ikj ← Classify sample_k with C_ij
        end for
        predLabel_ik ← Combine all predLabel_ikj by majority voting
        Compare predLabel_ik with actualLabel_ik for the performance measures
    end for
end for

Balanced Classes at Sequence Level. The imbalance is managed. The 10-fold cross-validation uses an ensemble of 25 SVMs that learn from balanced subsets at sequence level, randomly drawn from the imbalanced training partitions (see Sections 3.3 and 3.4).

Balanced Classes at Person Level. The imbalance is managed. The 10-fold cross-validation uses an ensemble of 25 SVMs that learn from balanced subsets at person level, randomly drawn from the imbalanced training partitions (see Sections 3.3 and 3.4).
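Algorithm 1 can be rendered as the following runnable Python sketch for one round of cross-validation. The SVM is replaced by caller-supplied placeholders (train_fn and classify_fn, hypothetical names, here backed by a trivial per-class-mean learner), and the sample format is assumed. Repeating the round 10 times and computing the performance measures from the returned (actual, predicted) pairs are left to the caller.

```python
import random
from collections import Counter


def run_algorithm_1(folds, level, train_fn, classify_fn, n_members=25, seed=0):
    """Algorithm 1 for one round of k-fold cross-validation.

    folds: list of (train, test) pairs of samples; each sample is a dict with
    hypothetical keys "subject", "gender" ("F"/"M") and "x" (its features).
    train_fn(samples) -> model and classify_fn(model, x) -> label stand in
    for the SVM training and prediction used in the paper.
    Returns (actual, predicted) label pairs for every test sample.
    """
    rng = random.Random(seed)
    pairs = []
    for tra, test in folds:
        women = [s for s in tra if s["gender"] == "F"]
        men = [s for s in tra if s["gender"] == "M"]
        ensemble = []
        for _ in range(n_members):
            if level == "person":
                n_women = len({s["subject"] for s in women})
                chosen = set(rng.sample(sorted({s["subject"] for s in men}), n_women))
                men_subset = [s for s in men if s["subject"] in chosen]
            elif level == "sequence":
                men_subset = rng.sample(men, len(women))
            else:
                raise ValueError(f"unknown level: {level}")
            ensemble.append(train_fn(women + men_subset))
        for s in test:
            votes = Counter(classify_fn(c, s["x"]) for c in ensemble)
            pairs.append((s["gender"], votes.most_common(1)[0][0]))
    return pairs


def train_fn(samples):
    """Stand-in learner: per-gender mean of a 1-D feature (not an SVM)."""
    means = {}
    for g in ("F", "M"):
        xs = [s["x"] for s in samples if s["gender"] == g]
        means[g] = sum(xs) / len(xs)
    return means


def classify_fn(model, x):
    return min(model, key=lambda g: abs(x - model[g]))


# Toy usage: 4 women and 12 men, 3 sequences each, two hand-made folds.
data = [{"subject": f"W{i}", "gender": "F", "x": 0.1 * k} for i in range(4) for k in range(3)]
data += [{"subject": f"M{i}", "gender": "M", "x": 5 + 0.1 * k} for i in range(12) for k in range(3)]
folds = [(data[1:], data[:1]), (data[:-1], data[-1:])]
print(run_algorithm_1(folds, "person", train_fn, classify_fn))  # [('F', 'F'), ('M', 'M')]
```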

The main difference between the second and third experiments is that the number of men represented in the balanced subsets by at least one sequence is, in general, much higher in the second than in the third, although the number of men's sequences remains approximately the same. Therefore, the second experiment has more diversity in the men class than the third one.

Averaged results of the 10 repetitions of the 10-fold cross-validation are shown in Table 2. In terms of classification accuracy, the three results are very similar, although the baseline approach scores slightly higher than the other two methods. However, this higher accuracy hides a strong imbalance between the recognition rates of the two classes, which are 78.4% and 98% for the women (TPr) and men (TNr) classes, respectively. With the two imbalance-sensitive methods, the difference between both rates is much more moderate, due to a substantial improvement in TPr and a slight degradation of TNr.

A joint view of these two rates is given by the Gmean and AUC metrics, which are unbiased measures of classifier performance. According to these metrics, the approach based on balanced classes at person level produces clearly better results than the baseline experiment. When the two imbalance-sensitive methods are compared, the one that balances at person level seems to generalize better, because of the greater number of sequences per represented man.
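As a consistency check, the Gmean and AUC rows of Table 2 can be approximately recovered from its TPr and TNr rows using the definitions of Section 3.2. The agreement is only approximate because Table 2 reports averages of per-run values rather than values computed from averaged rates.

```latex
\begin{aligned}
\text{Baseline:} \quad & \mathrm{Gmean} = \sqrt{0.784 \times 0.980} \approx 0.8765, \qquad \mathrm{AUC} = \frac{0.784 + 0.980}{2} = 0.882 \\
\text{Person level:} \quad & \mathrm{Gmean} = \sqrt{0.870 \times 0.934} \approx 0.9014, \qquad \mathrm{AUC} = \frac{0.870 + 0.934}{2} = 0.902
\end{aligned}
```

Between these two settings, the Gmean gain of about 2.5 points comes from an 8.6-point rise in TPr, partly offset by a 4.6-point drop in TNr.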

A direct comparison between these results and those of previous related works is not appropriate, because those works differ in feature extraction strategies, classification models, error evaluation schemes and training and test partitions. Nevertheless, their main gender recognition results on the CASIA Gait Database are reported here to allow a broader analysis. These works presented their classifier performance only in terms of accuracy, with results comparable to those obtained in this paper: 85% and 89.5% (single view and multi-view fusion, respectively) from (Huang and Wang, 2007) and 95.97% from (Yu et al., 2009). However, as shown above, accuracy is not a reliable measure in imbalanced scenarios.

5 CONCLUSIONS

An exhaustive study was carried out to evaluate the capacity of the CASIA Gait Database for gender recognition tasks. This dataset contains gait samples from 124 subjects, distributed in an imbalanced way, with 31 women and 93 men. To our knowledge, the papers that have previously used this collection have avoided dealing with its imbalanced nature by using reduced balanced subsets. Therefore, there seem to be no previous results that consider the whole dataset for benchmark purposes.

This paper proposes a methodology to learn from the CASIA Gait Database while dealing with its imbalance, and to suitably evaluate the effectiveness of the resulting classifier. In particular, a distributed learning approach within a classifier ensemble was used, together with metrics that appropriately measure classification performance in an imbalanced context, such as AUC and the geometric mean of per-class success rates.

Table 2: Experimental results.

Measure | Baseline | Balancing at sequence level | Balancing at person level
Accuracy | 93.1% ± 0.77% | 92.1% ± 0.61% | 91.8% ± 0.56%
TPr | 78.4% ± 1.92% | 84.5% ± 1.2% | 87% ± 1.55%
TNr | 98% ± 0.59% | 94.6% ± 0.53% | 93.4% ± 0.5%
Gmean | 87.6% ± 1.17% | 89.4% ± 0.79% | 90.1% ± 0.86%
AUC | 88.2% ± 1.07% | 89.6% ± 0.77% | 90.2% ± 0.84%

The imbalance-sensitive approach was compared with a plain method based on a single classifier. In terms of global classification accuracy, both strategies gave very similar results, but in terms of AUC and Gmean the proposed strategy was clearly better. This can be explained by the per-class recognition rates: the proposed approach considerably improved the rate for the women (minority) class, while the rate for the men (majority) class was only slightly reduced.

Since it uses the whole CASIA Gait Database through its own silhouette frames and applies standard methods of learning and error estimation, the strategy proposed here could become a good benchmark for future comparisons in gait-based gender recognition. This preliminary work could be extended by applying the strategy to more databases, with other classifiers and other feature extraction methods.

ACKNOWLEDGEMENTS

Partially funded by projects CSD2007-00018 and CICYT TIN2009-14205-C04-04 from the Spanish Ministry of Innovation and Science, P1-1B2009-04 from Fundació Caixa Castelló-Bancaixa and grant PREDOC/2008/04 from Universitat Jaume I. Portions of the research in this paper use the CASIA Gait Database collected by the Institute of Automation, Chinese Academy of Sciences.

REFERENCES

Bauer, E. and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach. Learning, 36(1-2):105–139.

CASIA (2005). CASIA Gait Database. http://www.sinobiometrics.com.

Davis, J. and Gao, H. (2004). Gender recognition from walking movements using adaptive three-mode PCA. In IEEE CVPR, Workshop on Articulated and Non-rigid Motion, volume 1.

Huang, G. and Wang, Y. (2007). Gender classification based on fusion of multi-view gait sequences. In Proc. 8th Asian Conference Computer Vision, pages 462–471.

Kang, P. and Cho, S. (2006). EUS SVMs: Ensemble of under-sampled SVMs for data imbalance problems. In ICONIP, pages 837–846.

Lee, L. and Grimson, W. (2002). Gait analysis for recognition and classification. Proc. 5th IEEE Int'l. Conf. on Automatic Face and Gesture Recogn., pages 155–162.

Li, X., Maybank, S., Yan, S., Tao, D., and Xu, D. (2008). Gait components and their application to gender recognition. IEEE Trans. SMC-C, 38(2):145–155.

MIT (2001). Human Gait Recognition Database. MIT Artificial Intelligence Lab (Cambridge). http://www.ai.mit.edu/projects/gait/.

Provost, F. and Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In Proc. of the 3rd ACM SIGKDD, pages 43–48.

Sarkar, S., Phillips, P., Liu, Z., Vega, I., Grother, P., and Bowyer, K. (2005). The HumanID gait challenge problem: data sets, performance, and analysis. IEEE Trans. on PAMI, 27(2):162–177.

Shutler, J., Grant, M., Nixon, M. S., and Carter, J. N. (2002). On a large sequence-based human gait database. In Proc. 4th Int'l Conf. on RASC, pages 66–71.

Yoo, J., Hwang, D., and Nixon, M. (2005). Gender classification in human gait using support vector machine. In Proc. ACIVS, pages 138–145.

Yu, S., Tan, T., Huang, K., Jia, K., and Wu, X. (2009). A study on gait-based gender classification. IEEE Transactions on Image Processing, 18(8):1905–1910.
