The next subsections provide details of each of the
four items introduced above.
3.1 Feature Extraction
For feature extraction, the ellipse-fitting method presented in (Lee and Grimson, 2002) was used, given the
preliminary nature of this paper and the simplicity of this method. In addition, this method is also referenced
in the two other works (Huang and Wang, 2007; Yu et al., 2009) on which this paper is mainly based. As
introduced in the previous section, the process proposed in (Lee and Grimson, 2002) includes the following steps:
Foreground Segmentation. Each gait sample of the CASIA Gait Database includes the gait video sequence and the
corresponding set of frames in which the foreground has been segmented from the background. These frames, in
which the silhouettes are highlighted, are used directly so that this proposal can serve as a benchmark for
future comparisons.
Silhouette Extraction. The bounding box that encloses all the silhouette pixels is located, and the resulting
reduced image is extracted.
Silhouette Regionalization. The silhouette is divided into seven regions with fixed proportions: head, chest,
back, front thigh, rear thigh, front calf/foot and rear calf/foot.
Ellipse Fitting. The shape of the foreground pixels
of each region is fitted with an ellipse. For details,
see Figure 1.
Feature Extraction. Four features per ellipse are extracted: the x- and y-coordinates of the centroid, the
orientation of the major axis ($\alpha$) and the aspect ratio ($\mathit{axis}_1/\mathit{axis}_2$). An extra
global feature, the ratio of the y-coordinate of the silhouette centroid to the silhouette height, is also
considered.
Gait Representation. To represent the gait video sample (the changes in silhouette pose across frames), the
mean and the standard deviation of the four parameters of each ellipse are computed across all the frames of
the sequence. The eight resulting features of each of the seven ellipses are concatenated, along with the mean
of the extra global feature, to build a 57-dimensional vector ($7 \times 4 \times 2 + 1 = 57$).
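The steps above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the implementation of Lee and Grimson (2002): the row cut points `HEAD_END`, `CHEST_END` and `THIGH_END` are placeholder proportions, the subject is assumed to walk to the right (so "front" means columns at or right of the silhouette centroid), ellipses are fitted from second-order image moments, coordinates are divided by the bounding-box height, and an empty region is given zero features. All function names are illustrative.

```python
import math
import statistics

# Hypothetical row cut points (fractions of bounding-box height); placeholders,
# NOT the proportions used by Lee and Grimson (2002).
HEAD_END, CHEST_END, THIGH_END = 0.2, 0.5, 0.75


def bounding_box(frame):
    """Crop a binary frame (list of 0/1 rows) to the box enclosing all 1-pixels."""
    rows = [r for r, row in enumerate(frame) if any(row)]
    cols = [c for c in range(len(frame[0])) if any(row[c] for row in frame)]
    return [row[cols[0]:cols[-1] + 1] for row in frame[rows[0]:rows[-1] + 1]]


def ellipse_features(pixels, height):
    """Moment-based ellipse fit: [centroid x, centroid y, orientation, aspect ratio]."""
    if not pixels:
        return [0.0, 0.0, 0.0, 0.0]  # empty region: zero features (assumption)
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    cxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    cyy = sum((y - my) ** 2 for _, y in pixels) / n
    cxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    alpha = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # orientation of the major axis
    root = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    lam1 = (cxx + cyy) / 2 + root  # variance along the major axis
    lam2 = (cxx + cyy) / 2 - root  # variance along the minor axis
    aspect = math.sqrt(lam1 / max(lam2, 1e-9))  # axis_1 / axis_2
    return [mx / height, my / height, alpha, aspect]


def frame_features(frame):
    """29 per-frame features: 7 ellipses x 4 parameters + 1 global feature."""
    sil = bounding_box(frame)
    h = len(sil)
    pixels = [(x, y) for y, row in enumerate(sil) for x, v in enumerate(row) if v]
    xc = sum(x for x, _ in pixels) / len(pixels)
    yc = sum(y for _, y in pixels) / len(pixels)
    # Order: head, chest, back, front thigh, rear thigh, front calf/foot, rear calf/foot.
    regions = [[] for _ in range(7)]
    for x, y in pixels:
        if y < HEAD_END * h:
            k = 0  # head
        else:
            band = 1 if y < CHEST_END * h else (3 if y < THIGH_END * h else 5)
            k = band if x >= xc else band + 1  # front half vs. rear half
        regions[k].append((x, y))
    feats = []
    for region in regions:
        feats.extend(ellipse_features(region, h))
    feats.append(yc / h)  # global feature: centroid y over silhouette height
    return feats


def gait_vector(frames):
    """57-D descriptor: mean and std of the 28 ellipse features + mean global feature."""
    per_frame = [frame_features(f) for f in frames]
    vector = []
    for e in range(7):
        for p in range(4):
            values = [ff[4 * e + p] for ff in per_frame]
            vector += [statistics.mean(values), statistics.pstdev(values)]
    vector.append(statistics.mean([ff[28] for ff in per_frame]))
    return vector


# Toy usage: a rectangular "silhouette" repeated over two frames.
frame = [[0] * 8 for _ in range(14)]
for y in range(1, 13):
    for x in range(2, 6):
        frame[y][x] = 1
vec = gait_vector([frame, frame])
print(len(vec))  # 57
```

The population standard deviation (`statistics.pstdev`) is an arbitrary choice here; the text does not say which estimator is used.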
3.2 Performance Measures for Imbalanced Data Sets
A typical metric for measuring the effectiveness of a learning process is the accuracy of the resulting
classifier over a test or validation set. For a two-class problem, this index can be easily computed from a
$2 \times 2$ confusion matrix defined by the True Positive (TP) and True Negative (TN) cases, which are the
numbers of positive and negative samples correctly classified, respectively, and the False Positive (FP) and
False Negative (FN) cases, which are the numbers of negative and positive samples incorrectly classified,
respectively. Accuracy is formulated as
$$\mathrm{Acc} = \frac{TP + TN}{TP + FN + TN + FP}.$$
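As a small illustration of these definitions, the sketch below counts the four confusion-matrix cells from true and predicted labels and computes accuracy. The label encoding (`positive=1`) and the toy labels are assumptions for the example only; the text does not specify which gender is treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, TN, FP, FN) for a two-class problem; `positive` marks the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive sample correctly classified
            else:
                fn += 1  # positive sample classified as negative
        elif pred == positive:
            fp += 1  # negative sample classified as positive
        else:
            tn += 1  # negative sample correctly classified
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fn + tn + fp)


tp, tn, fp, fn = confusion_counts([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print((tp, tn, fp, fn), accuracy(tp, tn, fp, fn))  # (1, 2, 1, 1) 0.6
```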
However, empirical evidence shows that this measure can be strongly biased in the presence of class imbalance
(Provost and Fawcett, 1997). This shortcoming has motivated the search for new measures suitable for imbalanced
contexts, for example: (i) the True Positive rate, $\mathrm{TPr} = TP/(TP + FN)$; (ii) the True Negative rate,
$\mathrm{TNr} = TN/(TN + FP)$; (iii) the Geometric mean, $\mathrm{Gmean} = \sqrt{\mathrm{TPr} \cdot \mathrm{TNr}}$,
which favors models in which both per-class accuracies are high and balanced; and (iv) the Area Under the ROC
Curve (AUC), which, for a single classification result, can be computed as
$\mathrm{AUC} = (\mathrm{TPr} + \mathrm{TNr})/2$.
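The bias of accuracy under imbalance can be seen with a hypothetical toy case (not data from this study): 95 negative and 5 positive test samples, and a degenerate classifier that labels every sample as negative. The sketch below computes the five measures from the confusion-matrix counts; it assumes both classes are present, otherwise the rates are undefined.

```python
import math


def imbalance_measures(tp, tn, fp, fn):
    """Acc, TPr, TNr, Gmean and single-point AUC from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # accuracy on the positive class
    tnr = tn / (tn + fp)  # accuracy on the negative class
    return {
        "Acc": (tp + tn) / (tp + fn + tn + fp),
        "TPr": tpr,
        "TNr": tnr,
        "Gmean": math.sqrt(tpr * tnr),
        "AUC": (tpr + tnr) / 2,
    }


# Degenerate "always negative" classifier on 95 negatives and 5 positives.
print(imbalance_measures(tp=0, tn=95, fp=0, fn=5))
# {'Acc': 0.95, 'TPr': 0.0, 'TNr': 1.0, 'Gmean': 0.0, 'AUC': 0.5}
```

Accuracy rewards this useless classifier with 0.95, whereas Gmean drops to 0 and AUC to the chance level of 0.5, which is why the per-class measures are reported.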
In this paper, TPr, TNr, Gmean and AUC are computed along with Accuracy to provide sufficient per-class
insight into classifier performance.
3.3 Classification Model
The classification model consists of an ensemble of
classifiers that can suitably deal with the imbalance
of the training data (Kang and Cho, 2006).
Given an imbalanced two-class training set, as many balanced subsets as there are base classifiers in the
ensemble are generated. Each subset contains all the samples of the minority class and as many randomly
selected samples of the majority class as are needed to obtain a balanced subset. The ensemble combines, by
majority voting, the individual decisions of the base classifiers, each trained on its corresponding balanced
subset. For details, see Figure 2.
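The balanced-subset ensemble described above can be sketched as follows. This is a hypothetical illustration: the base classifier is not specified in this passage, so a simple nearest-class-mean rule stands in for it, and the function names, the number of base classifiers and the seed are all assumptions. Majority samples are drawn without replacement within each subset; an odd number of base classifiers avoids voting ties.

```python
import random
from collections import Counter


def nearest_mean_fit(X, y):
    """Stand-in base classifier (assumption): store the mean vector of each class."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def balanced_ensemble_fit(X, y, n_classifiers=5, seed=0):
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    models = []
    for _ in range(n_classifiers):
        # One balanced subset per base classifier: every minority sample plus an
        # equally sized random draw (without replacement) from the majority class.
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        models.append(nearest_mean_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def balanced_ensemble_predict(models, x):
    """Combine the base decisions by majority voting."""
    votes = Counter(nearest_mean_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy 1-D data: six majority samples (class 0) and two minority samples (class 1).
X = [[0.0], [1.0], [2.0], [0.5], [1.5], [1.0], [10.0], [11.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
models = balanced_ensemble_fit(X, y, n_classifiers=5)
print(balanced_ensemble_predict(models, [10.5]), balanced_ensemble_predict(models, [0.2]))  # 1 0
```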
In a gait-based gender recognition task, where each person is usually represented by several sequences of gait
frames, the subset generation process described above can be performed in two ways. The first is to balance
the subset at the person level: the same number of women and men are randomly selected, and all of their
sequences are joined to form a new subset. Note that this subset may not be exactly balanced with respect to
the number of sequences of each gender. The alternative is to balance at the sequence level, that is, to
arbitrarily select an equal number of sequences from each gender. Under this approach, the number of different
subjects represented in the subset by at least one
A GENDER RECOGNITION EXPERIMENT ON THE CASIA GAIT DATABASE DEALING WITH ITS
IMBALANCED NATURE
441