Table 3: Comparison with reported results from prior work.

Description | Remarks
Wearable sensor, 16 angular features, 2 data sets: 73% (1st), 42% (2nd) (Tanawongsuwan and Bobick, 2001) | Good accuracy but intrusive and costly
RGB sensor, contour-based features, 71% (Wang et al., 2003) | High processing time for each RGB frame
RGB sensor, pose kinematics & pose energy images, 83% (Roy et al., 2012) | Good accuracy on a large dataset; heavy computation
Kinect, skeleton, static features, 85% (Preis et al., 2012) | High accuracy; small dataset (9 subjects); frontal view (stationary subjects)
Kinect, skeleton, angular features, avg.: 44% (sub1: 35%, sub2: 74%, sub3: 39%, sub4: 33%) (Ball et al., 2012) | Low accuracy even on a small dataset
Kinect, skeleton, static, distance & area features, 25% (Sinha et al., 2013) | Very low accuracy, as angular features are not considered
Table 4: Comparison on the same data set (RGB only).

Description | Remarks
RGB sensor, shape feature, 83% (Wang et al., 2003) | Good accuracy on a large dataset
RGB sensor, key poses (Roy et al., 2012) | Heavy computation

We extract the shape-based feature (Wang et al., 2003) from the RGB data and then estimate the key poses (Roy et al., 2012) to recognize gait from the Kinect RGB data of our gait data set.
6 CONCLUSION
There have been several attempts to recognize gait from RGB video, the best of which reach about 85% accuracy (Tables 3 and 4). Handling RGB data is computationally expensive, and hence most of these methods cannot work in real time. In contrast, the present system works mainly with the skeleton stream to recognize gait. Skeleton data is much smaller in volume (only 60 floating point numbers per frame, corresponding to the 3D coordinates of 20 joints) compared to RGB or depth data (typically 640 × 480 ≈ 0.3 million integers per frame). Therefore, skeleton-based techniques are more amenable to real-time processing.
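As a back-of-the-envelope illustration of this volume gap, the following sketch reproduces the arithmetic from the text (the figures are from the paper; treating each pixel as a single value is our simplifying assumption):

import numpy as np

# Per-frame payload: skeleton stream vs. a raw 640 x 480 image stream.
joints = 20
skeleton_values = joints * 3      # 60 floats: (x, y, z) per joint
image_values = 640 * 480          # 307,200 values, i.e. ~0.3 million
ratio = image_values / skeleton_values
print(f"An image frame carries ~{ratio:.0f}x more values than a skeleton frame")
# -> An image frame carries ~5120x more values than a skeleton frame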
The system takes about 1.5 seconds per test video to recognize the gait if only static, area, distance & dynamic features are used. This gives over 55% accuracy (Table 2), which is better than similar skeleton-based methods reported earlier (Table 3). Recognition from RGB (Roy et al., 2012; Wang et al., 2003) on the same data set takes about 12 seconds per video, while the accuracy improves to 83% (Table 4).
If angular features are added to the set, the execution time of our system increases to about 29 seconds per video, while the accuracy rises to over 65% (Table 2). This nearly 20-fold increase in time is due to the use of DTW in matching, because we use a naïve MATLAB implementation that is quadratic in complexity. Using a linear implementation can drastically reduce this time. Reducing the dimensionality of the angular feature set can also substantially improve the running time.
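To make the complexity argument concrete, here is a minimal sketch of the classic dynamic-programming DTW in Python (not the paper's MATLAB code; the Euclidean local cost and the feature shapes are our assumptions, as the paper does not specify them). The two nested loops over the n × m cost matrix are what make the naïve version quadratic; approximately linear-time alternatives such as FastDTW restrict this matrix to a narrow band.

import numpy as np

def dtw_distance(seq_a, seq_b):
    """Naive dynamic-programming DTW: O(n * m) time and space."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost between two frames (Euclidean distance is assumed).
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Usage with hypothetical shapes: each row is one frame's angular feature vector.
gallery = np.random.rand(120, 8)   # 120-frame gallery walk, 8 joint angles
probe = np.random.rand(95, 8)      # 95-frame probe walk
print(dtw_distance(gallery, probe))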
Adding contour-based features to our set improves the accuracy to 69% (Table 2), while the running time shoots up to 127 seconds. This is due to the use of depth data, which is inherently heavy to process. Hence we recommend avoiding depth data and contour-based features.
We are, therefore, working further on smarter implementations of the skeleton-based features to meet real-time constraints, while at the same time experimenting with better classifiers, including HMMs and SVMs.
ACKNOWLEDGEMENT
We acknowledge the TCS Research Scholar Program.
REFERENCES
Ball, A., Rye, D., Ramos, F., and Velonaki, M. (2012). Unsupervised clustering of people from skeleton data. In Human-Robot Interaction. Proc. of Seventh Annual ACM/IEEE International Conference on, pages 225–226.

BenAbdelkader, C., Cutler, R., Nanda, H., and Davis, L. (2001). Eigengait: motion-based recognition of people using image self-similarity. In Audio- and Video-Based Biometric Person Authentication (AVBPA 2001). Lecture Notes in Computer Science. Proc. of 3rd International Conference on, volume 2091, pages 284–294.

Bobick, A. F. and Johnson, A. Y. (2001). Gait recognition using static activity-specific parameters. In Computer Vision and Pattern Recognition (CVPR 2001). Proc. of 2001 IEEE Computer Society Conference on, volume 1, pages 423–430.

Boulgouris, N. V. and Chi, Z. X. (2007). Gait recognition using Radon transform and linear discriminant analysis. Image Processing, IEEE Transactions on, 16:731–740.

Brand, M. and Hertzmann, A. (2000). Style machines. In Computer Graphics and Interactive Techniques (SIGGRAPH ’00). Proc. of 27th Annual Conference on, pages 183–192.

Bruderlin, A., Amaya, K., and Calvert, T. (1996). Emotion from motion. In Graphics Interface (GI ’96). Proc. of Conference on, pages 222–229.

Bruderlin, A. and Williams, L. (1995). Motion signal processing. In Computer Graphics and Interactive Techniques (SIGGRAPH ’95). Proc. of 22nd Annual Conference on, pages 97–104.

Chattopadhyay, P., Roy, A., Sural, S., and Mukhopadhyay, J. (2014). Pose depth volume extraction from RGB-D streams for frontal gait recognition. Journal of Visual Communication and Image Representation, 25:53–63.