Table 3: Comparison with reported results from prior work.

Description | Remarks
Wearable sensor, 16 angular features, 2 data sets: 73% (1st), 42% (2nd) (Tanawongsuwan and Bobick, 2001) | Good accuracy but intrusive and costly
RGB sensor, contour-based features, 71% (Wang et al., 2003) | High processing time for each RGB frame
RGB sensor, pose kinematics & pose energy images, 83% (Roy et al., 2012) | Good accuracy on a large dataset; heavy computation
Kinect, skeleton, static features, 85% (Preis et al., 2012) | High accuracy; small dataset (9 subjects); frontal view (stationary subjects)
Kinect, skeleton, angular features, avg.: 44% (sub1: 35%, sub2: 74%, sub3: 39%, sub4: 33%) (Ball et al., 2012) | Low accuracy even on a small dataset
Kinect, skeleton, static, distance & area features, 25% (Sinha et al., 2013) | Very low accuracy, as angular features are not considered
Table 4: Comparison on the same data set (RGB only).

Description | Remarks
RGB sensor, shape feature, 83% (Wang et al., 2003) | Good accuracy on a large dataset
RGB sensor, key poses (Roy et al., 2012) | Heavy computation

We extract the shape-based feature (Wang et al., 2003) from the RGB data and then estimate the key poses (Roy et al., 2012) to recognize gait from the Kinect RGB data of our gait data set.
6 CONCLUSION
There have been several attempts to recognize gait from RGB video, the best of which reach about 85% accuracy (Tables 3 and 4). Handling RGB data is computationally expensive, and hence most of these methods cannot work in real time. In contrast, the present system works mainly with the skeleton stream to recognize gait. Skeleton data is much smaller in volume (only 60 floating point numbers per frame, corresponding to the 3D coordinates of 20 joints) compared to RGB or depth data (typically 640 × 480 ≈ 0.3 million integers per frame). Therefore, skeleton-based techniques are more amenable to real-time processing.
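As a back-of-the-envelope illustration of this volume gap, the following sketch reproduces the arithmetic from the text (the figures are from the paper; treating each pixel as a single value is our simplifying assumption):

import numpy as np

# Per-frame payload: skeleton stream vs. a raw 640 x 480 image stream.
joints = 20
skeleton_values = joints * 3      # 60 floats: (x, y, z) per joint
image_values = 640 * 480          # 307,200 values, i.e. ~0.3 million
ratio = image_values / skeleton_values
print(f"An image frame carries ~{ratio:.0f}x more values than a skeleton frame")
# -> An image frame carries ~5120x more values than a skeleton frame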
The system takes about 1.5 seconds per test video to recognize the gait if only static, area, distance & dynamic features are used. This gives over 55% accuracy (Table 2), which is better than similar skeleton-based methods reported earlier (Table 3). Recognition from RGB (Roy et al., 2012; Wang et al., 2003) on the same data set takes about 12 seconds per video, while the accuracy improves to 83% (Table 4).
If angular features are added to the set, the execution time of our system increases to about 29 seconds per video, while the accuracy rises to over 65% (Table 2). This nearly 20-fold increase in time is due to the use of DTW in matching, because we use a naïve MATLAB implementation that is quadratic in complexity. Using a linear implementation can drastically reduce this time. Reducing the dimensionality of the angular feature set can also substantially improve the running time.
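To make the complexity argument concrete, here is a minimal sketch of the classic dynamic-programming DTW in Python (not the paper's MATLAB code; the Euclidean local cost and the feature shapes are our assumptions, as the paper does not specify them). The two nested loops over the n × m cost matrix are what make the naïve version quadratic; approximately linear-time alternatives such as FastDTW restrict this matrix to a narrow band.

import numpy as np

def dtw_distance(seq_a, seq_b):
    """Naive dynamic-programming DTW: O(n * m) time and space."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost between two frames (Euclidean distance is assumed).
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Usage with hypothetical shapes: each row is one frame's angular feature vector.
gallery = np.random.rand(120, 8)   # 120-frame gallery walk, 8 joint angles
probe = np.random.rand(95, 8)      # 95-frame probe walk
print(dtw_distance(gallery, probe))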
Adding contour-based features to our set improves the accuracy to 69% (Table 2), while the running time shoots up to 127 seconds. This is due to the use of depth data, which is inherently heavy to process. Hence we recommend avoiding depth data and contour-based features.
We are, therefore, working further on smarter implementations of the skeleton-based features to meet real-time constraints, while at the same time experimenting with better classifiers, including HMMs and SVMs.
ACKNOWLEDGEMENT
We acknowledge the TCS Research Scholar Program.
REFERENCES
Ball, A., Rye, D., Ramos, F., and Velonaki, M. (2012). Unsupervised clustering of people from skeleton data. In Human-Robot Interaction. Proc. of Seventh Annual ACM/IEEE International Conference on, pages 225–226.

BenAbdelkader, C., Cutler, R., Nanda, H., and Davis, L. (2001). Eigengait: motion-based recognition of people using image self-similarity. In Audio- and Video-Based Biometric Person Authentication (AVBPA 2001). Lecture Notes in Computer Science. Proc. of 3rd International Conference on, volume 2091, pages 284–294.

Bobick, A. F. and Johnson, A. Y. (2001). Gait recognition using static activity-specific parameters. In Computer Vision and Pattern Recognition (CVPR 2001). Proc. of 2001 IEEE Computer Society Conference on, volume 1, pages 423–430.

Boulgouris, N. V. and Chi, Z. X. (2007). Gait recognition using Radon transform and linear discriminant analysis. Image Processing, IEEE Transactions on, 16:731–740.

Brand, M. and Hertzmann, A. (2000). Style machines. In Computer Graphics and Interactive Techniques (SIGGRAPH ’00). Proc. of 27th Annual Conference on, pages 183–192.

Bruderlin, A., Amaya, K., and Calvert, T. (1996). Emotion from motion. In Graphics Interface (GI ’96). Proc. of Conference on, pages 222–229.

Bruderlin, A. and Williams, L. (1995). Motion signal processing. In Computer Graphics and Interactive Techniques (SIGGRAPH ’95). Proc. of 22nd Annual Conference on, pages 97–104.

Chattopadhyay, P., Roy, A., Sural, S., and Mukhopadhyay, J. (2014). Pose depth volume extraction from RGB-D streams for frontal gait recognition. Journal of Visual Communication and Image Representation, 25:53–63.