3D POSE ESTIMATION FROM SILHOUETTES IN
CYCLIC ACTIVITIES ENCODED BY A DENSE
GAUSSIANS MIXTURE MODEL
S. Amin Dadgar, Jean-Christophe Nebel and Dimitrios Makris
Digital Imaging Research Centre, Kingston University, London, U.K.
Keywords: 3D Pose Estimation, Principal Component Analysis, Gaussian Mixture Model, Annealed Particle Filter.
Abstract: This paper presents a system for 3D Pose estimation of cyclic activities (e.g. walking, jogging). Principal
Component Analysis is used to compress the high dimensional space of poses. Human activities are
encoded by Hidden Markov Models, overlaid on Gaussian Mixture Models. A generative approach based on
the Annealed Particle Filter is used to estimate poses from silhouettes derived by a monocular camera.
Experimental results indicate the value of the proposed Dense Gaussian Mixture Model when initialised by
a gait cycle.
1 INTRODUCTION
Visual analysis of human motion is one of the most
active research fields in computer vision, because of
its promising applications in many areas such as
visual surveillance, perceptual user interfaces,
content-based image storage and retrieval, video
conferencing, athletic performance analysis and virtual
reality. Human body pose estimation, which is the
process of estimating the configuration of the human
body parts over time, is a prerequisite to human
motion analysis. While specialised Motion Capture
(MoCap) systems can produce satisfactory results,
their applicability is limited, because of their
intrusive character (i.e. requirement to attach
sensors/markers on the human body). Therefore,
non-intrusive vision-based systems are an
attractive alternative that can be easily employed in
a variety of sites and applications.
In recent years, there has been a clear shift of
research from 2D image-plane (Kuo, 2007) to 3D
articulated motion analysis (Balan, 2005), (Roth,
2004), (Sidenbladh, 2000) to overcome the limitations
of specific viewing angles and ambiguous depth
information in 2D representations. Since 3D pose
recovery from a 2D view is an ill-posed problem,
researchers attempt to use multiple views (Darby,
2008), (Lee, 2007), (Bhatia, 2004) and/or learnt models of
specific activities (Deutscher, 2005), (Hogg, 1983),
(Howe, 2004), (Howe, 2005), (Lakany, 1999) to
constrain pose estimation.
Assuming a specific cyclic activity such as
walking or jogging, this paper presents a system
which is capable of estimating a sequence of 3D
poses from a 2D image sequence (2D video)
captured by a monocular camera.
Sequences of 3D poses derived by a Motion
Capture system are used to train activity models.
Principal Component Analysis (PCA) is used to
reduce the dimensionality of the pose space and a
Hidden Markov Model (HMM) overlaid on a
Gaussian Mixture Model (GMM) encodes the
dynamics of the activity. In this work, we propose
the usage of a Dense GMM initialised by a gait
cycle as the base of the HMM.
Inference from the image data is constrained by the
learnt activity model. Specifically, an Annealed
Particle Filter (APF) (Balan, 2005) is used to
generate 3D pose hypotheses. Then, hypotheses are
assessed by comparing their 2D projections with the
input image silhouettes in a Bayesian framework.
2 METHODOLOGY
Here we define the estimation of a 3D human body
pose as the recovery of the configuration of a set of
body parts. Given a sequence of image observations
(i.e. a sequence of 2D contours), the proposed system
aims at estimating the corresponding sequence of state
vectors, which represent the configuration of the 3D
poses of the human body within a specific cyclic
activity (e.g. walking).
Amin Dadgar S., Nebel J. and Makris D.
3D POSE ESTIMATION FROM SILHOUETTES IN CYCLIC ACTIVITIES ENCODED BY A DENSE GAUSSIANS MIXTURE MODEL.
DOI: 10.5220/0002896004920495
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISIGRAPP 2010), page
ISBN: 978-989-674-028-3
Copyright © 2010 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
2.1 Learning
Training data is acquired from a MoCap device for a
specific cyclic activity. 3D poses are represented by
the joint angles of a standard human model of 10
body parts. The space of unconstrained poses is
inevitably high-dimensional (32D). Since searching
in this space is not efficient, we reduce the
dimensionality of the solution space to 8D using
PCA.
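As an illustration of this step, the following minimal sketch projects 32D joint-angle vectors onto an 8D PCA subspace and maps them back. The synthetic data, the function names and the use of an SVD are assumptions made for the example; they are not taken from the implementation evaluated in this paper.

```python
import numpy as np

def fit_pca(poses, n_components=8):
    """Fit PCA to an (n_frames, n_dims) array of joint-angle vectors."""
    mean = poses.mean(axis=0)
    centred = poses - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]

def project(poses, mean, components):
    """Map full-dimensional poses to the compressed PCA space."""
    return (poses - mean) @ components.T

def reconstruct(codes, mean, components):
    """Map compressed codes back to full-dimensional poses."""
    return codes @ components + mean

# Synthetic stand-in for MoCap data (assumption): 440 frames of 32 joint
# angles driven by 8 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(440, 8))
mixing = rng.normal(size=(8, 32))
poses = latent @ mixing + 0.01 * rng.normal(size=(440, 32))

mean, components = fit_pca(poses, n_components=8)
codes = project(poses, mean, components)
restored = reconstruct(codes, mean, components)
```

Because the synthetic data have only 8 underlying degrees of freedom, the 8D codes reconstruct the 32D poses almost exactly; real MoCap data would show a larger residual.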
The 8D samples of the training data are then
used to estimate a GMM that constrains the solution
space. In particular, two different solutions are
investigated:
a) A mixture of 4 Gaussians (equal to the number
of gait phases in walking/jogging) that is estimated
by Expectation Maximisation (EM).
b) A Dense GMM initialised by a gait cycle. This
implies that the number of Gaussian clusters is equal
to the number of pose samples in the initialising gait
cycle.
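Option (b) can be sketched as follows: every pose sample of the initialising gait cycle becomes the mean of one Gaussian. The shared isotropic covariance (parameter sigma), the equal mixture weights and the toy 8D gait cycle are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def build_dense_gmm(gait_cycle, sigma=0.5):
    """One Gaussian per pose sample; sigma is an assumed shared std."""
    n_clusters, dim = gait_cycle.shape
    means = gait_cycle.copy()
    covariances = np.tile(sigma ** 2 * np.eye(dim), (n_clusters, 1, 1))
    weights = np.full(n_clusters, 1.0 / n_clusters)
    return weights, means, covariances

def responsibilities(x, weights, means, covariances):
    """Posterior probability of each cluster for a single point x."""
    dim = x.shape[0]
    diffs = x - means                                    # (K, dim)
    inverses = np.linalg.inv(covariances)                # (K, dim, dim)
    maha_sq = np.einsum("ki,kij,kj->k", diffs, inverses, diffs)
    _, logdet = np.linalg.slogdet(covariances)
    log_p = np.log(weights) - 0.5 * (maha_sq + logdet + dim * np.log(2 * np.pi))
    log_p -= log_p.max()                                 # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

# Toy closed trajectory in 8D standing in for one gait cycle of 84 samples.
phase = np.linspace(0.0, 2.0 * np.pi, 84, endpoint=False)
columns = [np.cos(phase), np.sin(phase)]
columns += [0.1 * np.cos((k + 2) * phase) for k in range(6)]
gait_cycle = np.stack(columns, axis=1)

weights, means, covariances = build_dense_gmm(gait_cycle)
posterior = responsibilities(gait_cycle[10], weights, means, covariances)
```

With this construction the number of clusters equals the number of pose samples in the gait cycle, and a pose taken from the cycle is assigned most strongly to its own cluster.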
Finally, the dynamics of the activity are
modelled by an HMM overlaid on the GMM.
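For a cyclic activity, one plausible way to set up such dynamics is a transition matrix that only allows staying in a cluster or moving forward along the cycle, wrapping around at the end. The sketch below assumes that clusters are ordered along the gait cycle; the transition probabilities and function names are illustrative and are not values reported in this paper.

```python
import numpy as np

def cyclic_transition_matrix(n_states, stay=0.2, step=0.6, skip=0.2):
    """Row-stochastic matrix: stay, move to the next state, or skip one."""
    transition = np.zeros((n_states, n_states))
    idx = np.arange(n_states)
    transition[idx, idx] = stay
    transition[idx, (idx + 1) % n_states] = step   # wraps at cycle end
    transition[idx, (idx + 2) % n_states] = skip
    return transition

def predict(belief, transition):
    """One HMM prediction step: cluster prior for the next frame."""
    return belief @ transition

transition = cyclic_transition_matrix(84)
belief = np.zeros(84)
belief[83] = 1.0            # currently in the last cluster of the cycle
prior = predict(belief, transition)
```

Starting from the last cluster, the predicted prior places most of its mass on the first cluster, which is the wrap-around behaviour expected of a cyclic model.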
2.2 Testing
Poses are inferred from image data using a
generative approach. Specifically, an APF is used to
generate particles in the 8D PCA space constrained
by the HMM. An inverse PCA transformation of
these particles results in 3D pose hypotheses.
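The generation of hypotheses can be illustrated with a short sketch: particles are drawn around cluster means selected according to a cluster prior, and then mapped back to the 32D pose space. The random cluster means, the uniform prior, the noise level and the orthonormal matrix standing in for learnt principal directions are all assumptions made for the example.

```python
import numpy as np

def sample_particles(cluster_prior, cluster_means, sigma, n_particles, rng):
    """Draw particles in PCA space around clusters chosen by the prior."""
    cluster_ids = rng.choice(len(cluster_prior), size=n_particles, p=cluster_prior)
    noise = rng.normal(scale=sigma, size=(n_particles, cluster_means.shape[1]))
    return cluster_ids, cluster_means[cluster_ids] + noise

def inverse_pca(codes, pca_mean, components):
    """Map PCA-space particles back to full-dimensional pose vectors."""
    return codes @ components + pca_mean

rng = np.random.default_rng(2)
n_clusters, latent_dim, pose_dim = 48, 8, 32
cluster_means = rng.normal(size=(n_clusters, latent_dim))
cluster_prior = np.full(n_clusters, 1.0 / n_clusters)

# Orthonormal rows standing in for learnt principal directions (assumption).
q, _ = np.linalg.qr(rng.normal(size=(pose_dim, latent_dim)))
components = q.T
pca_mean = np.zeros(pose_dim)

cluster_ids, particles = sample_particles(cluster_prior, cluster_means, 0.1, 40, rng)
pose_hypotheses = inverse_pca(particles, pca_mean, components)
```

In a full system the cluster prior would come from the HMM prediction for the current frame rather than being uniform.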
3D pose hypotheses are projected on the image
plane using the camera calibration model, which is
assumed to be known. The projections of the 3D
hypotheses are then assessed using a conditional
likelihood (Eq. 1) that combines three quantities: the
posterior probability of a Gaussian cluster according
to the HMM formulation, the Chamfer distance between
the input image silhouette and the 2D projection of
the pose hypothesis, and the Mahalanobis distance
between the pose hypothesis and the Gaussian cluster.
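The three quantities entering Eq. 1 can be computed as in the following sketch. The symmetric point-set form of the Chamfer distance, the exponential combination in hypothesis_score and its scale parameters are assumptions chosen only so that the score grows with the cluster posterior and shrinks with both distances; they should not be read as the exact form of Eq. 1.

```python
import numpy as np

def chamfer_distance(contour_a, contour_b):
    """Symmetric mean nearest-neighbour distance between two 2D point sets."""
    pairwise = np.linalg.norm(contour_a[:, None, :] - contour_b[None, :, :], axis=2)
    return 0.5 * (pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean())

def mahalanobis_distance(x, mean, covariance):
    """Mahalanobis distance between a point and a Gaussian cluster."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(covariance, diff)))

def hypothesis_score(cluster_posterior, d_chamfer, d_mahalanobis,
                     scale_c=5.0, scale_m=1.0):
    """Assumed combination: higher posterior and lower distances score higher."""
    return cluster_posterior * np.exp(-d_chamfer / scale_c) * np.exp(-d_mahalanobis / scale_m)

# Toy contours: the corners of a square and the same square shifted by 3 pixels.
square = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0]])
shifted = square + np.array([3.0, 0.0])
d_c = chamfer_distance(square, shifted)

# Toy 2D Gaussian cluster with identity covariance.
d_m = mahalanobis_distance(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))

score = hypothesis_score(0.5, d_c, d_m)
```

In practice the Chamfer distance is often evaluated through a distance transform of the silhouette boundary rather than through explicit pairwise distances; the brute-force version is used here only to keep the example self-contained.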
3 EXPERIMENTS
3.1 Datasets
Two sequences from the HumanEva dataset (Sigal,
2006) were used for training and testing our
methodology:
1. Sequence of “walking in circle”. The data
for subject 2 were used for training and testing
(S2W). This sequence consists of about two circular
paths, where one circle (i.e. frames 431 to 870)
was chosen to train the walking activity and the
remaining frames (i.e. frames 1 to 430) were
used for testing.
2. Sequence of “jogging in circle”. The data
for subject 2 were also used for training and testing
(S2J). This sequence consists of 5 circular
paths, where three circles (i.e. frames 301 to
790) were used for training, while the two other
circles (i.e. frames 1 to 300) were used for
testing.
3.2 Results
A PCA transformation is applied to each of the two
training datasets (i.e. walking and jogging) to reduce
their dimensionality from 32D to 8D while
preserving more than 90% of the data variation
(Fig. 1).
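The relation between the number of retained dimensions and the preserved variance can be checked as in the sketch below, which returns the smallest number of principal components whose cumulative variance reaches a target ratio such as 90%. The synthetic data and the function name are assumptions made for the example.

```python
import numpy as np

def components_for_variance(data, target=0.90):
    """Smallest number of principal components reaching the target ratio."""
    centred = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # searchsorted gives the first index whose cumulative ratio >= target.
    n_components = int(np.searchsorted(cumulative, target) + 1)
    return n_components, cumulative

# Synthetic data (assumption): two dominant directions and three weak ones.
rng = np.random.default_rng(1)
scales = np.array([10.0, 10.0, 0.1, 0.1, 0.1])
data = rng.normal(size=(1000, 5)) * scales

n_components, cumulative = components_for_variance(data, target=0.90)
```

For this toy data set two components are enough, since the remaining directions carry a negligible share of the variance.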
Modelling of activities was based on either a
mixture of 4 Gaussians initialised according to the 4
phases of a gait cycle (Fig. 2) or a Dense-GMM
initialised by the samples of a gait cycle (Fig. 3).
The walking model consists of 84 clusters, while the
jogging model consists of 48 clusters.
Pose hypotheses were generated either by an
exhaustive search constrained by the GMM
model of the activity or by APF. Hypotheses were
assessed against the image data, i.e. human
silhouettes segmented by a foreground/background
separation algorithm. The system solution for each
frame is selected by Maximum
Likelihood Estimation, where the likelihood is given
by Eq. 1.
Table 1 summarises the results of our
experiments. They reveal that Dense-GMM based
modelling is more accurate than modelling based on a
small number of Gaussian clusters. Moreover,
Dense-GMM is more computationally efficient, since
it requires only about 40% of the time
required by the mixture of 4 Gaussians.
Fig. 4 shows in the PCA space the pose
estimation results against ground truth when
hypotheses are generated by exhaustive search
constrained by the GMM model. Since the centres of
Gaussians have high likelihood, estimated poses
(blue) are concentrated on the Medial Axis of the
search space.
Higher accuracy is achieved by APF (Fig. 5). In
our implementation, we used 3 layers and 40
particles per layer. A higher number of particles may
improve the accuracy marginally but would significantly
increase the computational time. Annealed
particle filtering produced the best results for both
walking and jogging activities, with mean errors of
101 mm and 162 mm respectively.
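The layered search can be illustrated with a generic annealing loop using 3 layers and 40 particles. In each layer the particles are weighted by the score raised to an annealing exponent, resampled, and diffused with progressively smaller noise. The exponents, noise levels and the toy score function are assumptions made for the example and are not the settings of the reported experiments.

```python
import numpy as np

def annealed_search(particles, score_fn, rng,
                    betas=(0.2, 0.5, 1.0), noise=(0.3, 0.15, 0.05)):
    """Run one annealing pass; returns the best particle and its score."""
    for beta, sigma in zip(betas, noise):
        scores = np.array([score_fn(p) for p in particles])
        weights = scores ** beta              # smooth early, peaked late
        weights /= weights.sum()
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        # Resample, then diffuse with layer-specific noise.
        particles = particles[idx] + rng.normal(scale=sigma, size=particles.shape)
    scores = np.array([score_fn(p) for p in particles])
    best = int(scores.argmax())
    return particles[best], float(scores[best])

# Toy score with a single optimum in the 8D space (assumption).
target = np.full(8, 0.5)

def toy_score(p):
    return float(np.exp(-np.sum((p - target) ** 2)))

rng = np.random.default_rng(3)
initial = rng.normal(size=(40, 8))            # 40 particles per layer
best_particle, best_score = annealed_search(initial, toy_score, rng)
```

Selecting the highest-scoring particle at the end corresponds to the maximum likelihood choice of the system solution for a frame.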
Figure 1: First three dimensions of the PCA space for the
walking (left) and jogging (right) datasets.
Figure 2: Modelling of walking poses (blue points) based
on a mixture of 4 Gaussians (coloured ellipsoids)
initialised by the 4 different phases of the walking cycle,
i.e. left leg passing right leg, left leg in front, right leg
passing left leg and right leg in front.
Figure 3: Modelling of walking (left) and jogging (right)
poses using Dense GMMs initialised by samples of a gait
cycle. Mixtures of 84 and 48 Gaussians are shown for
walking and jogging data respectively.
Table 1: Summary of pose estimation results.
Activity   Modelling   Search       Mean Error (mm)   Std Error (mm)
Walking    4-GMM       Exhaustive   211               58
Walking    Dense GMM   Exhaustive   129               50
Walking    Dense GMM   APF          101               17
Jogging    Dense GMM   Exhaustive   168               40
Jogging    Dense GMM   APF          163               44
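To make the comparison in Table 1 concrete, the following short calculation derives relative reductions of the mean error from the tabulated values. Only the numbers come from Table 1; the helper name is illustrative.

```python
def relative_reduction(baseline_mm, improved_mm):
    """Percentage by which the mean error drops relative to the baseline."""
    return 100.0 * (baseline_mm - improved_mm) / baseline_mm

# Walking, exhaustive search: 4-GMM (211 mm) vs Dense GMM (129 mm).
dense_vs_4gmm = relative_reduction(211, 129)

# Dense GMM: exhaustive search vs APF.
apf_walking = relative_reduction(129, 101)
apf_jogging = relative_reduction(168, 163)

for label, value in [("Dense GMM vs 4-GMM (walking)", dense_vs_4gmm),
                     ("APF vs exhaustive (walking)", apf_walking),
                     ("APF vs exhaustive (jogging)", apf_jogging)]:
    print(f"{label}: {value:.1f}% lower mean error")
```

The gain from APF is therefore much larger for walking than for jogging in these experiments.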
Figure 4: Estimated poses (blue) against ground truth (red),
represented in the compressed PCA space for walking
(left) and jogging (right) activity. Modelling was based on
Dense GMM and hypotheses were generated by
exhaustive search.
VISAPP 2010 - International Conference on Computer Vision Theory and Applications
Figure 5: Estimated poses (blue) against ground truth (red),
represented in the compressed PCA space for walking
(left) and jogging (right) activity. Modelling was based on
Dense GMM and hypotheses were generated by APF.
4 CONCLUSIONS
We presented a pose estimation method for
monocular image sequences, where the human
subject is assumed to perform a known cyclic activity
(e.g. walking, jogging). We demonstrated the value
of using a Dense GMM initialised by a gait cycle of
the activity as the base of an HMM-like dynamic
model. Such modelling improves the accuracy and
decreases the computational time of pose
estimation compared to a mixture of few Gaussians.
Hypotheses were generated by APF and poses
were estimated according to the Maximum
Likelihood Estimation. APF improves accuracy
when compared to an exhaustive search of the PCA
space constrained by the GMM model.
Future work will focus on pose estimation in
complex scenarios consisting of more than one
activity.
REFERENCES
Balan, A. O., Sigal, L. and Black, M. J., 2005. A
quantitative evaluation of video-based 3D person
tracking, IEEE workshop on VPSETS, pp. 349-356.
Bhatia, S., Sigal, L., Isard, M. and Black, M. J., 2004. 3D
Human Limb Detection using Space Carving and
Multi-view Eigen Models, ANM workshop, pp. 17.
Darby, J., Li, B. and Costen, N., 2008. Human Activity
Tracking from Moving Camera Stereo Data, BMVC.
Deutscher, J. and Reid, I., 2005. Articulated body motion
capture by stochastic search, International Journal of
Computer Vision, 61, 2, 185-205.
Hogg, D., 1983. Model-based vision: a program to see a
walking person, Image and Vision Computing, 1, 1, 5-20.
Howe, N. R., 2005. Silhouette lookup for automatic pose
tracking, IEEE workshop on Articulated and Nonrigid
Motion, pp. 3
Howe, N. R. Flow Lookup and Biological Motion
Perception, ICIP, 3, 1168-1171.
Kuo, P., Nebel, J-C. and Makris, D., 2007. Camera
Auto-Calibration from Articulated Motion, AVSS.
Lakany, H. M., Hayes, G. M., Hazlewood, M. and Hillman, S. J.,
1999. Human walking: tracking and analysis,
Proceedings of the IEE Colloquium on Motion
Analysis and Tracking, pp. 5/1–5/14.
Lee, C. S. and Elgammal, A., 2007. Modelling view and
posture manifolds for tracking, ICCV, 1-8.
Roth, S., Sigal, L. and Black, M., 2004. Gibbs likelihoods
for Bayesian tracking, CVPR, 1, 886-893.
Sidenbladh, H., Black, M. and Fleet, D., 2000. Stochastic
tracking of 3D human figures using 2D image motion,
Lecture Notes in Computer Science, 1843, 702-718.
Sigal, L. and Black, M. J., 2006. HumanEva: Synchronized
Video and Motion Capture Dataset for Evaluation of
Articulated Human Motion, Technical Report CS-06-08.
Figure 6: Indicative results for walking (first and second
row) and jogging (third and fourth row) sequences using
Dense GMM and APF. (a) input images, (b) silhouette
extracted by foreground/background separation, (c)
boundary of silhouette, (d) estimated pose, (e) estimated
pose overlaid on image silhouette.