1.1 Related Work
Since geometric camera calibration reveals the
relationship between the 3D space that is viewed by
the camera and its projection on the image plane, it
is a key for the reconstruction of a 3D articulated
structure. The common practice for calibrating
cameras is to obtain point correspondences between
a known calibration pattern and its projection on the
image (Tsai, 1987). Although such a process is
straightforward, it is often impractical: cameras
frequently change position, their number may be
very large, or physical access to them is impossible.
To deal with this issue, Taylor proposed a pose
recovery method that does not require any camera
calibration (Taylor, 2000). It exploits an
orthographic projection model, which assumes that
3D objects are far from the camera so that the
depths of their surface points are nearly constant.
Although it has been widely used (Mori and Malik,
2002, Mori and Malik, 2006, Remondino and
Roditakis, 2003), as we shall see in our results
section, accuracy is compromised by such a strong
assumption. Inspired by Taylor’s work (Taylor,
2000), our method does not require any manual
camera calibration and relies on the locations of 2D
image key points, i.e. joints, as input. However, it
uses bipedal motion constraints to recover 3D poses
more accurately.
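For concreteness, Taylor’s scaled-orthographic recovery can be sketched as follows: given a body segment of known 3D length and the image positions of its two endpoints, the relative depth between the endpoints follows from the foreshortening of the projected segment. A minimal sketch, assuming the orthographic scale factor `s` is known; the function name is ours, not Taylor’s:

```python
import math

def relative_depth(p1, p2, length, s):
    """Unsigned relative depth dZ between two joints under scaled
    orthography (after Taylor, 2000).

    p1, p2 : (u, v) image positions of the segment endpoints
    length : known 3D length of the body segment
    s      : orthographic scale factor (assumed known or estimated)

    The sign of dZ is ambiguous and must be resolved by additional
    constraints or user input.
    """
    du, dv = p1[0] - p2[0], p1[1] - p2[1]
    d2 = (du * du + dv * dv) / (s * s)
    if d2 > length * length:
        raise ValueError("image segment is longer than the limb allows")
    return math.sqrt(length * length - d2)

# a 0.5 m limb projecting to a 40 px segment at scale s = 100 px/m
dz = relative_depth((100.0, 200.0), (140.0, 200.0), 0.5, 100.0)  # 0.3 m
```

The strong-assumption weakness discussed above is visible here: the result depends entirely on a single global scale `s`, which only holds when depth variation across the body is negligible.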
The extraction of 2D joint positions from an image
has been a very active field of research (Ren et al.,
2005, Kuo et al., 2008, Balan and Black, 2006,
Urtasun et al., 2006). Ren et al. extract body
segments by exploiting parallelism and pairwise
constraints between body parts (Ren et al., 2005). Kuo
et al. extended this approach by adding other image
cues, i.e. colour and motion, which are informative
regarding body part location (Kuo et al., 2008).
Others use a Wandering-Stable-Lost framework to
track 2D body parts/key points through the
sequences (Balan and Black, 2006, Urtasun et al.,
2006).
Most pose reconstruction methods rely on
multiple cameras (Bhatia et al., 2004, Izo and
Grimson, 2007) and/or assume specific types of
activities (Bhatia et al., 2004, Elgammal and Lee,
2006, Tian et al., 2004, Lim et al., 2006, Ek et al.,
2008). Moreover, some of them require manual
initialisation of their 3D tracker (Balan and Black,
2006, Urtasun et al., 2006, Martinez-del-Rincon et
al., 2008). Therefore, all these constraints
dramatically limit the practical applications of those
systems.
Our approach exploits general constraints
imposed by human bipedal motion. They include the
presence of at least one foot on the ground during
most activities; this constraint has already been used
successfully in 2D body tracking (Martinez-del-
Rincon et al., 2008). These bipedal constraints are
much less restrictive than assuming a specific type
of motion (e.g., walking). In this work, we use such
constraints to self-calibrate the camera from
observed human motion, to derive 3D poses for key
frames (Kuo et al., 2007), and to further infer 3D
poses between key frames.
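The supporting-foot constraint can be made concrete: once a projection matrix is available from self-calibration, a foot known to be on the ground plane Z = 0 pins its own 3D position, because the projection restricted to that plane is a homography. A minimal sketch, where the example matrix `P` and the `ground_point` helper are illustrative and not the paper’s implementation:

```python
import numpy as np

def ground_point(P, uv):
    """Back-project an image point known to lie on the ground plane Z = 0.

    P  : 3x4 camera projection matrix (assumed known, e.g. from
         self-calibration); uv : (u, v) image position of the ankle.

    For points with Z = 0 the projection reduces to the homography
    H = [p1 p2 p4]; inverting it recovers the world position (X, Y, 0).
    """
    H = P[:, [0, 1, 3]]                      # drop the Z column
    xyw = np.linalg.solve(H, np.array([uv[0], uv[1], 1.0]))
    return np.array([xyw[0] / xyw[2], xyw[1] / xyw[2], 0.0])

# toy camera: f = 100 px, principal point at the origin, world origin
# 5 units in front of the camera
P = np.array([[100.0, 0.0, 0.0, 0.0],
              [0.0, 100.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 5.0]])
foot = ground_point(P, (20.0, 40.0))         # -> [1.0, 2.0, 0.0]
```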
2 METHODOLOGY
2.1 3D Human Pose Recovery
Our goal is to recover 3D human postures in video
sequences by exploiting human bipedal motion
constraints. We propose a 3D pose estimator which
generates possible 3D poses from 2D joint positions
in the input image sequence. The most appropriate
pose is then selected by taking into account learned
human motion models. Figure 1 illustrates the flow
of our 3D human pose recovery. It first requires an
image processing step that detects “image key
points” related to posture, i.e. 2D body joints, in the
video. The 3D pose estimator, which is based on
pin-hole projection, then transforms these 2D image
points into a set of 3D poses in the real world, using
the constraints of human bipedal motion. Finally,
the most likely pose is selected among the proposed
3D poses, based on motion models learnt from
human dynamics and further motion constraints.
Because the task of obtaining key body points
from the image has been tackled in our previous
paper (Kuo et al., 2008), and a paper dealing with
pose selection is in preparation, in this work we
concentrate on the 3D pose estimator, which
performs pose reconstruction by exploiting human
bipedal motion constraints.
2.2 3D Pose Estimator
The 3D pose estimator generates a set of 3D pose
proposals from 2D joint positions based on the pin-
hole projection model. First, postures are estimated
automatically for a set of key frames in which
camera auto-calibration can be performed. Then, the
remaining postures are recovered by propagating the
pin-hole projection parameters obtained for the key
frames to the other frames, by introducing a
constraint of bipedal motion.
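Under the pin-hole model used here, a 3D joint projects to the image through the intrinsic matrix and the camera pose; on key frames these are the parameters that auto-calibration estimates and that are then propagated to the remaining frames. A minimal sketch of the forward projection, where the matrices below are a toy example rather than calibrated values:

```python
import numpy as np

def project(K, R, t, X):
    """Pin-hole projection of a 3D world point X into the image.

    K : 3x3 intrinsic matrix; R, t : camera rotation and translation.
    Returns the (u, v) image position.
    """
    x = K @ (R @ X + t)          # homogeneous image coordinates
    return x[:2] / x[2]          # perspective division

# toy example: focal length 100 px, principal point at (50, 50)
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
uv = project(K, np.eye(3), np.array([0.0, 0.0, 2.0]),
             np.array([1.0, 1.0, 0.0]))      # -> (100.0, 100.0)
```

The pose estimator works in the opposite direction, inverting this mapping under the bipedal constraints; unlike the orthographic model discussed earlier, the perspective division retains depth information.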
VISAPP 2009 - International Conference on Computer Vision Theory and Applications