on the training set. Second, the Hammoude metric
(top-down) yields a more reliable posture
hypothesis. With this strategy, a very quick bottom-
up approach filters out most of the pose candidates
so that the more computationally intensive top-down
process only has to evaluate a reduced number of hy-
potheses.
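The two-stage strategy above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper. Regions are assumed to be sets of (row, column) pixel coordinates, and the Hammoude metric is assumed to take its usual form: the area of the symmetric difference of two regions divided by the area of their union. The names `cheap_score`, `two_stage_search` and `keep` are hypothetical and introduced only for illustration.

```python
def hammoude_distance(region_a, region_b):
    """Assumed form of the Hammoude metric: |A xor B| / |A or B|.

    Regions are sets of (row, col) pixel coordinates.
    Returns 0.0 for identical regions and 1.0 for disjoint ones.
    """
    union = region_a | region_b
    if not union:
        return 0.0  # two empty regions are treated as identical
    return len(region_a ^ region_b) / len(union)


def two_stage_search(observed, candidates, cheap_score, keep=10):
    """Bottom-up filtering followed by top-down verification.

    cheap_score is a hypothetical, inexpensive dissimilarity used to
    discard most candidates; only the `keep` best survivors are
    compared with the more expensive Hammoude metric.
    Returns (index of best candidate, list of shortlisted indices).
    """
    # Bottom-up stage: rank all candidates with the cheap score.
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cheap_score(observed, candidates[i]))
    shortlist = ranked[:keep]
    # Top-down stage: evaluate only the shortlisted hypotheses.
    best = min(shortlist,
               key=lambda i: hammoude_distance(observed, candidates[i]))
    return best, shortlist
```

The value of `keep` controls the trade-off between the two stages: a small shortlist is fast but may discard the correct hypothesis, whereas a large one approaches an exhaustive top-down search.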
The idea of combining bottom-up and top-down
approaches has been successfully exploited in other
applications. For instance, in (Ramanan et al., 2007a),
two different methods are used to build models for
person detection. First, a bottom-up approach searches
for body part candidates in the image, which are then
clustered to find and identify assemblies of parts that
might be people. Simultaneously, a top-down ap-
proach is used to find people by projecting the pre-
viously assembled parts onto the image plane.
We believe that the combination of the bottom-up
and top-down processes mentioned above is the key
to the efficiency and reliability of detection and
tracking algorithms. On the one hand, the amount of
image information to process is huge and thus requires
top-down constraints given by models. On the other
hand, matching the models to the image must be guided
by bottom-up processes for efficiency. We evaluate our
method and study the trade-off between the bottom-
up and top-down processes in a series of simulations.
Our paper is organized as follows. Section 2 de-
scribes related work. In Section 3 we describe the
method’s architecture, which is divided into the fol-
lowing major components: (i) the machine learn-
ing part (offline) and (ii) the matching strategy be-
tween the observed image and the generated hypotheses
(online). In Section 3 some experiments concerning
realistic scenarios are presented. Finally, Section
4 presents the conclusions of the paper and provides
directions for further research work.
2 RELATED WORK
A large body of work is available on human motion
analysis, although with different focuses and classi-
fication methods. In (Gavrila, 1999) the division is
made into 2D and 3D approaches, and the 2D approaches
are further divided into methods that make explicit
use of shape models and methods that do not use any
kind of model (i.e. image descriptors). In recent
works (e.g. (Borenstein and Ullman, 2008), (Brandao
et al., 2011)), various research directions have
emerged, such as combining top-down and bottom-up
models, PF algorithms for tracking human body parts,
and model-free approaches. Many of these new trends
cannot be placed within the classifications mentioned
above. A more generic approach is therefore proposed in
(Poppe, 2007), where the main division is made ac-
cording to model-based (or generative) and model-
free (or discriminative) approaches. The estimation
step consists in computing the pose parameters that
minimize the error between the observation and the
projection of the human body model. Two classes of
estimators can be identified: top-down and bottom-up
(Poppe, 2007). The top-down approach consists in
matching a projection of the human body model with
the observed image, while in bottom-up approaches
individual body parts are found and then assembled
into a human body image. In more recent works
(Brandao et al., 2011), (Ramanan et al., 2007b)
these two are combined for better performance.
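In symbols, the estimation step described above can be written as follows. The notation is introduced here only for illustration and is not taken from the cited works: \theta denotes the pose parameters, I the observed image, P(\theta) the projection of the human body model for pose \theta, and E an error measure between the two.

```latex
\hat{\theta} = \arg\min_{\theta} \; E\bigl(I, \, P(\theta)\bigr)
```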
2.1 Bottom-up Estimation
Bottom-up approaches typically find body parts and
then assemble them into a full human body; these
parts are normally described as 2D templates. The
main problem associated with the bottom-up process
is the number of false positives marked as limb-like
regions in an image. Another drawback is the need
for part detectors for most body parts, since missing
information is likely to result in less accurate pose
estimation.
In (Micilotta et al., 2006), the first step is to find a
person in the image; body parts are then learned by the
trackers and a possible assembly is found by applying
RANdom SAmple Consensus (RANSAC). Heuristics
are used to remove unlikely poses, and a prior pose
determines the likelihood function of the assembly.
2.2 Top-down Estimation
Top-down approaches match a projection of the hu-
man body model with the image observation. To
achieve fast solutions, a local search is performed
in the neighbourhood of an initial pose estimate
(Gavrila, 1999). According to (Gavrila and Davis,
1996), a hierarchical classification can be used to
achieve better performance in the initial positioning:
they first build the torso and head and then
the rest of the limbs of the model.
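The local search around an initial pose estimate can be sketched as a simple coordinate-wise descent. This is only an illustration under stated assumptions, not the procedure of (Gavrila, 1999): the pose is assumed to be a list of numeric parameters, and `local_pose_search`, `cost`, `step` and `max_iters` are hypothetical names. Here `cost` stands for any distance between the rendered model and the image observation.

```python
def local_pose_search(initial_pose, cost, step=1.0, max_iters=100):
    """Greedy search in the neighbourhood of an initial pose.

    Each parameter is perturbed by +/- step in turn, and a move is
    accepted only if it lowers the (hypothetical) cost. The search
    stops when no single-parameter move improves the cost or when
    max_iters sweeps have been performed.
    Returns (best pose found, its cost).
    """
    pose = list(initial_pose)
    best = cost(pose)
    for _ in range(max_iters):
        improved = False
        for i in range(len(pose)):
            for delta in (-step, step):
                trial = list(pose)
                trial[i] += delta
                trial_cost = cost(trial)
                if trial_cost < best:
                    pose, best, improved = trial, trial_cost, True
        if not improved:
            break  # local minimum with respect to this step size
    return pose, best
```

Because only neighbouring poses are visited, the result depends on the starting point, so a poor initial estimate may lead to a local minimum far from the true pose.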
The main constraint of top-down approaches is the
initialization in the first frame, which requires a
manual start. Other issues are the computational
effort of rendering the human body model and of
calculating the distance between the rendered model
and the image observation.
Top-down approaches also present some problems