formance in order to select, among several models,
the one that best represents a given set of sample
points. In all cases, we assume that the measure
vector of the nodes follows a multivariate Gaussian
distribution, which substantially simplifies the evalu-
ation of the score measure given by equation (2) (see
(Song et al., 2003)).
4.1 Database Description
We have used two different databases. The Caltech
database (courtesy of C. Fanti) provides 3D information
on a set of human-body landmarks in motion
(Fanti et al., 2003). This database contains 3500
samples with 3D information (position and velocity)
for 14 fixed landmarks on a walking human
body: head (H), neck (N), shoulders (LS,RS), el-
bows (LE,RE), wrists (LW,RW), hips (LH,RH), knees
(LK,RK) and ankles (LA,RA). Experiments from dif-
ferent points of view, 0, π/4, π/2, 3π/4 and π radi-
ans, have been conducted using the 3D information.
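As an illustration of how a view at a given angle can be derived from the 3D data, the following sketch rotates the landmarks about the vertical axis and discards the depth coordinate. The orthographic camera, the choice of y as the vertical axis, the function name `project_view` and the sample coordinates are all assumptions made for this example; the projection actually used is not specified here.

```python
import math

def project_view(points_3d, angle):
    """Rotate 3D points about the vertical (y) axis by `angle` radians
    and project them orthographically onto the image plane (x, y).

    Assumption: y is the vertical axis and depth (z) is simply dropped.
    """
    c, s = math.cos(angle), math.sin(angle)
    projected = []
    for x, y, z in points_3d:
        x_rot = c * x + s * z          # horizontal coordinate after rotation
        projected.append((x_rot, y))   # depth is discarded
    return projected

# The five points of view mentioned in the text, in radians.
VIEW_ANGLES = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4, math.pi]

# Hypothetical landmark positions (arbitrary units), for illustration only.
landmarks = {"H": (0.0, 1.7, 0.0), "LS": (0.2, 1.4, 0.05)}
views = {a: project_view(list(landmarks.values()), a) for a in VIEW_ANGLES}
```

Because this projection is linear, the same function could be applied to the 3D velocity vectors to obtain projected velocities under the same assumptions.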
In order to carry out experiments with more complex
motions, we have also used some actions from the
HumanEva database (Sigal and Black, 2006). The
HumanEva database contains 4 actors, each performing a
set of 6 actions in 3 separate trials. Here
we focus on three of these actions: Box, Gesture and
ThrowCatch. This database provides images from
seven points of view: frontal view (camera C1), lat-
eral views (C2 and C3), and 4 diagonal views (BW1,
BW2, BW3 and BW4).
4.2 Labeling Experiments
The DTG models used in our experiments have been
learned using the algorithm proposed in (Song et al.,
2001). The labeling of a sample is considered correct
if its cost is equal to or lower than the cost of the true
labeling, which is assumed to be known (only for the test
samples). Unfortunately, this criterion does not guarantee
that the fitted labeling is equal to the true one. For
example, the ambiguity in the relative location of the feet
of a walking person seen from the side cannot be resolved
with this model, and the fitted configuration will have an
equal or lower cost than the true labeling. Background
points can also be selected as part of the best labeling.
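The acceptance criterion can be sketched as follows. This is a minimal illustration, not the evaluation code used for the experiments: the function names, the numerical tolerance and the example costs are hypothetical, and equal cost is treated as accepted, in line with the "equal or lower" wording used when the results are reported.

```python
def is_accepted(fitted_cost, true_cost, tol=1e-9):
    """Return True when the fitted labeling costs no more than the true one.

    `tol` is an assumed tolerance that absorbs floating-point round-off.
    """
    return fitted_cost <= true_cost + tol

def acceptance_rate(fitted_costs, true_costs):
    """Percentage of test samples whose fitted labeling is accepted."""
    accepted = sum(is_accepted(f, t) for f, t in zip(fitted_costs, true_costs))
    return 100.0 * accepted / len(fitted_costs)

# Hypothetical costs for four test samples.
rate = acceptance_rate([1.0, 2.0, 3.5, 0.5], [1.0, 2.5, 3.0, 0.5])
```

Note that an accepted sample is not necessarily labeled identically to the ground truth; the criterion only compares costs.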
4.2.1 Caltech's Database
We have used 2500 image samples for learning the
DTG models and 1000 for the labeling experiments.
We learn two different types of models: a) static mod-
els, using only the projections of the 3D positions;
b) motion models, using both the position and velocity
projections. Some of the fitted models from this
database are shown in Figure 4 (a)-(c). Once the models
have been learned, we use them to label the remaining
1000 samples.
Figure 5: FEP working over two samples with 34 points
(14+20): the first sample (a, b, c) shows a π/2 radians point
of view where selected and expected points coincide;
the second sample (d, e, f) shows a 0 radians point of view
where the points of both legs are exchanged; the remaining
body points are fitted correctly. All the points in each
sample are shown in (a) and (d); the points selected by FEP
are shown in (b) and (e); finally, (c) and (f) show the fitted
DTG. Green points: expected and selected labels coincide;
red diamonds: selected labels that do not coincide with
expected labels; blue squares: expected labels that do not
coincide with selected labels.
To compare the robustness of FEP and DP under
added noise, we run experiments with 14, 20 and 40
randomly added points on top of the original 14 points, using
the learned static and motion models. The results of both
experiments are shown in Figure 6 (a) and (b). In general,
FEP outperforms DP.
Moreover, the percentage of samples with a labeling
cost equal to or lower than that of the original model is almost
100% for the FEP algorithm in all cases; only for the
samples at the 3π/4 angle does it drop to 97.8%. In addition,
in most of the experiments FEP has O(N²) efficiency;
only for angles π/4 and 3π/4 does it reach O(N³)
in certain tests with added noisy points (see Figure 7).
Figure 6 (c) and (d) compare the time efficiency
of both techniques, FEP and DP, using the two
models. In all cases, FEP outperforms DP.
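The noise experiments can be mimicked with a sketch like the one below, which appends clutter points to the 14 landmark projections and shuffles the result so that the labeling algorithm cannot rely on point order. How the added points were actually drawn is not stated here; the uniform distribution over the landmarks' bounding box, the function name `add_clutter` and the synthetic landmark coordinates are assumptions for illustration.

```python
import random

def add_clutter(landmarks, n_extra, seed=None):
    """Append `n_extra` random 2D points to `landmarks` and shuffle.

    Assumption: clutter is uniform over the bounding box of the landmarks.
    Returns the shuffled points and a parallel list of flags that are
    True for original landmarks and False for added clutter.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    clutter = [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in range(n_extra)
    ]
    points = list(landmarks) + clutter
    flags = [True] * len(landmarks) + [False] * n_extra
    order = list(range(len(points)))
    rng.shuffle(order)
    return [points[i] for i in order], [flags[i] for i in order]

# Hypothetical 2D landmarks: 14 points on a coarse grid.
body = [(float(i % 4), float(i // 4)) for i in range(14)]
noisy_sets = {n: add_clutter(body, n, seed=0) for n in (14, 20, 40)}
```

With 20 added points this yields samples of 34 points (14+20).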
It is not an easy task to establish a relationship
between the set of graphs evaluated by FEP and DP
respectively. The main difficulty is in the way the
graph is built. DP starts from the first vertex accord-
ing to vertex elimination order, while FEP starts from
the base triangle, so that the first vertex in DP is the
last added vertex in FEP. In order to assess the quality
of the fitted graphs given by DP and FEP, we run
experiments on the Caltech database, counting the
number of correct nodes selected by each algorithm.
Figure 8 shows the results obtained using the motion
model, where both techniques provide equivalent
performance. Results are very similar for the static
model, so only the motion model is shown in Figure 8.
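Counting correct nodes amounts to comparing, part by part, the point index chosen by an algorithm with the ground-truth index. The sketch below is one way to do this; the dictionary representation and the function name `count_correct_nodes` are assumptions, while the 14 part labels are those listed in the database description.

```python
PARTS = ["H", "N", "LS", "RS", "LE", "RE", "LW", "RW",
         "LH", "RH", "LK", "RK", "LA", "RA"]

def count_correct_nodes(fitted, truth):
    """Count body parts whose selected point index matches the ground truth.

    Both arguments map a part label to the index of a point in the sample.
    """
    return sum(1 for part, idx in truth.items() if fitted.get(part) == idx)

# Hypothetical example: knees and ankles of both legs are exchanged.
truth = {part: i for i, part in enumerate(PARTS)}
fitted = dict(truth)
fitted["LK"], fitted["RK"] = truth["RK"], truth["LK"]
fitted["LA"], fitted["RA"] = truth["RA"], truth["LA"]
n_correct = count_correct_nodes(fitted, truth)
```

In this example 10 of the 14 nodes are counted as correct, even though such a leg exchange may still pass the cost-based criterion.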
VISAPP 2009 - International Conference on Computer Vision Theory and Applications