4 EXPERIMENTS
We performed experiments on a pose estimation application.
Note, however, that the learning method can be applied to
any parts-based model which integrates pixelwise
classification with random forests, for instance methods
for joint object recognition and segmentation.
The proposed algorithm has been evaluated on the CDC4CV
Poselets dataset (Holt et al., 2011). Our goal is not to
beat the state of the art in pose estimation, but to show
that spatial learning is able to improve pixelwise
classification of parts-based models. The dataset contains
upper body poses captured with a Kinect sensor and consists
of 345 training and 347 test depth images. The authors also
supplied corresponding annotation files which contain the
locations of 10 articulated parts: head (H), neck (N), left
shoulder (LS), right shoulder (RS), left elbow (LE), left
hand (LHA), right elbow (RE), right hand (RHA), left hip
(LH), right hip (RH). We created groundtruth segmentations
through nearest neighbor labeling. In our experiments, the
left/right elbow (LE, RE) and hand (LHA, RHA) parts were
extended to left/right upper arm (LUA, RUA) and forearm
(LFA, RFA) parts. We also defined the region below the
waist as an additional part, other (the black area in
Figure 2b).
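
As an illustration of nearest neighbor labeling, the following minimal sketch assigns to each foreground pixel the label of the closest annotated part location. The function name, the use of Euclidean distance in the image plane, and the toy mask and coordinates are assumptions made for this example; the exact procedure used to build the groundtruth is not detailed here.

```python
import math


def nearest_neighbor_labels(foreground, part_locations):
    """Label each foreground pixel with the name of the closest annotated part.

    foreground: 2D list of booleans (True where the pixel belongs to the body).
    part_locations: dict mapping a part name to its (row, col) annotation.
    Returns a 2D list of part names, with None for background pixels.
    """
    labels = []
    for r, row in enumerate(foreground):
        label_row = []
        for c, is_body in enumerate(row):
            if not is_body:
                label_row.append(None)
                continue
            # Euclidean distance in the image plane (an assumption of this sketch).
            nearest = min(
                part_locations,
                key=lambda name: math.hypot(r - part_locations[name][0],
                                            c - part_locations[name][1]),
            )
            label_row.append(nearest)
        labels.append(label_row)
    return labels


# Toy example: a 3x5 foreground mask with two hypothetical annotations.
mask = [
    [False, True, True, True, False],
    [False, True, True, True, False],
    [False, True, True, True, False],
]
parts = {"H": (0, 1), "N": (2, 3)}
segmentation = nearest_neighbor_labels(mask, parts)
```

Pixels that are equally distant from two annotations go to whichever part comes first in the dictionary; a real labeling would need an explicit tie-breaking rule.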
Unless otherwise specified, the following parameters have
been used for RDF learning: 3 trees, each with a depth of 9;
2000 randomly selected pixels per image, roughly
distributed across the body; 4000 candidate pairs of
offsets; 22 candidate thresholds. Offsets and thresholds
have been learned separately for each node in the forest.
For spatial learning, 28 pairs of neighbors have been
identified between the 10 parts, based on a pose in which
the subject stretches out both arms. The parameter λ was
set to 0.4.
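
For reference, the settings above can be gathered in a small configuration object. This is only a sketch: the class name, the field names and the two derived quantities are hypothetical. The derived quantities rest on two assumptions that are not stated in the text, namely that every tree is trained on all training images and that every candidate offset pair is combined with every candidate threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RDFParams:
    """Learning parameters reported in the text (field names are illustrative)."""
    n_trees: int = 3
    max_depth: int = 9
    pixels_per_image: int = 2000
    n_candidate_offset_pairs: int = 4000
    n_candidate_thresholds: int = 22
    n_neighbor_pairs: int = 28
    lam: float = 0.4  # the parameter lambda of spatial learning

    def samples_per_tree(self, n_training_images):
        # Upper bound, assuming every tree sees every training image.
        return self.pixels_per_image * n_training_images

    def candidate_splits_per_node(self):
        # Assumes each offset pair is tried with each threshold.
        return self.n_candidate_offset_pairs * self.n_candidate_thresholds


params = RDFParams()
n_samples = params.samples_per_tree(345)      # 345 training images in the dataset
n_splits = params.candidate_splits_per_node()
```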
We evaluate our method at two levels: pixelwise
classification and recognition of pairs of parts.
Pixelwise decisions are provided directly by the random
forest. Part localizations are obtained from the pixelwise
results through pixel pooling: we create a posterior
probability map for each part from the RDF output and,
after non-maximum suppression and low pass filtering, take
the location with the largest response as the estimate of
the part. The positions of pairs of detected parts are then
calculated; they approximately correspond to joints and
serve as an intermediate pose indicator. In the following,
we denote each of them by its pair of neighboring parts.
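
The localization step can be sketched as follows, under simplifying assumptions. A 3x3 mean filter stands in for the low pass filter, the explicit non-maximum suppression step is omitted because only the global maximum of the smoothed map is kept, and the position of a pair of parts is taken to be the midpoint of the two detections. All function names and the toy probability map are illustrative.

```python
def box_filter(prob_map, radius=1):
    """Smooth a 2D list with a mean filter, clipping the window at the borders."""
    h, w = len(prob_map), len(prob_map[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                prob_map[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def localize_part(prob_map, radius=1):
    """Return the (row, col) of the largest response in the smoothed map."""
    smoothed = box_filter(prob_map, radius)
    h, w = len(smoothed), len(smoothed[0])
    return max(((r, c) for r in range(h) for c in range(w)),
               key=lambda rc: smoothed[rc[0]][rc[1]])


def pair_position(loc_a, loc_b):
    """Midpoint of two detected parts (an assumption of this sketch)."""
    return ((loc_a[0] + loc_b[0]) / 2.0, (loc_a[1] + loc_b[1]) / 2.0)


# Toy posterior map: a 3x3 blob peaking at (3, 3) plus one isolated outlier.
prob = [[0.0] * 6 for _ in range(6)]
for r in range(2, 5):
    for c in range(2, 5):
        prob[r][c] = 0.4
prob[3][3] = 0.8
prob[0][0] = 0.9  # single misclassified pixel with a high posterior

part_location = localize_part(prob)
joint = pair_position(part_location, (1, 5))
```

In this toy map the raw maximum lies on the isolated outlier at (0, 0), whereas the smoothed maximum lies at the center of the blob, which is the reason for filtering before taking the largest response.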
Table 1 shows the classification accuracies of three
settings. A baseline has been created with classical RDF
learning and depth features. It is compared with spatial
learning using depth features only and using depth and edge
features together. We can see that spatial learning obtains
a performance gain, although the layout is used within the
prediction model itself and no pairwise terms have been
used. Figure 2 shows some classification examples, which
demonstrate that spatial learning makes the randomized
forest more discriminative. The segmentation output is
cleaner, especially at the borders.

Table 1: Results on body part classification at the
pixelwise level: D = depth features; E = edge features.

                          Accuracy
Classical RDF with D      60.30%
Spatial D, λ = 0.4        61.05%
Spatial D+E, λ = 0.4      67.66%
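
To make the pixelwise measure concrete, the sketch below computes the accuracy on a toy label map and derives the gains over the baseline from the Table 1 values. Excluding background pixels from the count is an assumption of this example, and the function name and toy labels are illustrative.

```python
def pixelwise_accuracy(predicted, groundtruth, ignore_label=None):
    """Fraction of labeled pixels whose predicted part equals the groundtruth part."""
    correct = 0
    total = 0
    for pred_row, gt_row in zip(predicted, groundtruth):
        for pred, gt in zip(pred_row, gt_row):
            if gt == ignore_label:
                continue  # skip background pixels (assumption of this sketch)
            total += 1
            correct += int(pred == gt)
    return correct / total if total else 0.0


# Toy 2x3 label maps; None marks background.
gt_labels = [["H", "N", None], ["LS", "RS", None]]
pred_labels = [["H", "N", "H"], ["LS", "N", None]]
toy_accuracy = pixelwise_accuracy(pred_labels, gt_labels)  # 3 of 4 labeled pixels

# Gains over the baseline, in percentage points, from the Table 1 values.
table1 = {"Classical RDF with D": 60.30, "Spatial D": 61.05, "Spatial D+E": 67.66}
baseline = table1["Classical RDF with D"]
gains = {name: round(acc - baseline, 2)
         for name, acc in table1.items() if name != "Classical RDF with D"}
```

With the reported numbers, spatial learning on depth features alone gains 0.75 percentage points over the baseline, and spatial learning on depth and edge features gains 7.36 points.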
At the part level, we report our results on pairs of parts
according to the evaluation metric of Ferrari et al.
(2008): a pair of groundtruth parts is matched to a
detected pair if and only if the endpoints of the detected
pair lie within a circle of radius r = 50% of the length of
the groundtruth pair and centered on it. Table 2 shows our
results at the part level for several settings. It
demonstrates that spatial learning improves recognition
performance for most parts.
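
One possible reading of this matching criterion is sketched below. It assumes that each endpoint of the detected pair is compared with the corresponding groundtruth endpoint, and that the circle radius is r times the groundtruth length. The function names and the toy coordinates are illustrative, and the original evaluation code may differ in its details.

```python
import math


def _distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pair_is_matched(detected, groundtruth, r=0.5):
    """True if both detected endpoints lie within r * length of their groundtruth endpoints."""
    det_a, det_b = detected
    gt_a, gt_b = groundtruth
    radius = r * _distance(gt_a, gt_b)
    return _distance(det_a, gt_a) <= radius and _distance(det_b, gt_b) <= radius


def recognition_rate(detections, groundtruths, r=0.5):
    """Fraction of groundtruth pairs that are matched by their detections."""
    matches = [pair_is_matched(d, g, r) for d, g in zip(detections, groundtruths)]
    return sum(matches) / len(matches)


# Groundtruth pair of length 10, hence a radius of 5 for r = 0.5.
gt_pair = ((0.0, 0.0), (0.0, 10.0))
close = ((3.0, 0.0), (0.0, 14.0))  # endpoint errors 3 and 4: matched
far = ((6.0, 0.0), (0.0, 10.0))    # endpoint error 6 exceeds the radius: not matched
rate = recognition_rate([close, far], [gt_pair, gt_pair])
```

Because the radius scales with the length of the groundtruth pair, the tolerance is tighter for short pairs than for long ones.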
The experiments at both the pixelwise and the part level
demonstrate that spatial learning makes the randomized
forest more discriminative by integrating the spatial
layout into its prediction model. The proposed approach is
simple and fast to implement, as the standard pipeline can
still be used: only the learning method has been changed,
while the testing code is unchanged. There is no additional
computational burden during testing; a slight increase in
computational complexity can be observed for learning. No
complex discrete optimization problems need to be solved.
5 CONCLUSIONS
In this paper, we proposed a novel learning algorithm for
randomized decision forests which integrates information on
the spatial layout of target labels. The classification
algorithm has exactly the same computational complexity as
before; a slightly higher computational burden is put on
the learning stage. We applied our algorithm to body part
classification, although any other application requiring
the segmentation of an object into parts may benefit from
the contribution. A second contribution extends the
well-known depth comparison features to edge comparison
features obtained from grayscale images. Results show that
RDF indeed benefits from the integration of the spatial
layout of parts and from the edge features.
VISAPP 2013 - International Conference on Computer Vision Theory and Applications