Table 1: Confusion matrix for nearest neighbor approx.
 90%  10%   0%   0%   0%   0%   0%   0%
  4%  96%   0%   0%   0%   0%   0%   0%
  2%   4%  56%  38%   0%   0%   0%   0%
  0%   6%   0%  94%   0%   0%   0%   0%
  0%   0%   2%   0%  96%   0%   2%   0%
  0%   0%   0%   0%   0%  70%  30%   0%
  0%   0%   0%   0%   0%   2%  98%   0%
  0%   0%   0%   0%   4%  42%   2%  52%
Table 2: Performance results for nearest neighbor approx.
R     90%  96%  56%  94%  96%  70%  98%  52%
P     94%  83%  97%  71%  96%  61%  74% 100%
Acc   98%  97%  94%  95%  99%  91%  96%  94%
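The relationship between the two tables can be sketched in a few lines of Python. This is only an illustration, under the following assumptions that the text does not state explicitly: rows of Table 1 are the true postures and columns the predicted ones, the three rows of Table 2 are per-posture recall, precision and accuracy (in that order), and percentages are rounded half up. The names `CONFUSION`, `pct` and `per_class_metrics` are hypothetical helpers introduced here.

```python
# Table 1 in percent. Assumption: rows = true posture, columns = predicted.
CONFUSION = [
    [90, 10, 0, 0, 0, 0, 0, 0],
    [4, 96, 0, 0, 0, 0, 0, 0],
    [2, 4, 56, 38, 0, 0, 0, 0],
    [0, 6, 0, 94, 0, 0, 0, 0],
    [0, 0, 2, 0, 96, 0, 2, 0],
    [0, 0, 0, 0, 0, 70, 30, 0],
    [0, 0, 0, 0, 0, 2, 98, 0],
    [0, 0, 0, 0, 4, 42, 2, 52],
]


def pct(num, den):
    """Integer percentage of num/den, rounded half up (assumed rounding)."""
    return (200 * num + den) // (2 * den)


def per_class_metrics(matrix):
    """Return (recall, precision, accuracy) lists, one entry per class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    recall, precision, accuracy = [], [], []
    for k in range(n):
        tp = matrix[k][k]                               # correctly recognized
        fn = sum(matrix[k]) - tp                        # class k missed
        fp = sum(matrix[r][k] for r in range(n)) - tp   # others labeled k
        tn = total - tp - fn - fp
        recall.append(pct(tp, tp + fn))
        precision.append(pct(tp, tp + fp))
        accuracy.append(pct(tp + tn, total))
    return recall, precision, accuracy


recall, precision, accuracy = per_class_metrics(CONFUSION)
print("R  ", recall)
print("P  ", precision)
print("Acc", accuracy)
```

Under these assumptions the three printed lists coincide with the three rows of Table 2, which supports reading them as recall, precision and accuracy.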
‘1v0v0l0l0’. This is, indeed, much closer to Posture 4.
Also, the data used for training showed that estimating
whether fingers are folded or stretched is not fully
possible using 2D information alone. This can be
improved by also considering depth information.
Comparison to Related Work. We compare against the
recently introduced 3D SURF descriptor (Knopp et al.,
2010), which has not yet been investigated for hand
gesture recognition. Our aim is to show that, although
depth information is essential for robust hand
segmentation and may benefit posture recognition on its
own, 3D descriptors in their current state do not pay
off for certain problems. We show that our approach
obtains better results than a more expensive 3D
descriptor, which requires both more data and more
computational time to train a model. We use a
bag-of-words approach together with a multi-class SVM
classifier, similar to (Knopp et al., 2010). For
3D SURF we obtain R = 64.5%, P = 71.0% and
Acc = 87.88%, as opposed to R = 81.5%, P = 84.5%
and Acc = 95.5% for our rule-based approach.
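As a quick arithmetic check, the aggregate figures quoted for our approach can be reproduced from Table 2. The sketch below assumes that the three rows of Table 2 are per-posture recall, precision and accuracy, and that the aggregate values are unweighted means over the eight postures; neither is stated explicitly in the text. The names `TABLE_2`, `SURF_3D` and `macro_average` are hypothetical.

```python
# Per-posture values from Table 2, in percent.
# Assumption: row order is recall, precision, accuracy.
TABLE_2 = {
    "R": [90, 96, 56, 94, 96, 70, 98, 52],
    "P": [94, 83, 97, 71, 96, 61, 74, 100],
    "Acc": [98, 97, 94, 95, 99, 91, 96, 94],
}

# Aggregate 3D SURF results as quoted in the text, in percent.
SURF_3D = {"R": 64.5, "P": 71.0, "Acc": 87.88}


def macro_average(values):
    """Unweighted mean over classes (assumed aggregation)."""
    return sum(values) / len(values)


ours = {name: macro_average(vals) for name, vals in TABLE_2.items()}
# Difference to 3D SURF in percentage points, rounded to two decimals.
gain = {name: round(ours[name] - SURF_3D[name], 2) for name in ours}

for name in ("R", "P", "Acc"):
    print(f"{name}: ours = {ours[name]}%, 3D SURF = {SURF_3D[name]}%, "
          f"difference = {gain[name]} points")
```

Under these assumptions the means come out as R = 81.5%, P = 84.5% and Acc = 95.5%, matching the quoted values, with differences of 17.0, 13.5 and 7.62 percentage points with respect to 3D SURF.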
