Figure 6: Results of the layered-exhaustive search: (a) average Euclidean distance (estimation error), (b) estimation time of the searches, and (c) percentage of correct estimations for each experiment.
mance when our database is used for training (even if
each label appears only once in the training set). Our
database led the RDF to create fewer trees/nodes and
to achieve higher accuracy (Exp 1.a). Moreover, the
accuracy increased further when an appropriately
higher pose resolution was used for the primitive
layers (Exp 1.b). These improvements can be
attributed to the following reasons.
First, it enabled the system to search implicitly
through deep trees, layer by layer, which could
increase accuracy with almost no computational
overhead. Second, shifting the forest's training from
the pixel level to pose vectors could eliminate the
disadvantages of pixel-level training with random
forests discussed in Section 2. More specifically, in
our experiments it reduced the input vector size from
hundreds or thousands of values to at most 28 (one
entry per DoF). This is because the pose space carries
more significant variance (i.e., more information)
than the pixel space.
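The implicit layer-by-layer search described above can be illustrated with a minimal Python sketch. It assumes a hypothetical hierarchical pose database in which each node stores a representative pose vector and the finer-resolution poses of the next layer; the names `PoseNode` and `layered_search` are illustrative only and do not reproduce the authors' implementation. At each layer the query is compared exhaustively against that layer's candidates only, and the search then descends into the children of the best match, so the number of comparisons grows with depth and branching factor rather than with the total size of the database.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class PoseNode:
    """Hypothetical node of a hierarchical pose database."""
    label: str
    pose: list[float]  # representative pose vector (e.g. up to 28 DoF)
    children: list[PoseNode] = field(default_factory=list)  # next, finer layer


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two pose vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def layered_search(roots: list[PoseNode], query: list[float]) -> tuple[PoseNode | None, int]:
    """Greedy coarse-to-fine search.

    Exhaustively compares the query with every candidate of the current
    layer, then descends into the children of the best match only.
    Returns the finest matching node and the number of comparisons made.
    """
    candidates = roots
    best = None
    comparisons = 0
    while candidates:
        comparisons += len(candidates)
        best = min(candidates, key=lambda node: distance(node.pose, query))
        candidates = best.children
    return best, comparisons


if __name__ == "__main__":
    # Toy two-layer database: two coarse poses, three refinements each.
    database = [
        PoseNode("open", [0.0, 0.0], [PoseNode("open-a", [0.0, 0.1]),
                                      PoseNode("open-b", [0.1, 0.0]),
                                      PoseNode("open-c", [-0.1, 0.0])]),
        PoseNode("fist", [1.0, 1.0], [PoseNode("fist-a", [1.0, 0.9]),
                                      PoseNode("fist-b", [0.9, 1.0]),
                                      PoseNode("fist-c", [1.1, 1.1])]),
    ]
    best, n = layered_search(database, [0.95, 1.02])
    print(best.label, n)  # fist-b 5
```

In the toy example the query is resolved in 5 comparisons, whereas a flat scan would visit all 8 nodes. Because the descent is greedy, a wrong choice at a coarse layer cannot be corrected at a finer one.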
Moreover, our hierarchical database made it possible
to employ the otherwise costly exhaustive search with
acceptable performance. The layered-exhaustive search
has difficulties mainly in correctly estimating the
'semi-global' (wrist) rotation. However, in the results
of this experiment (Section 4.2), if minor errors (10%)
in accuracy are tolerated, the recognition rates in
Exp 2.c and Exp 2.d show acceptable accuracy (98%
and 79%, respectively).
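The error measure of Figure 6 and a tolerance-based recognition rate can be sketched as follows. This Python sketch assumes that each estimate and its ground truth are equal-length pose vectors, and it interprets "tolerating a 10% error" as counting an estimate as correct when its Euclidean error is at most 10% of a normalising scale. The function name `evaluate` and the `scale` parameter are hypothetical, since the exact criterion used in the experiments is not specified here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length pose vectors."""
    if len(a) != len(b):
        raise ValueError("pose vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def evaluate(estimates, ground_truth, scale, tolerance=0.10):
    """Return (mean Euclidean error, recognition rate in percent).

    An estimate counts as correct when its error is at most
    tolerance * scale; `scale` is a hypothetical normalising constant,
    e.g. the diameter of the pose space.
    """
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("need equally many, non-zero estimates and targets")
    errors = [euclidean(e, g) for e, g in zip(estimates, ground_truth)]
    mean_error = sum(errors) / len(errors)
    correct = sum(1 for err in errors if err <= tolerance * scale)
    return mean_error, 100.0 * correct / len(errors)


if __name__ == "__main__":
    est = [[0.0, 0.0], [3.0, 4.0], [0.5, 0.0], [0.0, 2.0]]
    gt = [[0.0, 0.0]] * 4
    print(evaluate(est, gt, scale=10.0))  # (1.875, 50.0)
```

With `scale=10.0` the errors 0, 5, 0.5 and 2 give a mean error of 1.875 and a recognition rate of 50%; raising `tolerance` to 0.25 raises the rate to 75%. Since `scale` directly changes the reported rate, it needs to be stated alongside any tolerance-based result.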
VISAPP 2018 - International Conference on Computer Vision Theory and Applications