nition rate. The longer training time of BRCF can be effectively reduced by adopting the proposed weighting method, in which the weights of the samples allocated to each node are adaptively controlled. As each tree is grown, this weighting implicitly separates examples of different classes into the left and right child nodes. As an application, we use hand-gesture recognition to control a digital TV. Low-level feature extraction and decoding can be done quickly by densely sampling HOG and HOF descriptors and then clustering these descriptors with BRCF. We can successfully decode and classify various hand gestures and control the digital TV without a significant time delay. However, we acknowledge that the user evaluation of the proposed hand-gesture natural user interface remains open to debate.
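The descriptor-clustering step can be illustrated with a minimal sketch of how a randomized clustering forest turns a set of local descriptors into a bag-of-words histogram, in the spirit of Moosmann et al. (2008): every descriptor is dropped down every tree, and each leaf it reaches acts as one visual word. This is not the BRCF training procedure (the proposed sample weighting is not modelled). The `Node` class, the `leaf_index` and `encode` helpers, and the toy trees are hypothetical, and the 2-D descriptors merely stand in for real HOG/HOF vectors.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node: internal nodes test x[feature] < threshold,
    leaves carry a leaf_id (their index within the tree)."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None


def leaf_index(node: Node, x: Sequence[float]) -> int:
    """Drop descriptor x down one tree and return the id of the leaf it reaches."""
    while node.leaf_id is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.leaf_id


def encode(forest: List[Node], n_leaves: List[int],
           descriptors: List[Sequence[float]]) -> List[float]:
    """Concatenate one leaf-count histogram per tree, then L1-normalise."""
    offsets, total = [], 0
    for n in n_leaves:          # bin offset of each tree in the joint histogram
        offsets.append(total)
        total += n
    hist = [0.0] * total
    for x in descriptors:
        for tree, off in zip(forest, offsets):
            hist[off + leaf_index(tree, x)] += 1.0
    s = sum(hist)
    return [v / s for v in hist] if s > 0 else hist


# Toy forest of two trees over 2-D descriptors (stand-ins for HOG/HOF vectors).
tree1 = Node(feature=0, threshold=0.5, left=Node(leaf_id=0), right=Node(leaf_id=1))
tree2 = Node(feature=1, threshold=0.5,
             left=Node(feature=0, threshold=0.2,
                       left=Node(leaf_id=0), right=Node(leaf_id=1)),
             right=Node(leaf_id=2))
descriptors = [(0.1, 0.1), (0.9, 0.2), (0.7, 0.8)]
h = encode([tree1, tree2], [2, 3], descriptors)
print(h)  # 5 bins: 2 leaves from tree1 followed by 3 leaves from tree2
```

With realistic forest sizes the histogram has far more bins than a short gesture clip has descriptors, so most bins stay zero; this is one way the sparsity noted in the future-work discussion can arise.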
Several issues remain for future work on the proposed hand-gesture recognition method. First, since our method does not detect the hand region, the recognition rate is strongly affected by changes in the position and scale of the hand. Second, the generated feature vectors are very sparse; hence, a non-linear classifier is required to discriminate them. For the first issue, many methods exist for segmenting the hand region from RGB or RGB-D images, and these could be integrated into our recognition pipeline (Kurakin et al., 2012). For the second issue, the feature vectors could be augmented by using the Fisher vector method (Perronnin and Dance, 2007).
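For the second issue, one possible route is to encode the local HOG/HOF descriptors with a Fisher vector, which is dense and is often used with linear classifiers. The sketch below computes only the gradient with respect to the means of a diagonal-covariance GMM that is assumed to be already fitted, i.e. for component k it accumulates γ_t(k)(x_t − μ_k)/σ_k over all T descriptors and scales by 1/(T·√w_k). That normalisation follows later Fisher-vector work and may differ from the 2007 formulation; the function names and the toy GMM parameters are hypothetical, not the authors' design.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mu: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def fisher_vector_means(X: List[Sequence[float]], weights: List[float],
                        means: List[Sequence[float]],
                        variances: List[Sequence[float]]) -> List[float]:
    """Mean-gradient part of the Fisher vector (K*D values) for descriptors X."""
    K, D, T = len(weights), len(means[0]), len(X)
    acc = [[0.0] * D for _ in range(K)]
    for x in X:
        # Soft assignments gamma_t(k), computed with log-sum-exp for stability.
        logp = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                for k in range(K)]
        mx = max(logp)
        lse = mx + math.log(sum(math.exp(lp - mx) for lp in logp))
        gamma = [math.exp(lp - lse) for lp in logp]
        for k in range(K):
            for d in range(D):
                # Whitened deviation of x from component k, weighted by gamma.
                acc[k][d] += gamma[k] * (x[d] - means[k][d]) / math.sqrt(variances[k][d])
    # Normalise by T * sqrt(w_k) and flatten component by component.
    return [acc[k][d] / (T * math.sqrt(weights[k]))
            for k in range(K) for d in range(D)]


# Hypothetical 2-component GMM over 2-D descriptors.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
X = [(0.2, -0.1), (2.8, 3.1), (0.5, 0.4)]
fv = fisher_vector_means(X, weights, means, variances)
print(len(fv))  # → 4
```

The vector length is K·D regardless of how many descriptors a clip yields. A full Fisher vector would also include the gradients with respect to the variances, which are omitted here for brevity.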
ACKNOWLEDGEMENTS
This work was supported through the research project "Network-type Brain Machine Interface" sponsored by the Ministry of Internal Affairs and Communications, Japan.
REFERENCES
Chaudhry, R., Ravichandran, A., Hager, G., and Vidal, R. (2009). Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1932–1939.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893.
Dollár, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. 2nd Joint IEEE International Workshop on, pages 65–72.
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2014). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision.
Jurie, F. and Triggs, B. (2005). Creating efficient codebooks for visual recognition. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 604–610.
Kurakin, A., Zhang, Z., and Liu, Z. (2012). A real time system for dynamic hand gesture recognition with a depth sensor. In Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European, pages 1975–1979.
Laptev, I. and Lindeberg, T. (2003). Space-time interest points. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, volume 1, pages 432–439.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169–2178.
Moosmann, F., Nowak, E., and Jurie, F. (2008). Randomized clustering forests for image classification. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(9):1632–1646.
Perronnin, F. and Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on, pages 1–8.
Shotton, J., Johnson, M., and Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8.
Smeaton, A. F., Over, P., and Kraaij, W. (2006). Evaluation campaigns and TRECVid. In Multimedia Information Retrieval, pages 321–330.
Uijlings, J., Smeulders, A. W. M., and Scha, R. (2010). Real-time visual concept classification. Multimedia, IEEE Transactions on, 12(7):665–681.
Zhang, J., Marszałek, M., Lazebnik, S., and Schmid, C. (2006). Local features and kernels for classification of texture and object categories: A comprehensive study. In Computer Vision and Pattern Recognition Workshop, 2006. CVPRW ’06. Conference on, page 13.
On-line Hand Gesture Recognition to Control Digital TV using a Boosted and Randomized Clustering Forest