Authors:
Jordi Bautista-Ballester 1; Jaume Vergés-Llahí 2 and Domenec Puig 3
Affiliations:
1 ATEKNEA Solutions and Universitat Rovira i Virgili, Spain; 2 ATEKNEA Solutions, Spain; 3 Universitat Rovira i Virgili, Spain
Keyword(s):
Action Recognition, Bag of Visual Words, Multikernel Support Vector Machines, Video Representation.
Related Ontology Subjects/Areas/Topics:
Active and Robot Vision; Computer Vision, Visualization and Computer Graphics; Motion, Tracking and Stereo Vision; Video Surveillance and Event Detection
Abstract:
Classifying web videos using a Bag of Words (BoW) representation has received increased attention due to its computational simplicity and good performance. The growing number of categories, including actions with high confusion, and the addition of significant contextual information have led most authors to focus their efforts on combining descriptors. In this field, we propose to use the multikernel Support Vector Machine (SVM) with a contrasted selection of kernels. It is widely accepted that using descriptors that provide different kinds of information tends to increase performance. To this end, our approach introduces contextual information, i.e. objects directly related to the performed action, by pre-selecting a set of points belonging to objects to calculate the codebook. To determine whether a point is part of an object, the objects are first tracked by matching consecutive frames, and the object bounding box is computed and labeled. We code the action videos using the BoW representation with the object codewords and introduce them to the SVM as an additional kernel. Experiments have been carried out on two action databases, KTH and HMDB; the results show a significant improvement with respect to other similar approaches.
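As an illustration only, not the authors' implementation, the sketch below shows one way the idea in the abstract can be set up: per-video BoW histograms from two channels, one built from generic motion descriptors and one from codewords restricted to tracked-object regions, are combined as a weighted sum of chi-square kernels and passed to an SVM with a precomputed kernel. The toy data, the chi-square kernel choice, and the mixing weight are all assumptions for demonstration.

```python
# Minimal sketch (not the paper's code): multikernel SVM over two BoW channels.
# Assumes the per-video histograms are already computed; one channel from
# motion descriptors, one from codewords of points inside tracked object boxes.
import numpy as np
from sklearn.svm import SVC

def chi2_kernel(A, B, gamma=1.0):
    """Exponential chi-square kernel, a common choice for BoW histograms."""
    num = (A[:, None, :] - B[None, :, :]) ** 2
    den = A[:, None, :] + B[None, :, :] + 1e-12
    return np.exp(-gamma * 0.5 * (num / den).sum(axis=2))

# Toy data: 40 videos, a 100-word motion codebook and a 50-word object codebook.
rng = np.random.default_rng(0)
H_motion = rng.dirichlet(np.ones(100), size=40)  # motion-descriptor BoW histograms
H_object = rng.dirichlet(np.ones(50), size=40)   # object-codeword BoW histograms
y = rng.integers(0, 2, size=40)                  # action labels

# Multikernel combination: weighted sum of the per-channel kernels
# (the weight here is illustrative, not a value reported in the paper).
w = 0.5
K = w * chi2_kernel(H_motion, H_motion) + (1 - w) * chi2_kernel(H_object, H_object)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```

In practice the mixing weight (or a weight per kernel) would be selected by cross-validation, and the test kernel is computed between test and training histograms before calling the classifier.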