Authors:
Nathanael L. Baisa 1; Deepayan Bhowmik 2 and Andrew Wallace 1
Affiliations:
1 Heriot-Watt University, United Kingdom; 2 Sheffield Hallam University, United Kingdom
Keyword(s):
Visual Tracking, Correlation Filter, CNN Features, Hybrid Features, Online Learning, GM-PHD Filter.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Motion, Tracking and Stereo Vision; Tracking and Visual Navigation; Video Surveillance and Event Detection
Abstract:
Tracking a target of interest in crowded environments is a challenging problem that has not yet been fully addressed in the literature. In this paper, we propose a new long-term algorithm, learning a discriminative correlation filter and using an online classifier, to track a target of interest in dense video sequences. First, we learn a translational correlation filter using a multi-layer hybrid of convolutional neural network (CNN) features and traditional hand-crafted features. We combine the advantages of both the lower convolutional layer, which retains better spatial detail for precise localization, and the higher convolutional layer, which encodes semantic information for handling appearance variations. This is integrated with traditional features formed from a histogram of oriented gradients (HOG) and color-naming. Second, we include a re-detection module for overcoming tracking failures due to long-term occlusions by training an incremental (online) SVM on the most confident frames using hand-engineered features. This re-detection module is activated only when the correlation response of the object falls below a pre-defined threshold, and it generates high-score detection proposals. Finally, we incorporate a Gaussian mixture probability hypothesis density (GM-PHD) filter to temporally filter the high-score detection proposals generated by the learned online SVM: the proposal with the maximum weight is taken as the target position estimate, and the remaining proposals are removed as clutter. Extensive experiments on dense data sets show that our method significantly outperforms state-of-the-art methods.
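The control flow described in the abstract can be summarized as: track with the correlation filter on hybrid features, and fall back to SVM re-detection plus GM-PHD filtering of the resulting proposals when the correlation response drops below a threshold. The sketch below illustrates that flow only; the component interfaces (tracker.correlate, redetector.propose, gm_phd.max_weight_estimate) and the threshold value are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def track_frame(frame, tracker, redetector, gm_phd, response_threshold=0.2):
    """One step of the long-term tracking loop (illustrative sketch only)."""
    # 1. Translation estimation with the learned correlation filter over
    #    hybrid features (CNN layers + HOG + color-naming).
    response_map = tracker.correlate(frame)          # assumed method
    peak_value = response_map.max()
    position = np.unravel_index(response_map.argmax(), response_map.shape)

    if peak_value < response_threshold:
        # 2. Low confidence (e.g. long-term occlusion): activate re-detection.
        #    The incremental SVM returns high-score detection proposals.
        proposals = redetector.propose(frame)        # assumed: list of (box, score)

        # 3. Temporally filter the proposals with the GM-PHD filter and keep
        #    the component with the maximum weight as the target estimate;
        #    the remaining proposals are treated as clutter.
        gm_phd.predict()
        gm_phd.update(proposals)
        position = gm_phd.max_weight_estimate()      # assumed method
    else:
        # High-confidence frame: update the correlation filter and the
        # online SVM with the new appearance sample.
        tracker.update(frame, position)
        redetector.update(frame, position)

    return position
```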