or feature histograms. Feature matching algorithms
such as SIFT (Zhou et al., 2009) and SURF (Ta et al., 2009), and shape matching algorithms such as con-
tour matching (Yokoyama and Poggio, 2005) are too
computationally expensive to be considered for real-
time tracking by a moving robot with modest com-
pute resources. Optical flow methods (Denman et al.,
2007) may be within reach in terms of speed, but they do
not maintain an appearance model. This means they
are unable (by themselves) to recover from occlusions
and objects leaving the field of view.
Histogram-based trackers, on the other hand, are
not only fast, but also maintain an appearance model
that is potentially useful for recovering tracking af-
ter an occlusion or reappearance in the field of view.
When the target object is occluded or leaves the field
of view (or when the tracker gets lost for some rea-
son), we simply need to suspend tracking, continually
search the image for the reappearance of the target ob-
ject, then, once the object has reappeared in the scene,
reinitialize the tracker.
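As a concrete illustration of the appearance check that makes such recovery possible, the snippet below compares a candidate region's hue histogram against a stored model using the Bhattacharyya distance; it is a minimal sketch assuming OpenCV, and the bin count and threshold are illustrative placeholders rather than values taken from any particular tracker.

    import cv2

    def hue_histogram(bgr_patch, bins=16):
        # Normalized hue histogram of an image patch (the appearance model).
        hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
        return cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)

    def target_lost(candidate_patch, model_hist, threshold=0.6):
        # Suspend tracking when the candidate stops matching the stored model.
        # threshold=0.6 is an illustrative placeholder, not a recommended value.
        dist = cv2.compareHist(hue_histogram(candidate_patch), model_hist,
                               cv2.HISTCMP_BHATTACHARYYA)  # 0 = identical, 1 = disjoint
        return dist > threshold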
In this paper, we thus consider the problem of re-
detecting the target object once a feature histogram-
based tracking method has been suspended. We as-
sume that the goal is to search for the target object in
every frame without any bias as to where the object
might appear.
The common approach to object search in com-
puter vision is the sliding window. The typical
algorithm slides a detection window over the im-
age at multiple scales, and at each step, the se-
lected image window’s feature histogram is com-
pared with the stored color histogram (the appear-
ance model). The naive sliding window approach is
computationally inefficient, however, and many re-
searchers have developed methods to improve the ef-
ficiency of sliding window calculations in different
contexts. Porikli (Porikli, 2005) proposes an “integral
histogram” method using integral images. The inte-
gral image is a well-known technique that supports
calculating the sum of the values in a rectangular re-
gion of a feature plane in constant time. Perreault and Hebert (Perreault and Hebert, 2007) compute histograms for median filtering efficiently by maintaining separate columnwise histograms; as the sliding window moves one column to the right, they first update the entering column's histogram, then add it to and subtract the leaving column's histogram from the sliding window's histogram. Sizintsev et al. (Sizintsev et al., 2008)
take a similar approach to obtain histograms over slid-
ing windows by efficiently updating the histogram us-
ing previously calculated histograms for overlapping
windows.
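To make the constant-time claim concrete, the sketch below builds a per-bin integral histogram with NumPy and reads off the histogram of any rectangular window with four lookups per bin; it is a minimal illustration of the idea behind (Porikli, 2005), not the original implementation.

    import numpy as np

    def integral_histogram(bin_img, n_bins):
        # ih[b, y, x] = number of pixels with quantized feature value b
        # in the rectangle spanning rows [0, y) and columns [0, x).
        h, w = bin_img.shape
        ih = np.zeros((n_bins, h + 1, w + 1), dtype=np.int64)
        for b in range(n_bins):
            ih[b, 1:, 1:] = np.cumsum(np.cumsum(bin_img == b, axis=0), axis=1)
        return ih

    def window_histogram(ih, y0, x0, y1, x1):
        # Histogram of the window rows [y0, y1), columns [x0, x1):
        # four lookups per bin, independent of the window size.
        return ih[:, y1, x1] - ih[:, y0, x1] - ih[:, y1, x0] + ih[:, y0, x0]

Given a frame whose hue channel has been quantized into n_bins values, the integral histogram is built once per frame, after which every candidate window's histogram is available in constant time per bin.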
However, although this work demonstrates that it
is possible to compute sliding window histograms in
constant time per window location, it may still not be
fast enough if multiple window sizes and aspect ratios
must be considered; furthermore, finding a single best rectangular window still does not yield a precise estimate of the object's shape and orientation. Chen et al. (Chen
et al., 2008) address the speed issue by scattering ran-
domly generated elliptical regions over the image in
a first rough detection phase and address the preci-
sion issue by performing fine searches from the more
likely candidate regions. In this paper, we propose a
backprojection-based method for the rough detection
phase that does not require breaking the image into
regions.
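For orientation, the sketch below shows one way such a backprojection-based rough pass can be realized with OpenCV: the model histogram (assumed here to be a hue histogram of the target) is backprojected over the frame, the per-pixel likelihood is thresholded, and connected components become candidate regions. The threshold and minimum area are illustrative placeholders, not the settings of the method developed in this paper.

    import cv2

    def candidate_regions(frame_bgr, model_hist, prob_thresh=50, min_area=200):
        # Backproject the hue model, threshold the likelihood image, and
        # return bounding boxes of sufficiently large connected components.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        prob = cv2.calcBackProject([hsv], [0], model_hist, [0, 180], 1)
        _, mask = cv2.threshold(prob, prob_thresh, 255, cv2.THRESH_BINARY)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        boxes = []
        for i in range(1, n):  # label 0 is the background component
            x, y, w, h, area = stats[i]
            if area >= min_area:
                boxes.append((int(x), int(y), int(w), int(h)))
        return boxes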
CAMSHIFT (Continuously Adaptive Mean Shift)
(Bradski, 1998; Allen et al., 2004) is a fast and ro-
bust feature histogram tracking algorithm potentially
useful for mobile robots in outdoor environments.
The method begins with manual initialization from
a target image patch. It then tracks the region using
a combination of color histograms, the basic mean-
shift algorithm (Comaniciu et al., 2000; Comaniciu
et al., 2003), and an adaptive region-sizing step. It is
scale and orientation invariant. The main drawback
of CAMSHIFT is that if the target leaves the field of
view or is occluded, the algorithm either reports an
error or starts tracking a completely different object.
This limitation of CAMSHIFT lies in the fact that
on each frame, it performs a global backprojection
of the appearance model followed by a local search
for the best target region beginning from the previous
frame’s estimated region. Since the method performs
a search for a local peak in the global backprojection,
it is easily distracted by background objects with sim-
ilar color distributions.
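For reference, the standard OpenCV usage below shows this backproject-then-local-search structure: the stored hue histogram is backprojected over each whole frame, and cv2.CamShift then climbs to a local mode of the backprojection starting from the previous frame's window. The hue-only histogram setup is the usual choice and is illustrative.

    import cv2

    def camshift_track(frames, init_window, model_hist):
        # Track by backprojecting the model over each frame and letting
        # CamShift search locally from the previous window (x, y, w, h).
        term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
        window = init_window
        for frame in frames:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            prob = cv2.calcBackProject([hsv], [0], model_hist, [0, 180], 1)
            rot_rect, window = cv2.CamShift(prob, window, term_crit)
            yield rot_rect, window  # rotated rect carries scale and orientation

If the target leaves the view, the backprojection around the previous window may still contain a background mode with similar colors, which is exactly the failure case described above.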
In our previous paper (Basit et al., 2012), we proposed a motion model based on an extended Kalman filter (EKF) to improve the pursuit robot's trajectory relative to the target's path. We fused the pursuit robot's (differential drive) kinematics and the target's dynamics with a model of the color region tracking sensor in an EKF.
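For readers unfamiliar with the filter, a generic EKF predict/update cycle is sketched below; the state, motion model f, and measurement model h are abstract placeholders and do not reproduce the specific formulation of (Basit et al., 2012).

    import numpy as np

    def ekf_step(x, P, u, z, f, F_jac, h, H_jac, Q, R):
        # Predict with the motion model, then correct with the measurement.
        x_pred = f(x, u)                     # predicted state
        F = F_jac(x, u)                      # Jacobian of f at (x, u)
        P_pred = F @ P @ F.T + Q             # predicted covariance
        H = H_jac(x_pred)                    # Jacobian of h at the prediction
        y = z - h(x_pred)                    # innovation
        S = H @ P_pred @ H.T + R             # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
        x_new = x_pred + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new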
In this paper, we extend our work to pro-
pose an efficient method to 1) intelligently suspend
CAMSHIFT tracking when the target leaves the scene
or is occluded and 2) reinitialize the tracker when the
target object returns to view. The decision to sus-
pend tracking is based on an adaptive threshold ap-
plied to the dissimilarity between the CAMSHIFT
region’s color histogram and the stored appearance
model, as well as heuristic limitations on changes
in the tracking window’s size. The reinitialization
method is based on backprojection of the appearance
model, thresholding of the per-pixel likelihood, and
connected components analysis, resulting in a collec-