Figure 1: Detection system example. Once the background subtraction technique is applied, we select a pool of bounding boxes containing the blobs in the scene (a). A Viola-Jones-type classifier, in combination with a torso estimation, is used to select the possible detections (a), which are finally evaluated using the HOG-feature-based SVM (b).
then, both the Viola-Jones-type classifier and the HOG-feature-based SVM are used to detect every moving person. We decided not to use optical flow to detect every moving pixel in the scene because of its processing cost. Since information about the magnitude or the orientation of the movement is not used in this methodology, a background subtraction technique with an online updating system is used instead. We use the Mixture of Gaussians (MoG) algorithm (Stauffer and Grimson, 1999) with a short history window, so that stopped targets are quickly absorbed into the background. This also makes the system more robust to sudden illumination changes. After that, opening and closing morphological operators are used to reduce the image noise.
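As a minimal sketch of this step, the fragment below uses OpenCV's MOG2 background subtractor (a variant of the Stauffer-Grimson MoG) followed by morphological opening and closing; the short history length and the kernel size are illustrative values, not the parameters used in our experiments.

```python
import cv2

# Illustrative parameters: a short history so stopped targets are
# quickly absorbed into the background model, as described above.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=50, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def foreground_mask(frame):
    """Return a denoised binary foreground mask for one frame."""
    mask = bg_subtractor.apply(frame)                        # per-pixel MoG classification
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    return mask
```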
Before applying the Viola-Jones-type classifier, we reduce the regions to be processed. Using the background subtraction result, we perform a blob detection and, for each blob, create a bounding box containing all of its foreground pixels.
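A possible implementation of this blob-to-bounding-box step, assuming OpenCV 4 and an illustrative minimum blob area to discard noise, is sketched below.

```python
import cv2

def blob_bounding_boxes(mask, min_area=200):
    """Bounding boxes (x, y, w, h) of the foreground blobs.
    min_area is an illustrative threshold, not a value from the paper."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```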
Once we have the bounding boxes, we apply the Viola-Jones-type classifier within these regions. Details on how to train the classifier can be found in (Viola and Jones, 2001). This yields a pool of different patches (green rectangles in Fig. 1-(a)). Before applying the HOG-feature-based SVM, we introduce a restriction based on knowledge of human movement: a head movement detected by the background subtraction technique is always accompanied by a torso movement. Therefore, using a small rectangle at the bottom of the patch, we check the foreground pixels to see whether movement occurs below the head. If no movement is found, the patch is discarded. The height of this rectangle is proportional to the size of the patch being checked. The blue rectangles in Fig. 1-(a) show examples of the rectangles used to check the torso.
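The following sketch illustrates both steps. It assumes a pre-trained OpenCV frontal-face cascade as a stand-in for our Viola-Jones-type classifier, and the torso-rectangle height ratio and foreground ratio are illustrative, not the values used in our system.

```python
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_heads(gray, boxes):
    """Run the cascade only inside each blob bounding box (x, y, w, h)."""
    patches = []
    for (x, y, w, h) in boxes:
        roi = gray[y:y + h, x:x + w]
        for (dx, dy, dw, dh) in cascade.detectMultiScale(roi, 1.1, 3):
            patches.append((x + dx, y + dy, dw, dh))
    return patches

def torso_moves(mask, patch, height_ratio=0.5, min_fg_ratio=0.3):
    """Accept the patch only if enough foreground pixels lie just below the head."""
    x, y, w, h = patch
    th = int(h * height_ratio)                 # torso rectangle height scales with patch size
    torso = mask[y + h:y + h + th, x:x + w]
    if torso.size == 0:
        return False
    return (torso > 0).mean() >= min_fg_ratio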
The HOG-feature-based SVM is applied to every remaining patch to confirm that it corresponds to a person. We use the classic HOG technique; details of the HOG feature extraction can be found in (Dalal and Triggs, 2005). The computational cost of obtaining this feature is high, but performance can be improved using the integral histogram technique (Porikli, 2005). Fig. 1-(b) shows both accepted and rejected patches.
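A simplified version of this verification step is sketched below, assuming a linear SVM (weights svm_w and bias svm_b trained offline) over the default 64x128 Dalal-Triggs descriptor; OpenCV's HOGDescriptor is used here for brevity instead of an integral-histogram implementation.

```python
import cv2
import numpy as np

hog = cv2.HOGDescriptor()                      # default 64x128 Dalal-Triggs layout

def is_person(frame, patch, svm_w, svm_b):
    """Confirm a patch with a linear SVM over its HOG descriptor.
    svm_w and svm_b are assumed to come from offline training."""
    x, y, w, h = patch
    window = cv2.resize(frame[y:y + h, x:x + w], (64, 128))
    feat = hog.compute(window).ravel()
    return float(np.dot(svm_w, feat) + svm_b) > 0
```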
3 TRACKING SYSTEM
Although the tracking system is mainly based on a particle-filter approach, such a method degrades over long sequences, since estimation errors are propagated across frames without any possibility of correction. Our tracking system is therefore defined as follows: first, we use the detection system explained above to also track the targets. If this fails, we run a particle-filter step to locate the best target state in the new frame. Finally, we introduce the new elements found by the detection system.
After running the detection system, we have a pool of accepted patches. We also have, at time t, a set containing all the targets tracked by the system in previous frames. For every new patch, we build a set of candidate targets that could match the new detection, using the Euclidean distance. If there is only one candidate, the new patch is assigned to that target. If there is more than one candidate, a collision occurs. In that case, we compare their respective HOG features using the Bhattacharyya coefficient, which is a good measure for tracking non-rigid objects (Comaniciu et al., 2000). The target that obtains the best coefficient is assigned to the new patch position.
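The association logic can be sketched as follows; the distance gate and the dictionary layout of the target set are illustrative assumptions.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized, non-negative feature vectors."""
    p = p / (p.sum() + 1e-12)
    q = q / (q.sum() + 1e-12)
    return float(np.sum(np.sqrt(p * q)))

def assign_patch(patch_pos, patch_feat, targets, max_dist=50.0):
    """targets: list of dicts with 'pos' and 'feat'; max_dist is an illustrative gate."""
    candidates = [t for t in targets
                  if np.linalg.norm(np.asarray(patch_pos) - np.asarray(t["pos"])) < max_dist]
    if not candidates:
        return None
    # Collision: keep the target whose HOG feature best matches the new patch.
    return max(candidates, key=lambda t: bhattacharyya(patch_feat, t["feat"]))
```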
For the stored targets we compute a predicted position, using a bank of linear filters (Adalines) to estimate the target speed, chosen for their simplicity and their performance on noisy images (Cancela et al., 2011).
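A minimal sketch of one such Adaline, predicting the next displacement of a coordinate from its last few observed displacements via the LMS (delta) rule, is given below; the window length and learning rate are illustrative.

```python
import numpy as np

class AdalineSpeed:
    """Linear predictor of the next displacement from the last n displacements."""
    def __init__(self, n=4, lr=0.01):
        self.w = np.zeros(n)
        self.lr = lr

    def predict(self, past):
        """past: last n displacements, most recent last."""
        return float(np.dot(self.w, past))

    def update(self, past, observed):
        """Classic LMS update of the filter weights."""
        error = observed - self.predict(past)
        self.w += self.lr * error * np.asarray(past, dtype=float)
```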
We propose to use a particle-based system, using the extracted local HOG features to model each target's appearance. Using the linear filters to predict the velocity of each target, we can assume the new target position is described as $z_j^t = \tilde{z}_j^t + \omega$, where $\tilde{z}_j^t$ is the predicted position of the target $z_j$ at time $t$ and $\omega \sim \mathcal{N}(0, \Sigma)$ is Gaussian noise. In our particle-filter system, we add Gaussian noise to the predicted position to generate a set of different particles. The local HOG feature is extracted for each particle and, using the Bhattacharyya coefficient, we choose the position that obtains the best value.
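A simplified sketch of one particle-filter step under these assumptions follows; extract_hog(pos) stands for a hypothetical helper returning the local HOG feature of the patch centred at pos, and the number of particles and noise scale are illustrative, not the values used in our system.

```python
import numpy as np

def bhat(p, q):
    """Bhattacharyya coefficient between two normalized, non-negative feature vectors."""
    return float(np.sum(np.sqrt((p / p.sum()) * (q / q.sum()))))

def particle_filter_step(predicted_pos, model_feat, extract_hog,
                         n_particles=100, sigma=5.0):
    """Sample particles around the predicted position and keep the best HOG match."""
    noise = np.random.normal(0.0, sigma, size=(n_particles, 2))      # omega ~ N(0, sigma^2 I)
    particles = np.asarray(predicted_pos, dtype=float) + noise
    scores = [bhat(extract_hog(p), model_feat) for p in particles]
    return particles[int(np.argmax(scores))]
```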
Once the position is located, we have to update the appearance model. Since a badly chosen particle may corrupt it, the model should maintain information about previous appearances. So, having the target model $\hat{O}^t$ and