INCREMENTAL DETECTION AND TRACKING OF MOVING
OBJECTS BY OPTICAL FLOW AND A CONTRARIO METHOD
Dora Luz Almanza-Ojeda, Michel Devy and Ariane Herbulot
CNRS, LAAS, 7 avenue du Colonel Roche, F-31077 Toulouse, France
Université de Toulouse, UPS, INSA, INP, ISAE, LAAS-CNRS, F-31077 Toulouse, France
Keywords:
Moving obstacles, Detection, Tracking, Clustering, Monocular vision.
Abstract:
This paper addresses the detection and tracking of moving objects based on the a contrario theory and on a
Kalman filtering process. Only visual information is used, acquired from a B&W camera embedded on a mobile
robot. The KLT tracker and the a contrario theory are used to initially detect and cluster moving points. Then,
each detected group of moving points is tracked as a moving object using a Kalman filter. The
detection-clustering-tracking process is executed iteratively to deal with some challenges of real robot
navigation. Furthermore, the area in which a moving obstacle is detected is enlarged over time until it reaches
the real limits of the object: clusters are fused with already detected objects based on the similarity of their
respective velocities and positions. Experimental results on real dynamic images, acquired from a camera
mounted on a moving robot, are presented and discussed.
1 INTRODUCTION
One key function required for autonomous robot navigation
is the detection of objects close to the robot trajectory
and the estimation of their states. This function has been
studied by the robotics and the Intelligent Transportation
Systems communities, using different sensory data. For
driver assistance, many contributions concern laser-based
obstacle detection and tracking (Vu and Aycard, 2009).
Some works have made the approach more robust through
fusion with monocular vision (Gate et al., 2009). In spite
of numerous contributions, this function remains a
challenge when it is based only on vision. This work
therefore concerns the detection of mobile objects from
images acquired from a robot moving in an outdoor
environment. We propose to reach this objective using only
a monocular camera system: as shown in numerous works
(Davison, 2003), 2-D information is sufficient to estimate
the camera motion using a SLAM algorithm based on static
points. The proposed strategy consists in detecting these
static points, and moreover in detecting and clustering the
moving ones in order to track mobile objects: it is the
first step towards the full integration of a Visual
SLAMMOT approach.
The KLT tracker (Shi and Tomasi, 1994), based on sparse
optical flow, is widely used in robotics applications
because of its simplicity and low computational cost. Our
method also relies on the KLT tracker as a valid and proven
procedure that can be applied in a real-time context during
navigation. Next, in order to identify which of the tracked
interest points belong to a moving object, we use a
clustering based on the a contrario theory (Desolneux
et al., 2008). Veit et al. (2007) have validated this
clustering algorithm, which does not need any parameter
tuning, for finding clusters of dynamic features in an
image sequence. Poon et al. (2009) have also adapted this
approach for the detection of moving objects in short
sequences; additionally, the authors obtain 3D components
of feature points to improve the correspondence between the
points and the moving objects. They present experimental
results on real images acquired from a fixed camera;
essential issues of autonomous navigation are not
considered. Finally, the tracking of the detected moving
objects is performed by Kalman filtering. This procedure is
tested on a long sequence of images acquired in an open
outdoor environment during a robot navigation task. The
region of each moving object is incrementally enlarged
thanks to statistical evaluations.
2 OVERALL STRATEGY
Figure 1 presents the algorithm performed for every
image acquired at time t with a given period t. Initially,
object tracking (dotted rectangle 3) is not activated,
because no moving object has been detected. The first step
(dotted rectangle 1) detects a given number Npts of
interest points using the KLT detector. The KLT tracker is
then run on the next Nim images to build a trail for every
tracked point. We call Nimt the “time of trail” because it
represents the number of images used to accumulate
positions and velocities of tracked KLT features.
Specifically, four images are considered enough to estimate
the apparent motion of a point. Before looking for new
trails in this process 1, new feature points are selected.
The KLT process is executed continuously while the robot
navigates in order to provide new visual information about
the environment at each time of trail.
Luz Almanza-Ojeda D., Devy M. and Herbulot A. (2010).
INCREMENTAL DETECTION AND TRACKING OF MOVING OBJECTS BY OPTICAL FLOW AND A CONTRARIO METHOD.
In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 480-483
DOI: 10.5220/0002849404800483
Copyright © SciTePress
Figure 1: Algorithm to detect and track multiple moving
objects. (Flowchart blocks: Image(t), Features selection,
Next image, KLT: Track Features, t%4?, Moving objects?,
Clustering, Objects?, Merging, Objects tracking, Delete
object limits; dotted rectangles 1, 2 and 3 group the three
steps.)
The second step is performed on the trails provided
at each time of trail, for moving features only (trails
longer than 1 pixel). These features are grouped using the
a contrario theory (dotted rectangle 2). The resulting sets
of points, i.e. clusters, represent moving objects in the
scene. If no object is present, the control process
activates the first step again. Otherwise, a merging
evaluation is carried out, based on similar velocity and
close position, among already detected objects and new
ones. The third step independently initializes and tracks,
with a Kalman filter, the clusters detected as moving
objects (dotted rectangle 3). Finally, current object
positions are kept in an occupancy grid, in order to avoid
detecting the same object several times. This task
corresponds to the “Delete object limits” block.
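The moving-feature test of the second step can be illustrated with a short Python sketch. It rests on assumptions that are not stated in the paper: a trail is taken to be a list of (x, y) positions, "longer than 1 pixel" is interpreted as the distance between the first and last positions of the trail, and the names trail_length and moving_trails are hypothetical.

```python
import math

def trail_length(trail):
    """Distance in pixels between the first and last positions of a trail."""
    (x0, y0), (x1, y1) = trail[0], trail[-1]
    return math.hypot(x1 - x0, y1 - y0)

def moving_trails(trails, min_length=1.0):
    """Keep only the trails whose end-to-end displacement exceeds min_length."""
    return [t for t in trails if trail_length(t) > min_length]

# Two illustrative trails accumulated over one time of trail.
trails = [
    [(10.0, 10.0), (10.1, 10.0), (10.2, 10.1), (10.2, 10.1)],  # nearly static
    [(50.0, 20.0), (52.0, 20.5), (54.1, 21.0), (56.0, 21.4)],  # clearly moving
]
moving = moving_trails(trails)
```

Only the trails kept in moving would then be passed to the clustering step.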
3 OPTICAL FLOW FIELD
We use a sparse optical flow because the processing time
must be shared with other main tasks. Npts initial interest
points in the input image at time $t_0$ are detected by
analyzing spatial image gradients in two orthogonal
directions (typically Npts = 150). The locations of these
initial interest points in the next image are obtained by
maximizing a correlation measure over a small window. The
iterative process is accelerated by constructing a pyramid
with scaled versions of the input image. Furthermore,
rotation, scaling and shearing of each point are handled by
computing the corresponding linear spatial transformation
parameters during the iterative process. Once displacement
vectors are obtained for all initial features, their
velocity is estimated from these displacement vectors.
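The step from a displacement vector to a velocity can be sketched as follows. This is only an illustration under assumptions: the velocity magnitude is taken as the end-to-end displacement divided by the number of image intervals in the trail (pixels per image), the orientation is the angle of the displacement vector, and the function name trail_to_feature is hypothetical.

```python
import math

def trail_to_feature(trail):
    """Return (x, y, v, theta) for one trail of (x, y) positions.

    x, y  : last tracked position
    v     : mean speed in pixels per image over the trail
    theta : orientation of the displacement vector in radians
    """
    (x0, y0), (x1, y1) = trail[0], trail[-1]
    dx, dy = x1 - x0, y1 - y0
    intervals = len(trail) - 1          # number of image-to-image steps
    v = math.hypot(dx, dy) / intervals
    theta = math.atan2(dy, dx)
    return (x1, y1, v, theta)

# A point moving horizontally by one pixel per image over four steps.
feature = trail_to_feature([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)])
```

The resulting four-component tuple has the same layout as the (x, y, v, θ) input used by the clustering stage.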
When moving objects are detected, the corresponding
points are subtracted from the Npts initial points. Thus,
in the following iterations, the KLT process searches for
fewer than Npts new points, under the constraint that these
points must not be located close to object features. This
operation allows us to maintain a fixed number of interest
points between the KLT and the tracker processes; this
strict control of the number of points is important for
performance because long image sequences are evaluated.
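The point budget described above can be sketched in a few lines of Python. The sketch is hypothetical in several respects: the paper does not say how "close to object features" is measured, so a Euclidean distance threshold min_dist is assumed, and the function name select_new_points and the candidate list are illustrative only.

```python
import math

def select_new_points(candidates, object_points, npts, min_dist=10.0):
    """Select new interest points so that the total stays at npts.

    The budget is npts minus the points already attached to moving objects;
    candidates closer than min_dist pixels to any object point are skipped.
    """
    budget = npts - len(object_points)
    selected = []
    for c in candidates:
        if len(selected) >= budget:
            break
        if all(math.dist(c, p) >= min_dist for p in object_points):
            selected.append(c)
    return selected

# Small illustrative case with npts = 5 instead of the typical 150.
object_points = [(100.0, 100.0), (104.0, 98.0)]
candidates = [(101.0, 101.0), (300.0, 200.0), (103.0, 99.0),
              (20.0, 40.0), (400.0, 50.0), (10.0, 10.0)]
new_points = select_new_points(candidates, object_points, npts=5)
```

In this toy case, two points belong to an object, so at most three new points are selected, all away from the object.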
4 MOVING OBJECT DETECTION
AND TRACKING
Given an input vector $V(x, y, v, \theta)$ in $\mathbb{R}^4$
(trails obtained by KLT during a time of trail), the method
evaluates which elements in $V$ have a particular
distribution, contrary to the established random
distribution $p$ of the background model. To do so, a
binary tree with the elements of $V$ is constructed using a
single linkage method. Each node in the tree represents a
candidate group $G$ that is evaluated in a set of given
regions represented by $\mathcal{H}$. Each region
$H \in \mathcal{H}$ is centered at each element $X \in G$
until finding the region $H_X$ that contains all elements
in $G$; at the same time, this region has to minimize the
probability of the background model distribution. The final
measure of meaningfulness (called Number of False Alarms,
NFA) is given by Eq. (1).
\[
NFA(G) = N^2 \cdot |\mathcal{H}| \min_{\substack{X \in G,\ H \in \mathcal{H}, \\ G \subset H_X}} B\left(N - 1,\ n - 1,\ p(H_X)\right) \tag{1}
\]
In this equation, $N$ represents the number of trails in
$V$, i.e. the number of tracked points among the Npts
selected features, $|\mathcal{H}|$ is the cardinality of
the set of regions and $n$ is the number of elements in the
group $G$. The term that appears in the minimum is the
accumulated binomial law. The distribution $p$ consists of
four independent distributions, one for each data
dimension. A group $G$ is said to be meaningful if
$NFA(G) \le 1$.
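To make the NFA measure concrete, the following Python sketch evaluates it for a group, given the background probabilities of the candidate regions that contain the group. It is a sketch under assumptions: the accumulated binomial law is taken to be the upper tail B(m, k, p) = sum over i from k to m of C(m, i) p^i (1 - p)^(m - i), the prefactor is read from the extracted equation as N squared times the number of regions, and the search for the regions themselves is not shown. All names and numeric values are hypothetical.

```python
from math import comb

def binomial_tail(trials, k_min, p):
    """Probability of at least k_min successes among trials draws of probability p."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(k_min, trials + 1))

def nfa(n_trails, n_regions, group_size, region_probs):
    """Number of False Alarms of a group, minimized over candidate regions.

    region_probs holds p(H_X) for each candidate region containing the group.
    """
    best = min(binomial_tail(n_trails - 1, group_size - 1, p) for p in region_probs)
    return n_trails**2 * n_regions * best

# Hypothetical numbers: 100 trails, 20 regions, two candidate regions.
dense_group = nfa(n_trails=100, n_regions=20, group_size=12, region_probs=[0.01, 0.02])
loose_group = nfa(n_trails=100, n_regions=20, group_size=3, region_probs=[0.05])
is_meaningful = dense_group <= 1
```

A group of twelve points inside a region of very low background probability gets a small NFA, while three points in a likelier region do not.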
Furthermore, two sibling meaningful groups in the binary
tree could belong to the same moving object, so a second
evaluation is computed for all the meaningful groups by
Eq. (2). To obtain this new measure, we use the region
information of each group (dimensions and probability), and
a new region that contains both test groups $G_1$ and $G_2$
is computed. The new terms are $N' = N - 2$, the numbers of
elements in $G_1$ and $G_2$, respectively
$n'_1 = n_1 - 1$ and $n'_2 = n_2 - 1$, and the term $T$,
which represents the accumulated trinomial law.
\[
NFA_G(G_1, G_2) = N^4 \cdot |\mathcal{H}|^2 \; T\left(N',\ n'_1,\ n'_2,\ p_1,\ p_2\right) \tag{2}
\]
Both measures defined in Eq. (1) and Eq. (2) represent the
significance of groups of the binary tree. Final clusters
are found by exploring the whole binary tree and comparing
whether it is more significant to keep two moving objects
$G_1$ and $G_2$ or to fuse them into a single group $G$.
Mathematically, the groups are fused when
$NFA(G) < NFA_G(G_1, G_2)$, where $G_1 \cup G_2 \subset G$.
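The pair measure and the fusion test can be sketched in Python as follows. The sketch assumes that the accumulated trinomial law is the upper tail of a trinomial distribution, that is, the probability that at least n'_1 of the N' points fall in the first region and at least n'_2 in the second; this definition, the function names and the numeric values are assumptions, not the authors' implementation.

```python
from math import comb

def trinomial_tail(trials, k1_min, k2_min, p1, p2):
    """P(at least k1_min draws in region 1 and at least k2_min in region 2)."""
    total = 0.0
    for i in range(k1_min, trials + 1):
        for j in range(k2_min, trials - i + 1):
            coeff = comb(trials, i) * comb(trials - i, j)
            total += coeff * p1**i * p2**j * (1 - p1 - p2)**(trials - i - j)
    return total

def nfa_pair(n_trails, n_regions, n1, n2, p1, p2):
    """Pair measure with N' = N - 2, n'_1 = n_1 - 1 and n'_2 = n_2 - 1."""
    tail = trinomial_tail(n_trails - 2, n1 - 1, n2 - 1, p1, p2)
    return n_trails**4 * n_regions**2 * tail

def should_fuse(nfa_single, nfa_two):
    """Fuse the two groups when the single group is the more significant one."""
    return nfa_single < nfa_two

# Hypothetical values for two sibling groups of 8 and 6 points.
pair_value = nfa_pair(n_trails=100, n_regions=20, n1=8, n2=6, p1=0.01, p2=0.01)
```

The value returned by nfa_pair would be compared, through should_fuse, with the NFA of the group that contains both siblings.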
4.1 Merging Groups
This function is executed when moving objects have been
detected in previous times of trail. Let us suppose that
new ones are detected by the clustering method. $O$ is a
set of $M$ objects given by $O = O_T \cup O_C$, where $O_T$
consists of the $(1, 2, ..., k)$ moving objects tracked by
the Kalman filter, and $O_C$ consists of the
$(1, 2, ..., l)$ new moving clusters, which could be
interpreted either as new moving objects or as parts of
existing ones. For each object in $O$, the velocity vector
is modeled by the means of its velocity components in X and
Y, respectively denoted $\mu_{v_X}$ and $\mu_{v_Y}$.
Eq. (3) gives a decision measure for merging regions.
\[
\min_{\substack{i, j \le M,\ i \ne j, \\ O_i, O_j \in O}}
\begin{pmatrix}
s\left(\mu_{v_X}(O_i),\ \mu_{v_X}(O_j)\right) \\
s\left(\mu_{v_Y}(O_i),\ \mu_{v_Y}(O_j)\right)
\end{pmatrix}
<
\begin{pmatrix}
d_{v_X} \\
d_{v_Y}
\end{pmatrix}
\tag{3}
\]
We evaluate the similarity measure $s$, which computes the
difference between the velocity models of each pair of
objects in $O$. The parameters $d_{v_X}$ and $d_{v_Y}$ are
constant values set to one pixel. This evaluation is
carried out in a linked way: merged groups are removed from
$O$ and added as a new object at the end of the list, with
a new corresponding velocity model. This strategy enriches
the decision process for region merging.
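The linked merging procedure can be sketched in Python as follows. Several choices here are assumptions made for illustration: each object is reduced to the list of (vx, vy) velocities of its points, the similarity s is taken as the absolute difference of the mean components, the first pair that passes both thresholds is merged (rather than the minimizing pair), and the position test mentioned in the overall strategy is left out.

```python
def mean_velocity(velocities):
    """Mean (vx, vy) of a list of per-point velocities."""
    n = len(velocities)
    return (sum(v[0] for v in velocities) / n, sum(v[1] for v in velocities) / n)

def merge_objects(objects, d_vx=1.0, d_vy=1.0):
    """Repeatedly fuse two objects whose mean velocities differ by less than
    d_vx and d_vy; the fused object is appended at the end of the list."""
    objects = [list(o) for o in objects]
    merged = True
    while merged:
        merged = False
        for i in range(len(objects)):
            for j in range(i + 1, len(objects)):
                mi, mj = mean_velocity(objects[i]), mean_velocity(objects[j])
                if abs(mi[0] - mj[0]) < d_vx and abs(mi[1] - mj[1]) < d_vy:
                    fused = objects[i] + objects[j]
                    objects = [o for k, o in enumerate(objects) if k not in (i, j)]
                    objects.append(fused)   # its mean is recomputed on the next pass
                    merged = True
                    break
            if merged:
                break
    return objects

# Two clusters moving right at similar speed and one moving left.
objects = [[(3.0, 0.1), (3.2, 0.0)], [(3.5, 0.2)], [(-2.0, 0.0), (-2.2, 0.1)]]
result = merge_objects(objects)
```

Because the fused object re-enters the list with a fresh velocity model, it can be merged again with another object in a later pass, which is the linked behaviour described above.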
4.2 Moving Objects Tracking
Every new object, defined as a cluster in $O_C$, is copied
into $O_T$ as (1) a list of points and the enclosing
bounding box extracted from the last image of the time of
trail, and (2) a state vector with the barycenter and the
mean velocity, i.e. the X, Y, $\mu_{v_X}$ and $\mu_{v_Y}$
values, respectively. Then, as shown in Figure 1, a Kalman
filter tracker with a constant velocity model is applied to
find the object position in the next images, using the KLT
tracker results. A feature point is removed from the object
model when it is not tracked, when the result given by the
KLT tracker is not inside the object bounding box, or when
it is too far from the mean motion of the object points.
When an object is out of the image bounds or occluded in
the scene, it is removed from the tracking process.
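A constant velocity Kalman filter of this kind can be sketched in plain Python. The sketch assumes that the X and Y axes are filtered independently, that only the barycenter position is measured at each image, and that the noise values q and r are arbitrary; none of these settings comes from the paper, and the class name AxisKalman is hypothetical.

```python
class AxisKalman:
    """Constant velocity Kalman filter for one image axis (state: pos, vel)."""

    def __init__(self, pos, vel, var_pos=1.0, var_vel=1.0, q=0.01, r=1.0):
        self.pos, self.vel = pos, vel
        # Symmetric 2x2 covariance stored as p00, p01, p11.
        self.p00, self.p01, self.p11 = var_pos, 0.0, var_vel
        self.q, self.r = q, r   # process and measurement noise variances

    def predict(self, dt=1.0):
        """Propagate the state and covariance by dt images."""
        self.pos += self.vel * dt
        self.p00 += 2 * dt * self.p01 + dt * dt * self.p11 + self.q
        self.p01 += dt * self.p11
        self.p11 += self.q
        return self.pos

    def update(self, z):
        """Correct the state with a measured position z."""
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s   # Kalman gains
        y = z - self.pos                      # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.p11 -= k1 * self.p01
        self.p01 *= (1 - k0)
        self.p00 *= (1 - k0)

# Track one object barycenter from hypothetical measurements.
track_x = AxisKalman(pos=320.0, vel=2.0)
track_y = AxisKalman(pos=240.0, vel=-1.0)
for zx, zy in [(322.1, 238.9), (323.8, 238.1), (326.2, 237.0)]:
    track_x.predict()
    track_y.predict()
    track_x.update(zx)
    track_y.update(zy)
```

Splitting the filter per axis keeps the example free of matrix libraries; a joint four-dimensional state would behave the same way when the two axes are uncorrelated.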
Finally, a temporal occupancy grid is managed in order to
select new KLT features, so that the KLT tracker is always
applied to Npts points: new points are selected in order to
increase the point density inside or around moving objects,
or in order to monitor image areas classified as static for
a long time.
5 EXPERIMENTAL RESULTS
Robot navigation was performed in a parking lot with a
camera mounted on our robot; 640 × 480 images are processed
off line at 10 Hz by a C++ implementation of our algorithm.
For now, it is not integrated with the robot localization;
therefore, we carefully control the robot speed. Figure 2
presents images with the main situations encountered in
object detection during the robot motion.
Figure 2a shows the bounding boxes of two moving objects,
labeled $O_1$ and $O_2$ for the right and left side car,
respectively. Object region growing is possible at each
time of trail when new clusters are detected, as depicted
in Figure 2b for $O_1$, while the Kalman filter tracks both
objects at each image time. Until Figure 2b, $O_1$ always
shows a fronto-parallel motion. Because of the diagonal
motion of the car $O_2$, our method detects some regions of
the same object that have different displacements and
consequently different velocities (Figure 2c). To solve
this problem, we initialize and track all objects
independently and, some times of trail later, merging
becomes possible, as Figure 2d illustrates. In the same
image, $O_1$ is hidden and is removed from the filter
process. Objects detected on the ground are caused by
camera movement; however, they quickly fall out of the
image bounds. Figures 2e and 2f show that $O_1$ is fully
detected and tracked again, while the region of $O_2$
becomes smaller until it disappears.
VISAPP 2010 - International Conference on Computer Vision Theory and Applications
(a) Image 366. (b) Image 386. (c) Image 418.
(d) Image 446. (e) Image 472. (f) Image 488.
Figure 2: Detection, clustering and tracking of moving objects during robot navigation. Top row: detection and tracking of
right side car $O_1$ and left side car $O_2$. Second row: $O_1$ and $O_2$ cross the front visual field of the robot until $O_2$ is out of the image.
6 CONCLUSIONS AND FUTURE
WORK
The global algorithm runs fast enough that it could be
embedded on the robot and executed on line. To guarantee
the highest performance of the overall strategy, the number
of feature points processed by the KLT tracker and by the
clustering method must stay under 150. Two directions for
future work are considered. First, a new strategy is being
evaluated for reducing the latency between the arrival of a
moving object in the camera field of view and its detection
by our algorithm; it requires building several trails and
applying the clustering algorithm in parallel. Second, a
general strategy to estimate the robot motion, based on a
monocular SLAM approach using static points, will be
applied to compensate for the point motion caused by the
camera motion, while dynamic points will be considered in a
MOT process.
ACKNOWLEDGEMENTS
This work has been supported by the scholarship
183739 of the Consejo Nacional de Ciencia y Tecnología
(CONACYT), the Secretaría de Educación Pública and by the
Mexican government.
REFERENCES
Davison, A. (2003). Real-time simultaneous localisation
and mapping with a single camera. In Int. Conf. on
Computer Vision, pages 1403–1410.
Desolneux, A., Moisan, L., and Morel, J.-M. (2008). From
Gestalt Theory to Image Analysis: A Probabilistic
Approach, volume 34. Springer Berlin / Heidelberg.
Gate, G., Breheret, A., and Nashashibi, F. (2009).
Centralised fusion for fast people detection in dense
environments. In ICRA'09, IEEE Int. Conf. on Robotics
and Automation, Kobe, Japan.
Poon, H. S., Mai, F., Hung, Y. S., and Chesi, G. (2009).
Robust detection and tracking of multiple moving
objects with 3D features by an uncalibrated monocular
camera. In 4th International Conference on Computer
Vision/Computer Graphics Collaboration Techniques,
pages 140–149, Berlin, Heidelberg. Springer-Verlag.
Shi, J. and Tomasi, C. (1994). Good features to track. In
Proc. IEEE Conf. on Computer Vision and Pattern
Recognition, pages 593–600.
Veit, T., Cao, F., and Bouthemy, P. (2007). Space-time a
contrario clustering for detecting coherent motion. In
ICRA'07, IEEE Int. Conf. on Robotics and Automation,
pages 33–39, Roma, Italy.
Vu, T. V. and Aycard, O. (2009). Laser-based detection
and tracking moving objects using data-driven Markov
chain Monte Carlo. In ICRA'09, IEEE Int. Conf. on
Robotics and Automation, Kobe, Japan.