very efficient. If we suppose that both sets contain n
elements, the time complexity of building the corresponding 2-d trees and searching for the pairs of nearest
neighbours is O(n log n), whereas performing distance checks between every possible pair of elements
would cost O(n²).
After identifying the pairs of nearest neighbours
between the two 2-d trees, two final checks are
performed: every foreground area detected near an
entrance zone, but not associated with a previously
detected human, is considered a new human entering
the cell, and every detected human no longer associated
with any foreground area is considered a person
that has left the cell.
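As a rough illustration of this association step, the following Python sketch uses SciPy's cKDTree to match foreground centroids to tracked human positions and then applies the two checks; the array names, distance thresholds and entrance-zone representation are assumptions made for the example, not details of our implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def associate(fg_centroids, human_positions, entrance_zones,
              max_match_dist=0.5, entrance_radius=1.0):
    """Nearest-neighbour association between foreground areas and
    tracked humans, followed by the entrance/exit checks.
    Thresholds and zone representation are illustrative."""
    tree = cKDTree(human_positions)              # built in O(n log n)
    dists, idx = tree.query(fg_centroids)        # nearest human for each blob

    matches, new_humans, matched = [], [], set()
    for b, (d, h) in enumerate(zip(dists, idx)):
        if d <= max_match_dist:
            matches.append((b, h))
            matched.add(h)
        elif any(np.linalg.norm(fg_centroids[b] - z) < entrance_radius
                 for z in entrance_zones):
            new_humans.append(b)                 # new human entering the cell

    # humans no longer associated with any foreground area left the cell
    left = [h for h in range(len(human_positions)) if h not in matched]
    return matches, new_humans, left
```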
5 HUMAN TRACKING VIA
PARTICLE FILTERING
The tracking strategy adopted here is inspired by the
one proposed in (Bascetta et al., 2011). After BG/FG
segmentation and the update of the foreground areas,
human workers are tracked by a set of particle filters
that rely on a simplified model of human walking motion.
The choice of both the motion model and the particle
filtering strategy results from the following assumptions:
• the scene consists of a flat ground plane on which
humans walk;
• a human worker does not walk sideways;
• human workers and industrial robots are the
only moving objects in the camera field of view,
but, since robots do not enter the scene through the
entrance zones, their detection is automatically
avoided.
5.1 Human Motion Model
A simple and effective way of tracking the motion of a
human being is to consider his/her volumetric occupancy.
By circumscribing a rectangular box around
a walking person, we can describe his/her motion
in terms of a translation on the floor and a rotation
around the vertical axis passing through the centre of
the box base (see Figure 5(a)).
Having fixed a world-base Cartesian frame on the
ground plane, the pose of a human can be completely
described as p = (x, y, θ), where x and y are the
coordinates of the box base with respect to the X-axis
and Y-axis of the world-base frame, respectively, and
θ is the angle between the tangent to the walking path
and the X-axis of the world-base frame.
Finally, under the assumption that both the linear
velocity v (i.e. the nonholonomic velocity along the
direction of motion) and the angular velocity ω are
piecewise constant, the adopted human walking model
can be expressed as a slightly modified version of the
unicycle model presented in (Arechavaleta et al., 2008):
\dot{x} = v \cos(\theta)
\dot{y} = v \sin(\theta)
\dot{\theta} = \omega
\dot{v} = \sigma
\dot{\omega} = \eta
(9)
where σ and η are two independent and uncorrelated
Gaussian white noise processes acting on the linear
velocity v and on the angular velocity ω, respectively.
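A minimal sketch of how a single particle could be propagated under model (9), using a forward-Euler discretisation, is given below; the time step and noise standard deviations are hypothetical values, not parameters taken from the paper.

```python
import numpy as np

def propagate(state, dt=0.04, sigma_std=0.1, eta_std=0.1, rng=None):
    """One forward-Euler step of the unicycle-like model (9).
    state = [x, y, theta, v, omega]; noise levels are illustrative."""
    rng = rng or np.random.default_rng()
    x, y, theta, v, omega = state
    x_new = x + v * np.cos(theta) * dt
    y_new = y + v * np.sin(theta) * dt
    theta_new = theta + omega * dt
    v_new = v + rng.normal(0.0, sigma_std) * np.sqrt(dt)        # sigma on dv/dt
    omega_new = omega + rng.normal(0.0, eta_std) * np.sqrt(dt)  # eta on domega/dt
    return np.array([x_new, y_new, theta_new, v_new, omega_new])
```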
5.2 Particle Filtering Strategy
In our scenario, a deterministic evaluation of the human
motion state is not possible, mainly because of significant
measurement noise. Moreover, an analytical calculation
of the motion model output in terms of multiple
rectangular boxes (each one projected according to a
single camera point of view) is not feasible.
Consequently, our tracking strategy consists in assigning
to every detected human a probability distribution
over the possible states, in the form of a set of
weighted particles propagated in time according to
the motion model presented in Section 5.1. In this
way, multiple virtual representations are generated
for every moving worker, and his/her motion state is
estimated by selecting the particle whose representation
best matches the measured foreground. At any
time instant i, the motion state of a single walking
human being is represented by a set of N particles:
Q_i = \{ q_i^{(j)} \mid j = 1, \ldots, N \}
(10)
where every particle represents a possible motion
state configuration:
q_i^{(j)} = \left( x_i^{(j)}, y_i^{(j)}, \theta_i^{(j)}, v_i^{(j)}, \omega_i^{(j)} \right)
(11)
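In practice the particle set (10)-(11) can be stored as an N × 5 array with an associated weight per particle, and the state estimate taken as the particle whose projected box best matches the measured foreground, i.e. the one carrying the highest weight. The following lines are only a sketch of this bookkeeping; the value of N and the use of explicit weights are assumptions made here.

```python
import numpy as np

N = 200                          # number of particles (illustrative)
particles = np.zeros((N, 5))     # columns: x, y, theta, v, omega
weights = np.full(N, 1.0 / N)    # uniform weights before any measurement

def estimate_state(particles, weights):
    """Return the particle that best matches the measured foreground,
    i.e. the one carrying the largest weight."""
    return particles[np.argmax(weights)]
```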
The initial distribution can be considered known
a priori, corresponding to a scene without moving
workers. Right after instantiation, every filter is
considered “inactive” and its particle set is initialised
via uniform random sampling inside a subspace of the
model state space defined around the entrance areas.
As soon as a new human is detected (see Section 3),
an “inactive” filter is assigned the corresponding
foreground area and thus becomes “active”.
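A possible (simplified) realisation of this initialisation and activation logic is sketched below; the entrance-zone bounds and the velocity ranges used for the uniform sampling are assumptions made for the example.

```python
import numpy as np

class HumanParticleFilter:
    """Particle filter for one worker: 'inactive' until a foreground area
    detected near an entrance zone is assigned to it."""

    def __init__(self, n_particles, entrance_box, rng=None):
        self.rng = rng or np.random.default_rng()
        x_min, x_max, y_min, y_max = entrance_box      # illustrative bounds
        low  = [x_min, y_min, -np.pi, 0.0, -0.5]       # x, y, theta, v, omega
        high = [x_max, y_max,  np.pi, 1.5,  0.5]
        self.particles = self.rng.uniform(low, high, size=(n_particles, 5))
        self.weights = np.full(n_particles, 1.0 / n_particles)
        self.active = False
        self.foreground = None

    def assign_foreground(self, foreground_area):
        """Called when a new human is detected near an entrance zone."""
        self.foreground = foreground_area
        self.active = True
```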
While receiving continuously updated informa-
tion regarding the foreground area it is tracking (see