as “sudden”. Two “sudden” actions do not have to
look similar; they can be visually as different as
kicking and falling. On the other hand, we process live
streams of data from video surveillance cameras in
an underground railway station. We process videos
with an arbitrary number of people anywhere in the
camera view, across views with different viewpoints
and perspectives. This causes huge variations in the
motion and visual appearance of both people and
actions in subway videos, and makes action recognition
ill-suited to the problem at hand. Therefore, we
define our problem as sudden movement discovery.
We tested and evaluated our approach on a dataset
recorded in the Paris subway for a European project.
The strength of our approach, and the contribution
of this paper, lies first in that it is unsupervised. It does
not require training or calibration. This aspect is very
important for application in large video surveillance
systems composed of a large number of cameras. For
instance, cities such as London or Mexico City count
up to 1 million video surveillance cameras in their
streets. Supervised training or calibration is not
feasible for such numbers of cameras. Even in a subway
network such as that of Paris, the cameras number
several tens of thousands. Our approach
allows for an automatic application on any number
of cameras, without human intervention. Moreover,
we do not impose any constraint on the camera
placement. Video surveillance cameras are usually placed
high up and thus see people with a strong perspective.
The visual appearance of people and their motion
differs greatly depending on whether they are directly
under the camera or far from it. Our approach detects
sudden movements in any typical video surveillance stream
without constraints and without human supervision. Our
approach, detailed in section 3, is summarized by fig-
ure 1. The sample sequence used to acquire mean and
variance maps is not chosen by a human operator. It is
a random sequence that can be automatically acquired
at system set-up. The only constraint is to acquire
it during hours when people are present, not when the
scene is empty. Second, our approach can be combined with a
more semantic interpretation of the scene, such as the
system proposed by (Zaidenberg et al., 2012). This
system detects groups of people in the scene and rec-
ognizes pre-defined scenarios of interest. The sudden
movement detector is an addition to the group activity
recognition system. When a sudden movement is
detected inside a group’s bounding box, a special event
is triggered. This event can be used as is, as an alert,
or combined into a higher-level scenario to provide
a more accurate semantic interpretation of the
scene.
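The coupling between the sudden movement detector and the group
recognition system can be sketched as follows. This is only a minimal
illustration: the names `GroupBox`, `SuddenMovementEvent` and
`sudden_movement_events`, as well as the representation of a detection
as a pixel location, are hypothetical and not taken from either system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GroupBox:
    """Axis-aligned bounding box of a tracked group (hypothetical structure)."""
    group_id: int
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # Left and top edges are inclusive, right and bottom edges exclusive.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass(frozen=True)
class SuddenMovementEvent:
    """Event raised when a sudden movement falls inside a group's box."""
    frame_index: int
    group_id: int
    location: Tuple[int, int]


def sudden_movement_events(
    frame_index: int,
    detections: Iterable[Tuple[int, int]],
    groups: Iterable[GroupBox],
) -> List[SuddenMovementEvent]:
    """Pair each sudden-movement location with every group box containing it."""
    group_list = list(groups)
    events = []
    for px, py in detections:
        for group in group_list:
            if group.contains(px, py):
                events.append(
                    SuddenMovementEvent(frame_index, group.group_id, (px, py))
                )
    return events


if __name__ == "__main__":
    groups = [GroupBox(1, 100, 50, 80, 120), GroupBox(2, 300, 60, 60, 100)]
    detections = [(120, 90), (10, 10), (359, 159)]
    for event in sudden_movement_events(42, detections, groups):
        print(event)
```

Each returned event could be raised directly as an alert or passed to a
higher-level scenario; detections outside every group box produce no event.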
2 RELATED WORK
There are many optical flow algorithms. Among
the most popular are Large Displacement Optical
Flow (LDOF) (Brox and Malik, 2011), Anisotropic
Huber-L1 Optical Flow (Werlberger et al., 2009) and
Farnebäck’s algorithm (Farnebäck, 2003). Although
the first two are more accurate than the third, they
run very slowly on a single CPU (more than 100
seconds for 2 consecutive video frames of spatial
resolution 640 × 480 pixels).
Therefore, for extracting dense optical flow we use
Farnebäck’s algorithm (more precisely its implementation
from the OpenCV library¹) as a good balance
between precision and speed (1 to 4 fps depending
upon displacement between frames).
Optical flow algorithms are widely used for tracking.
Recently, dense tracking methods have drawn a
lot of attention and have been shown to achieve high
performance for many computer vision problems. Wang
et al. (Wang et al., 2011) have proposed to com-
pute HOG, HOF and MBH descriptors along the ex-
tracted dense short trajectories for the purpose of ac-
tion recognition. Wu et al. (Wu et al., 2011) have pro-
posed to use Lagrangian particle trajectories which
are dense trajectories obtained by advecting optical
flow over time. Raptis and Soatto (Raptis and Soatto, 2010)
have proposed to extract salient spatio-temporal
structures by forming clusters of dense optical flow
trajectories and then to assemble these clusters into an
action class using a graphical model.
Action recognition is currently an active field of
research. Efros et al. (Efros et al., 2003) aim at
recognizing human action at a distance, using noisy
optical flow. Other similarly efficient techniques for
action recognition in realistic videos include (Gaidon
et al., 2011; Castrodad and Sapiro, 2012).
Kellokumpu et al. (Kellokumpu et al., 2008) calculate
local binary patterns along the temporal dimension and
store a histogram of non-background responses in a
spatial grid. Blank et al. (Blank et al., 2005) use
silhouettes to construct a space-time volume and use
the properties of the solution to the Poisson equation
for activity recognition.
Another related topic is abnormality detection.
In papers such as (Jouneau and Carincotte, 2011)
and (Emonet et al., 2011), the authors automatically
discover recurrent activities or learn a model of what is
normal. They can thus detect as abnormal everything
that does not fit the learned model. Additionally,
(Mahadevan et al., 2010) propose a method to de-
tect anomalies in crowded scenes using mixtures of
dynamic textures, but they do not focus on sudden
¹ http://opencv.willowgarage.com/wiki/