each newly acquired frame, we temporally localize it
against the background sequence of the previous ride.
In other words, the aim is to assign each current frame to the background frame whose viewpoint is closest to its own. Since efficiency is of major im-
portance in online solutions, the extraction of the cor-
responding frame relies on an image retrieval scheme
based on the SURF descriptor (Bay et al., 2008). A
temporal filter is applied to the outcome of the retrieval task in order to handle false positives (outliers). Then,
we have to spatially register the corresponding frames
into the same coordinate system. As the video acqui-
sition takes place at different times, the appearance of
corresponding frames varies. To cope with such vari-
ations, we adopt the recently proposed ECC image
alignment scheme (Evangelidis and Psarakis, 2008)
that offers the desired robustness. As a final step, different metrics based on image differences are applied to detect changes and mark areas of interest.
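As a rough illustration of this flow, the Python/OpenCV sketch below mirrors the three stages under simplifying assumptions of ours (it is not the authors' implementation): brute-force SURF matching stands in for the efficiency-oriented retrieval scheme, a median over recently retrieved indices stands in for the temporal filter, and ECC is run with an affine warp model. Note that SURF requires the opencv-contrib package.

```python
# Illustrative sketch of the FLAR stages (not the authors' code).
# Assumes grayscale uint8 frames; thresholds are placeholder values.
import cv2
import numpy as np

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
matcher = cv2.BFMatcher(cv2.NORM_L2)

def localize(cur_frame, bg_descriptors):
    """Retrieve the index of the background frame best matching cur_frame.

    Brute-force matching stands in for the paper's retrieval scheme.
    """
    _, q = surf.detectAndCompute(cur_frame, None)
    scores = []
    for d in bg_descriptors:
        pairs = matcher.knnMatch(q, d, k=2)
        # Lowe's ratio test keeps only distinctive matches.
        good = [p for p in pairs
                if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
        scores.append(len(good))
    return int(np.argmax(scores))

def temporal_filter(history, k=5):
    """Median over the last k retrieved indices to suppress outliers."""
    return int(np.median(history[-k:]))

def register_ecc(bg_frame, cur_frame):
    """Align bg_frame to cur_frame with ECC (Evangelidis & Psarakis, 2008)."""
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 50, 1e-6)
    _, warp = cv2.findTransformECC(cur_frame, bg_frame, warp,
                                   cv2.MOTION_AFFINE, criteria, None, 5)
    h, w = cur_frame.shape
    return cv2.warpAffine(bg_frame, warp, (w, h),
                          flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)

def detect_changes(cur_frame, aligned_bg, tau=30):
    """Threshold absolute frame differences to mark areas of interest."""
    diff = cv2.absdiff(cur_frame, aligned_bg)
    _, mask = cv2.threshold(diff, tau, 255, cv2.THRESH_BINARY)
    return mask
```

In an online setting, the SURF descriptors of the background ride would be extracted and indexed once in advance, so that only the newly acquired frame needs to be described at each step.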
The contribution of this paper is summarized as
follows: 1) A challenging case of night-time outdoor surveillance by mobile cameras is investigated. 2) The proposed FLAR scheme reflects a solution for online surveillance instead of post-processing. 3) It incorporates efficient tasks that allow us to envision real-time execution in a GPU-based environment. 4) The desired invariance to the motion style of the surveillance vehicle (speed, backward motion) is fulfilled.
1.1 Related Work
The challenging problem of detecting changes be-
tween videos acquired by mobile cameras at differ-
ent times has received considerably less attention than the case of stationary cameras (Radke et al., 2005). Marce-
naro et al. (Marcenaro et al., 2002) proposed an outdoor-surveillance system based on fixed and pan/tilt mobile cameras that overcomes the limitations of a single fixed camera monitoring the entire scene, although the position of the mobile camera must be known at all times.
Primdahl et al. (Primdahl et al., 2005) presented a
method for automatic navigation of cameras in a specific, well-defined corridor. Sand and Teller (Sand
and Teller, 2004) proposed a video matching scheme
for two sequences recorded by moving cameras fol-
lowing nearly identical trajectories. Although it allows pixel-wise comparisons to detect differences, its key limitation is the computational cost of robustly aligning several candidate pairs of corresponding frames. To make this efficient, Kong et
al. (Kong et al., 2010) temporally aligned sequences using GPS data only and detected abandoned suspicious objects via inter-sequence homographies. In contrast,
Soibam et al. (Soibam et al., 2009) and Haberdar and Shah (Haberdar, 2010) manually found the corresponding frame in the first video for each observed
frame of the second one. Finally, Diego et al. (Diego
et al., 2011) proposed a video alignment framework
based on fusing image-based and GPS observations
to spot differences between sequences taken at dif-
ferent times and by independently moving cameras,
while Chakravarty et al. (Chakravarty et al., 2007)
presented a mobile robot capable of repeating a manually trained route while detecting visual anomalies using a stereo-based algorithm; these anomalies are subsequently tracked using a particle filter.
The rest of this paper is organized as follows: Section 2 describes the whole framework; specifically, subsection 2.1 presents the frame localization approach, while the spatial registration and change detection tasks are discussed in subsections 2.2 and 2.3, respectively. Experiments to validate the proposed
algorithm are presented in Section 3 and results are
discussed. Finally, in Section 4, the main conclusions
are drawn.
2 FRAME LOCALIZATION AND REGISTRATION
Suppose we are given two video sequences represented as $I^r = \{I^r_m(\hat{\mathbf{x}})\}_{m=1}^{M}$ and $I^c = \{I^c_n(\mathbf{x})\}_{n=1}^{N}$, where $M$ and $N$ are their numbers of frames and $\hat{\mathbf{x}} = [\hat{x}, \hat{y}]^t$, $\mathbf{x} = [x, y]^t$ their spatial coordinates, respectively. The former denotes the reference or background sequence, taken in a previous ride, whereas the latter is the current sequence being recorded in the current ride following a similar trajectory. Then, the anomalies that occurred in the meantime between successive rides can be detected by matching and comparing the two sequences.
That is, the proper thresholding of image differences
between spatio-temporally aligned sequences allows
the detection of changes.
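Concretely, let $m(n)$ denote the background frame assigned to the $n$-th current frame by the localization step and $w(\cdot)$ the spatial warp estimated during registration (both obtained by the steps described below). A minimal instance of such a detector, using a plain absolute-difference metric with threshold $\tau$ purely for illustration (the metrics actually employed are discussed in subsection 2.3), is
\[
D_n(\mathbf{x}) = \left| I^c_n(\mathbf{x}) - I^r_{m(n)}\big(w(\mathbf{x})\big) \right|, \qquad
C_n(\mathbf{x}) =
\begin{cases}
1, & D_n(\mathbf{x}) > \tau,\\
0, & \text{otherwise,}
\end{cases}
\]
where $C_n$ is the binary change mask for the $n$-th frame.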
To solve the above-defined problem, we propose the Frame Localization And Registration (FLAR) framework shown in Figure 1. The only assumption
we make is that the vehicles follow a similar, approx-
imately coincident, route. The most likely frame of
a previous ride is extracted for each newly acquired
frame in the current ride (localization step). This is a challenging task because of the independently moving cameras and the non-coincident trajectories.
As a result, the speed and the position of the cam-
eras vary, while the ambient illumination can be dif-
ferent. A few video alignment approaches (Sand and
Teller, 2004; Liu et al., 2008; Diego et al., 2011) could
be adjusted to our problem. However, none of them
is able to estimate the frame correspondence during