scribed by a statistical model based on pixel charac-
teristics. The current image is then compared to the
background model to detect moving objects and the
background model is updated. Proposed approaches differ, for example, in how they eliminate noise caused by shadows or illumination changes.
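For a static camera, the pixel-wise scheme described above can be sketched as follows. This is a generic running-Gaussian model in numpy; the parameter names and default values are illustrative assumptions, not those of any particular cited method:

```python
import numpy as np

def background_subtraction(frames, alpha=0.05, k=2.5):
    """Per-pixel running Gaussian background model (generic sketch).

    frames -- iterable of 2-D grayscale arrays
    alpha  -- learning rate of the background update (assumed value)
    k      -- detection threshold in standard deviations (assumed value)
    Yields a boolean foreground mask for every frame after the first.
    """
    it = iter(frames)
    mean = next(it).astype(float)      # initialise the model with frame 1
    var = np.full_like(mean, 25.0)     # initial variance (assumption)
    for frame in it:
        frame = frame.astype(float)
        diff = frame - mean
        # a pixel is foreground if it deviates too much from the model
        mask = np.abs(diff) > k * np.sqrt(var)
        # update the background model only where the scene looks static
        mean = np.where(mask, mean, (1 - alpha) * mean + alpha * frame)
        var = np.where(mask, var, (1 - alpha) * var + alpha * diff ** 2)
        yield mask
```

Handling noise from shadows and illumination changes is precisely where the published approaches differ; this sketch only thresholds raw intensity.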
The context of the movie industry requires using
moving cameras. Approaches that deal with moving object detection in the video stream of a moving camera can be divided into three categories.
The first category of methods uses motion to detect moving objects. In the video stream of a freely
moving camera, the static background appears moving. Trajectory segmentation methods try to separate the trajectories that belong to the static scene from those that belong to moving objects. Methods proposed
by Elqursh et al. (Elqursh and Elgammal, 2012) and
Ochs et al. (Ochs et al., 2014) cluster trajectories us-
ing their location, magnitude and direction. Elqursh
et al. (Elqursh and Elgammal, 2012) label clusters as
static or moving according to criteria such as com-
pactness or spatial closeness, and propagate label in-
formation to all pixels with a graph-cut on an MRF.
Ochs et al. (Ochs et al., 2014) merge clusters ac-
cording to the mutual fit of their affine motion models
and label information is propagated with a hierarchical variational approach based on color and edge
information. Sheikh et al. (Sheikh et al., 2009) rep-
resent the background motion by a subspace based
on three trajectories selected from the optical flow. Each trajectory is assigned to the background or the foreground depending on whether it fits the subspace or not. Narayana et al. (Narayana et al.,
2013) use a translational camera and exploit only
optical flow orientations which are independent of
depth. Based on a set of pre-computed orientation
fields for different motion parameters of the camera,
the method automatically detects the number of fore-
ground motions. Methods based on trajectory segmentation generally assume that the apparent motion of the static scene is predominant and uniform, but this is not always true.
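As a rough illustration of this first category, trajectories can be described by their mean location and mean displacement and then clustered. This is a strongly simplified numpy sketch; the feature choice and the deterministic k-means initialization are assumptions made for the example, not details of the cited methods:

```python
import numpy as np

def trajectory_features(trajs):
    """Describe each trajectory by its mean location and mean displacement.

    trajs -- array of shape (N, T, 2): N point trajectories over T frames.
    This is a simplification of the location / magnitude / direction cues
    used by the trajectory-clustering methods.
    """
    mean_loc = trajs.mean(axis=1)                    # (N, 2) average position
    mean_disp = np.diff(trajs, axis=1).mean(axis=1)  # (N, 2) average motion
    return np.hstack([mean_loc, mean_disp])

def kmeans(X, k, iters=20):
    """Plain Lloyd k-means with a deterministic init (first k points)."""
    centers = X[:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

Static-scene trajectories share a near-uniform displacement, so they tend to fall into one cluster; labelling clusters as static or moving and propagating the labels to all pixels (graph cut, variational refinement) is the part on which the cited methods differ.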
A second category of approaches applies background subtraction techniques with a moving camera. A camera with constrained motion makes it possible to construct a background model of the scene.
Each frame is then registered with the background
model to perform the background subtraction. The
Pan Tilt Zoom (PTZ) camera is a constrained mov-
ing camera with a fixed optical center. The key problem in performing background subtraction is registering the camera image with the panoramic background model at different scales. Xue et al. (Xue et al., 2013) propose a method that relies on a panoramic background model and a hierarchy of images of the scene
at different scales. A match is found between the current image and the images in the hierarchy; the match is then propagated to upper levels until registration with the panoramic background model. Cui et al. (Cui et al.,
2014) use a static camera capturing wide-view images at low resolution to detect motion and define a rough region containing the moving object. The high-resolution image of the PTZ camera is registered with the background model using feature point matching and refined with an affine transformation model.
Background subtraction techniques can also be used
with a freely moving camera. In general, the cam-
era motion is compensated with a homography, but
when the camera undergoes a translational motion, misalignments created by parallax can be detected as moving objects. Romanoni et al. (Romanoni
et al., 2014) and Kim et al. (Kim et al., 2013) use a spatio-temporal model to classify misaligned pixels, relying on neighborhood analysis. Instead of registering the whole image, the method proposed by Yi et al. (Yi et al., 2013) divides the image into a grid of blocks, each described by a single Gaussian model. To keep the background model up to date, each block from the previous frame is blended with the corresponding blocks of the current image after registration. Two background models are
maintained to prevent foreground contamination. All these approaches handle scenes with uniform apparent background motion but fail when the camera films a complex scene at close range.
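The homography-based compensation used by this second category can be sketched as follows. This is a minimal numpy example with nearest-neighbour warping; the threshold value and the convention that H maps current-frame coordinates to previous-frame coordinates are assumptions of the sketch:

```python
import numpy as np

def warp_homography(img, H):
    """Warp a grayscale image by homography H with nearest-neighbour
    sampling. H maps coordinates of the OUTPUT image to coordinates of
    img; pixels without a valid source are set to -1.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)
    src = H @ pts                         # project output grid into img
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, -1.0)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)

def motion_compensated_difference(prev, curr, H, thresh=20.0):
    """Register prev onto curr with H, then threshold the difference.
    Pixels with no valid correspondence are ignored."""
    warped = warp_homography(prev.astype(float), H)
    return (warped >= 0) & (np.abs(curr.astype(float) - warped) > thresh)
```

When the scene is not planar, a translating camera produces parallax that this compensation cannot absorb, which is why the residual mask also fires on misaligned static structure.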
The last category of methods approximates the scene by one or several planes: these are the Plane+Parallax and multi-plane approaches. Depending on the motion of a freely moving camera, the apparent motions of two static objects observed in the video stream can have different magnitudes and orientations. The Plane+Parallax
methods extend the plane approximation by taking
into account the parallax. Some works first approximate the scene by a plane (Irani and Anandan, 1998; Sawhney et al., 2000; Yuan et al., 2007).
This plane is the dominant physical plane in the images and is called the reference plane. After register-
ing consecutive images according to the reference
plane, the misaligned points are due to parallax. In
order to label these points as moving or static, the
authors propose geometric constraints based on the
reference plane. The Plane+Parallax methods only
handle scenes that can be approximated by one dominant plane, such as aerial images. To be more general, the
multi-layer approaches approximate the scene by sev-
eral virtual or physical planes. Wang and Adelson (Wang and Adelson, 1994) propose that each layer contains an
intensity map, an alpha map and a velocity map. The
optical flow is segmented with k-means clustering.
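The layered idea can be illustrated by fitting an affine motion model per layer and assigning each flow vector to the model that predicts it best. This is an illustrative numpy sketch under those assumptions, not Wang and Adelson's full formulation:

```python
import numpy as np

def fit_affine_flow(xs, ys, u, v):
    """Least-squares fit of an affine motion model to flow vectors:
       u = a0 + a1*x + a2*y,   v = b0 + b1*x + b2*y
    xs, ys, u, v -- 1-D arrays of positions and flow components."""
    A = np.stack([np.ones_like(xs), xs, ys], axis=1)
    pu = np.linalg.lstsq(A, u, rcond=None)[0]
    pv = np.linalg.lstsq(A, v, rcond=None)[0]
    return pu, pv

def assign_to_layers(xs, ys, u, v, models):
    """Assign each flow vector to the affine layer model with the
    smallest prediction residual."""
    residuals = []
    for pu, pv in models:
        pred_u = pu[0] + pu[1] * xs + pu[2] * ys
        pred_v = pv[0] + pv[1] * xs + pv[2] * ys
        residuals.append((u - pred_u) ** 2 + (v - pred_v) ** 2)
    return np.argmin(np.stack(residuals), axis=0)
```

In the layered representation, alternating between fitting the per-layer models and reassigning pixels plays the same role as the k-means clustering of the optical flow mentioned above.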
Coupled 2D and 3D Analysis for Moving Objects Detection with a Moving Camera