2 RELATED WORK
Background subtraction is a widely used approach for
detection of moving objects in video sequences. This
approach detects moving objects by differentiating
between the current frame and a reference frame,
often called the background frame, or background
model. The background frame should not contain
moving objects. In addition, it must be regularly
updated in order to adapt to varying conditions
such as illumination and geometry changes. This
section provides a review of some state-of-the-art
background subtraction techniques. These techniques
range from simple approaches that aim to maximize
speed and minimize memory requirements, to more
sophisticated approaches that aim for the highest
possible accuracy under all possible circumstances.
All of these approaches are designed to run in real
time. We focus on methods whose space and time
complexity resemble those of the proposed algorithm;
additional references can be found in (Collins et al.,
2000; Piccardi, 2004; McIvor, 2000; Bouwmans, 2014;
Bouwmans et al., 2017).
Lo and Velastin (Lo and Velastin, 2001) proposed
to use the median value of the last n frames
as the background model. This provides an
adequate background model even if the n frames are
subsampled with respect to the original frame rate
by a factor of ten (Cucchiara et al., 2003). The
median filter is computed on a special set of values
that contains the last n subsampled frames and the last
computed median value. This combination increases
the stability of the background model.
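As a sketch, this median model can be written in a few lines of NumPy. The function names, the buffer layout, and the difference threshold below are our illustrative choices, not part of the cited method:

```python
import numpy as np

def median_background(frames, prev_median=None):
    """Estimate the background as the per-pixel median of the buffer.

    frames: array of shape (n, H, W) holding the last n (subsampled)
    frames. If prev_median is given, the last computed median is added
    to the sample set, as described above, to stabilise the estimate.
    """
    samples = frames if prev_median is None else np.concatenate(
        [frames, prev_median[None]], axis=0)
    return np.median(samples, axis=0)

def foreground_mask(frame, background, threshold=25):
    """Pixels differing from the background by more than `threshold`."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold
```

Subsampling by a factor of ten, as in (Cucchiara et al., 2003), simply means appending only every tenth frame to the buffer.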
In (Wren et al., 1997) an individual background
model is constructed at each pixel location (i, j) by
fitting a Gaussian probability density function (pdf)
to the last n values observed at that location. These
models are updated via a running average as each new
frame arrives.
This method has a very low memory requirement.
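A minimal sketch of such a per-pixel Gaussian model, assuming a single learning rate alpha and a k-sigma classification rule (both parameter values are illustrative):

```python
import numpy as np

class RunningGaussianModel:
    """One Gaussian per pixel, updated by a running average.

    Only two arrays (mean and variance) are kept per pixel, which is
    what gives this family of methods its low memory requirement.
    `alpha` and `k` are illustrative values, not from the cited paper.
    """
    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=15.0**2):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, init_var)
        self.alpha, self.k = alpha, k

    def update(self, frame):
        frame = frame.astype(np.float64)
        d = frame - self.mean
        # A pixel is foreground when |d| exceeds k standard deviations.
        foreground = d * d > (self.k ** 2) * self.var
        # Running-average updates of the mean and variance.
        self.mean += self.alpha * d
        self.var = (1 - self.alpha) * self.var + self.alpha * d * d
        return foreground
```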
In order to cope with rapid changes in the
background, a multi-valued background model was
suggested in (Stauffer and Grimson, 1999). In this
model, the probability of observing a certain pixel x
at time t is represented by a mixture of k Gaussian
distributions. Each of the k Gaussians describes one
of the observable background or foreground objects.
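A simplified single-pixel sketch of this mixture update (the full method also orders components by weight and variance to decide which modes are background; the parameter values and the closest-match rule below are our simplifications):

```python
import numpy as np

def mog_update(x, weights, means, variances, alpha=0.01, k=2.5):
    """One mixture-of-Gaussians update step for a single pixel value x.

    weights, means, variances: arrays of length K, one entry per
    Gaussian. Returns the updated parameters and whether x matched an
    existing mode (an unmatched value suggests a new object).
    """
    d = np.abs(x - means)
    matched = d < k * np.sqrt(variances)
    if matched.any():
        i = np.argmin(np.where(matched, d, np.inf))
        # The full method scales rho by the component likelihood.
        rho = alpha
        means[i] += rho * (x - means[i])
        variances[i] = (1 - rho) * variances[i] + rho * (x - means[i]) ** 2
        weights = (1 - alpha) * weights
        weights[i] += alpha
    else:
        # Replace the least probable component with a new wide Gaussian.
        i = np.argmin(weights)
        means[i], variances[i] = x, 30.0 ** 2
        weights[i] = alpha
    weights /= weights.sum()
    return weights, means, variances, bool(matched.any())
```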
A Kernel Density Estimation (KDE) of the buffer
of the last n background values is used in (Elgammal
et al., 2000). The KDE guarantees a smooth,
continuous version of the histogram of the most recent
values that are classified as background values. This
histogram is used to approximate the background pdf.
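For one pixel, the KDE evaluation can be sketched as follows, assuming a Gaussian kernel with a fixed bandwidth (the cited method estimates the bandwidth from the data; our fixed value is illustrative):

```python
import numpy as np

def kde_background_prob(x, samples, sigma=15.0):
    """Kernel density estimate of the background pdf at intensity x.

    samples: the last n values classified as background at this pixel.
    The Gaussian kernel of bandwidth `sigma` smooths the histogram of
    recent background values into a continuous pdf.
    """
    samples = np.asarray(samples, dtype=np.float64)
    z = (x - samples) / sigma
    return np.mean(np.exp(-0.5 * z * z) / (sigma * np.sqrt(2 * np.pi)))
```

A pixel is then labelled foreground when its estimated background probability falls below a small threshold.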
Mean-shift vector techniques have proved to be
an effective tool for solving a variety of pattern
recognition problems, e.g., tracking and segmentation
(Comaniciu, 2003). One of the main advantages
of these techniques is their ability to directly detect
the main modes of the pdf while making very few
assumptions. Unfortunately, the computational cost
of this approach is very high. As such, it cannot
be applied in a straightforward manner to model
background pdfs at the pixel level; in (Piccardi
and Jan, 2004; Han et al., 2004), however, this
cost is mitigated by optimization techniques.
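For intuition, a one-dimensional mean-shift iteration that converges to the nearest mode of a sample density can be sketched as follows (the bandwidth and iteration cap are illustrative; running this per pixel is exactly the cost the cited optimizations address):

```python
import numpy as np

def mean_shift_mode(samples, start, bandwidth=10.0, iters=50):
    """Find the nearest mode of a 1-D sample pdf by mean-shift.

    Each step moves the estimate to the Gaussian-kernel-weighted mean
    of the samples; the fixed point is a mode of the density.
    """
    x = float(start)
    for _ in range(iters):
        w = np.exp(-0.5 * ((samples - x) / bandwidth) ** 2)
        x_new = np.sum(w * samples) / np.sum(w)
        if abs(x_new - x) < 1e-6:
            break
        x = x_new
    return x
```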
Seki et al. (Seki et al., 2003) use spatial co-
occurrences of image variations. They assume that
neighboring blocks of pixels that belong to the
background should have similar variations over time.
This method divides each frame into distinct blocks
of N × N pixels, where each block is regarded as an
N²-component vector. This trades off resolution for
higher speed and better stability. During the learning
phase, a certain number of samples is acquired at a
set of points, for each block. The temporal average
is computed and the differences between the samples
and the average, called the image variations, are
calculated. Then the N² × N² covariance matrix is
computed with respect to the average. An eigenvector
transformation is applied to reduce the dimensions of
the image variations.
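The learning phase described above can be sketched as follows; the function name and the number of retained components are ours:

```python
import numpy as np

def block_variation_eigenbasis(block_samples, n_components=4):
    """Learn a low-dimensional basis for a block's image variations.

    block_samples: array of shape (m, N*N) holding m samples of one
    N x N block, each flattened to an N²-component vector. Returns the
    temporal average and the leading eigenvectors of the N² x N²
    covariance matrix of the variations.
    """
    avg = block_samples.mean(axis=0)
    variations = block_samples - avg          # differences from average
    cov = np.cov(variations, rowvar=False)    # N² x N² covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    basis = eigvecs[:, ::-1][:, :n_components]  # leading directions
    return avg, basis
```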
A different approach is based on an eigen-decomposition
of the whole image (Oliver et al., 2000). During a
learning phase, samples of n images are acquired. The
average image is then computed and subtracted from
all the images. The covariance matrix is computed
and the best eigenvectors are stored in an eigenvector
matrix. For each frame I, a classification phase is
executed: I is projected onto the eigenspace and then
projected back onto the image space. The output is
the background frame, which does not contain any
small moving objects. A threshold is applied to the
difference between I and the background frame.
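A sketch of this classification scheme, using an SVD of the centred training images to obtain the covariance eigenvectors (the function name, the number of eigenvectors, and the threshold are our illustrative choices):

```python
import numpy as np

def eigen_background(train_frames, frame, n_eig=5, threshold=30.0):
    """Eigen-decomposition background subtraction sketch.

    train_frames: (n, H, W) learning-phase images. The current `frame`
    is projected onto the leading eigenvectors and back; the
    reconstruction contains only the static background, so large
    differences from it mark moving objects.
    """
    n, h, w = train_frames.shape
    X = train_frames.reshape(n, -1).astype(np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data yields the covariance eigenvectors.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_eig]                         # rows span the eigenspace
    f = frame.reshape(-1).astype(np.float64) - mean
    background = basis.T @ (basis @ f) + mean  # project, project back
    diff = np.abs(frame.reshape(-1) - background)
    return (diff > threshold).reshape(h, w), background.reshape(h, w)
```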
3 THE DB ALGORITHM
Dimensionality reduction techniques represent high-
dimensional datasets using a small number of features
while preserving the information that is conveyed
by the original data. This information is mostly
inherent in the geometrical structure of the dataset.
Therefore, most dimensionality reduction methods
embed the original dataset in a low dimensional space
with minimal distortion to the original structure.
Classic dimensionality reduction techniques such as
Principal Component Analysis (PCA) and Classical