
The main purpose of this work is to use, as much
as possible, the embedded information, taking
advantage of the huge amount of analysis work
performed by the MPEG video encoder.
Furthermore, only a few parameters have to be
adjusted in the detector, according to the class of the
moving object in the scene. This class is related to the
size of the moving object in the scene and to the
distance between the object and the camera. This
classification increases the accuracy of the
motion detector.
In the following sections we describe an
efficient, low-complexity scene change detection
algorithm that is able to detect significant visual
events from a partially decoded MPEG bit stream.
Section 2 introduces the MPEG standard and
section 3 describes the proposed algorithm. Some
results are shown in section 4 and conclusions are
presented in section 5.
2 MPEG BIT STREAM INFORMATION
MPEG encoders use a hybrid algorithm to compress
video, classifying and processing each frame as
intra coded (I frame) or motion compensated inter
coded (P and B) (F. Pereira and T. Ebrahimi, 2002).
Intra frames are encoded using only pixels
within the frame, exploiting spatial redundancy:
8×8 blocks are transformed with the DCT (Discrete
Cosine Transform) and the resulting DC and AC
coefficients are entropy coded. P frames are encoded
using motion compensated prediction from a past
I/P frame, in order to remove temporal redundancy.
B frames are encoded using motion compensated
prediction from past and/or future encoded I/P frames.
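
To make the terms DC and AC coefficient concrete, the following is a minimal sketch of an 8×8 DCT in pure Python. It is an illustration only, not the encoder's actual implementation: the function name `dct_8x8` and the flat test block are assumptions introduced here, and a real encoder would also quantize the coefficients before entropy coding.

```python
import math

N = 8  # block size used for the transform


def dct_8x8(block):
    """2-D DCT-II with orthonormal scaling of an 8x8 block (list of lists)."""
    def c(k):
        # Scaling factor: the zero-frequency term is weighted differently.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


# A perfectly flat block: all energy ends up in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_8x8(flat)
dc = coeffs[0][0]                      # proportional to the block average
ac = [coeffs[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]
```

For the flat block, `dc` is 800 (eight times the pixel value) and every AC coefficient is zero up to rounding error, which is why AC coefficients are a natural place to look for changes in image structure.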
Video frames are organized in regular structures
called groups of pictures (GOP). Each frame (VOP)
is divided into blocks of 16×16 pixels, called
macroblocks (MB). Furthermore, each macroblock
is divided into six 8×8 pixel blocks (four luminance
and two chrominance blocks in 4:2:0 format). After
motion compensation, the residual image may also be
divided into 8×8 pixel blocks, which are intra coded.
Thus, a macroblock contains information about the
type of temporal prediction used (or not) for motion
compensation, which can be classified as intra
coded, forward referenced, backward referenced,
interpolated or direct. While MBs inside an I frame
are intra coded, each MB in a P frame is either
forward predicted, intra coded or skipped. Similarly,
each MB in a B frame is either forward predicted,
backward predicted, bidirectionally predicted, intra
coded or skipped.
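
The macroblock rules above can be summarized in a small lookup table. The sketch below is only illustrative: the names `ALLOWED_MB_TYPES`, `is_valid_mb_type` and `macroblock_grid` are hypothetical, and the 352×288 frame size is an assumed example, not a value taken from this work.

```python
# Macroblock prediction types permitted in each frame type,
# as listed in the text above.
ALLOWED_MB_TYPES = {
    "I": {"intra"},
    "P": {"forward", "intra", "skipped"},
    "B": {"forward", "backward", "bidirectional", "intra", "skipped"},
}


def is_valid_mb_type(frame_type, mb_type):
    """Return True if a macroblock of mb_type may appear in frame_type."""
    return mb_type in ALLOWED_MB_TYPES[frame_type]


def macroblock_grid(width, height, mb_size=16):
    """Macroblock columns and rows, rounding up for partial macroblocks."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


cols, rows = macroblock_grid(352, 288)   # assumed example frame size
total_macroblocks = cols * rows          # 22 * 18 = 396
```

A parser that only reads macroblock headers can use such a table as a sanity check before it collects motion vectors.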
3 COMPRESSED DOMAIN MOTION DETECTION
In this section, we explain how motion detection is
performed without fully decoding the bit stream.
The proposed method relies mainly on the analysis
of the signal of the AC coefficients of I frames
(section 3.1) and on the motion vector information of
P and B coded frames (section 3.2). The main
objective is to detect only motion related to the
moving objects in the scene, eliminating camera
switching (scene cuts) and some typical camera
movements, which occur in video surveillance scenes.
3.1 Motion detection
In most surveillance applications, systems acquire
and store images continuously, so a huge amount
of information has to be stored. In this case a
high compression ratio is desirable. It is also
common for there to be no motion in the scene for
long periods of time. Thus, VOPs of type I can be
sparser, which significantly increases the
compression ratio. With this in mind, we propose a
hierarchical algorithm that processes the compressed
video information in two stages.
In the first stage, only I VOPs are analyzed, in
order to check the signal variations between the AC
coefficients of two co-located blocks in
consecutive I VOPs. To speed up the process, only
a small set of significant coefficients is checked,
and only blocks in which more than 5 coefficients
show a signal variation are counted. When the number
of such blocks exceeds a certain threshold, the image
is regarded as containing a moving object. This
threshold is derived from the average and the
variance of the number of blocks containing more
than 5 signal variations. We also have to deal with
homogeneous surfaces and illumination changes, which
tend to be detected as motion. When moving objects
are detected in a VOP of type I, the algorithm moves
to the second stage to refine the motion
detection.
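
The first stage can be sketched as follows. This is a rough illustration under several assumptions that the text does not fix: a "signal variation" is taken here to mean a change of sign of a coefficient, each block is a flat list of 64 coefficients with the DC term at index 0, the checked positions are the first eight AC coefficients, and the threshold is the mean plus `k` standard deviations of past counts. All function and parameter names are hypothetical.

```python
import statistics

MIN_VARIATIONS = 5  # from the text: more than 5 varying coefficients


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def count_changed_blocks(prev_vop, curr_vop, positions):
    """Count co-located blocks with more than MIN_VARIATIONS sign changes.

    prev_vop and curr_vop are lists of blocks from consecutive I VOPs.
    """
    changed = 0
    for prev_block, curr_block in zip(prev_vop, curr_vop):
        variations = sum(
            1 for p in positions
            if sign(prev_block[p]) != sign(curr_block[p])
        )
        if variations > MIN_VARIATIONS:
            changed += 1
    return changed


def adaptive_threshold(history, k=2.0):
    """Threshold from the mean and spread of past changed-block counts.

    The form mean + k * std and the value of k are assumptions.
    """
    return statistics.fmean(history) + k * statistics.pstdev(history)


def i_vop_has_motion(changed, history, k=2.0):
    """Flag the I VOP when the count exceeds the adaptive threshold."""
    return changed > adaptive_threshold(history, k)


# Toy data: two of four blocks flip the sign of their AC coefficients.
positions = range(1, 9)                 # assumed set of checked coefficients
static_block = [10] + [1] * 63
moved_block = [10] + [-1] * 63
prev_vop = [static_block] * 4
curr_vop = [static_block, moved_block, moved_block, static_block]
changed = count_changed_blocks(prev_vop, curr_vop, positions)
```

Only I VOPs flagged by `i_vop_has_motion` would be passed on to the second stage, so long static periods cost little more than a few coefficient comparisons per block.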
In the second stage, the motion vectors (MVs) of P
and B VOPs are analyzed, in order to check the
number of motion vectors used to encode each inter
frame. If the number of non-zero MVs exceeds a
threshold given for that class (section 3.3) of
surveillance scene, the VOP is regarded as
containing a moving object.
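
A minimal sketch of the second-stage test is given below. The class names and threshold values in `CLASS_THRESHOLDS` are purely hypothetical placeholders, since the actual classes are defined in section 3.3; motion vectors are assumed to be available as `(dx, dy)` pairs, one per macroblock.

```python
# Hypothetical per-class thresholds on the number of non-zero MVs.
CLASS_THRESHOLDS = {"near": 40, "medium": 15, "far": 4}


def count_nonzero_mvs(motion_vectors):
    """Count motion vectors with a non-zero horizontal or vertical part."""
    return sum(1 for dx, dy in motion_vectors if dx != 0 or dy != 0)


def inter_vop_has_motion(motion_vectors, scene_class,
                         thresholds=CLASS_THRESHOLDS):
    """Flag a P or B VOP whose non-zero MV count exceeds the class threshold."""
    return count_nonzero_mvs(motion_vectors) > thresholds[scene_class]


# Toy VOP: mostly static macroblocks plus a few moving ones.
mvs = [(0, 0)] * 390 + [(1, 0), (0, -2), (3, 1), (0, 0), (2, 2), (1, 1)]
nonzero = count_nonzero_mvs(mvs)
```

With these placeholder values, the same VOP is flagged for the "far" class but not for the "medium" class, which illustrates why the threshold has to follow the class of the scene.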
After this step, it may happen that some motion
detections are false, due to camera switching (scene
cuts) or camera motions. These false motion
FAST EVENT DETECTION IN MPEG VIDEO