2 RELATED WORK
Most video copy detection algorithms based on
global features extract low-level features from video
frames to represent the video, but these algorithms
are sensitive to various copy techniques, so their
detection results are often unsatisfactory. In contrast
to global features, local features describe the
structure and texture of the neighborhood of an
interest point (Joly et al., 2007) and are generally
robust to brightness, viewing-angle, geometric and
affine transformations. Techniques based on local
features can be divided into five types: spatial
methods, temporal methods, spatial-temporal
methods, transform-domain methods and color
methods.
On the other hand, video copy detection
approaches can be classified into two broad groups.
The first group comprises non-key-frame-based
approaches, which use the whole video sequence in
the detection process. Jiang et al. (Jiang et al., 2013)
proposed a rotation-invariant VCD approach in
which each selected frame is partitioned into rings;
Histograms of Oriented Gradients (HOG) and RMI
are then calculated as the original features. In (Cui
et al., 2010), a fast CBCD approach based on the
Slice Entropy Scattergraph (SES) is proposed; SES
employs video spatio-temporal slices, which greatly
decreases storage and computational complexity.
Yeh et al. (Yeh et al., 2009) proposed a frame-level
descriptor for large-scale VCD that encodes the
internal structure of a video frame by computing the
pair-wise correlations between geometrically
pre-indexed blocks. In (Wu et al., 2009), Wu et al.
introduced a Self-Similarity Matrix (SSM) based
video copy detection scheme and a Visual
Character-String (VCS) descriptor for SSM
matching. In a follow-up work (Wu et al., 2009), the
authors added a transformation-recognition module
and used a self-similarity-matrix-based
near-duplicate video matching scheme; by detecting
the type of transformation, the near-duplicates can
be processed with the 'best' feature, which is chosen
experimentally. In (Ren et al., 2012), Ren et al.
proposed a compact video signature representation
as time series for either global or local feature
descriptors, providing fast signature matching
through major-incline-based alignment of the time
series.
The second group contains key-frame-based
techniques. Zhang et al. (Zhang et al., 2010)
proposed a CBVCD method based on temporal
features of key-frames. Chen et al. (Chen et al.,
2011) introduced a video copy detection method
that combines key-frames with the spatio-temporal
feature curves of the video's Y and U channels. Tsai
et al. (Tsai et al., 2009) developed a practical
CBVCD system: after locating visually similar
key-frames, Vector Quantization (VQ) and Singular
Value Decomposition (SVD) are applied to extract
the spatial features of these frames, and the shot
lengths are then used as temporal features for
further matching to achieve a more accurate result.
In (Chaisorn et al., 2010), Chaisorn et al. proposed a
framework composed of two levels of bitmap
indexing: the first level groups videos (key-frames)
into clusters and uses them as the first-level index,
so a query video only needs to be matched against
those clusters rather than the entire database. In
(Kim and Nam, 2009), Kim et al. presented a
method that selects key-frames with abrupt
luminance changes and extracts a compact
spatio-temporal feature from them; by comparing
this feature with the pre-registered features stored in
the video database, the approach determines
whether an uploaded video is an illegal copy.
3 PROPOSED VIDEO COPY
DETECTION MODEL
As mentioned above, most CBVCD systems
consist of three major modules: key-frame
extraction, fingerprint (feature vector) extraction
and sequence matching. The fingerprint must satisfy
diverging criteria such as discriminative capability
and robustness against various signal distortions.
The sequence-matching module is responsible for
devising the matching strategy and verifying the test
sequence against likely originals in the database.
The architecture of our proposed CBVCD system is
shown in Figure 1.
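To make the data flow between the three modules concrete, the following Python sketch outlines the pipeline at a high level. It is a minimal illustration, not our actual implementation: the module functions are passed in as placeholders, and the database layout and decision threshold are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

def detect_copy(
    query_frames: Sequence,                # decoded frames of the query video
    reference_db: List[Tuple[str, list]],  # (video_id, fingerprint sequence) pairs
    extract_key_frames: Callable,          # module 1: key-frame extraction
    extract_fingerprint: Callable,         # module 2: per-key-frame fingerprint
    match_sequences: Callable,             # module 3: similarity score in [0, 1]
    threshold: float = 0.8,                # assumed decision threshold
):
    """Run the three-module CBVCD pipeline on one query video."""
    # Module 1: select representative key-frames from the query video.
    key_frames = extract_key_frames(query_frames)
    # Module 2: compute a compact, robust fingerprint per key-frame.
    query_fp = [extract_fingerprint(f) for f in key_frames]
    # Module 3: verify the query sequence against likely originals.
    best_id, best_score = None, 0.0
    for video_id, ref_fp in reference_db:
        score = match_sequences(query_fp, ref_fp)
        if score > best_score:
            best_id, best_score = video_id, score
    return best_id if best_score >= threshold else None
```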
3.1 Key-Frame Extraction Process
In this paper, key-frames are extracted from each
video shot based on visual attention and structural
similarity. The approach produces a gradient
magnitude similarity map from each pair of
consecutive frames; the global variation of this map
is then measured using a novel signal fidelity
measure called Gradient Magnitude Similarity
Deviation (GMSD) (Xue et al., 2014). A frame is
chosen as a key-frame if its GMSD value exceeds a
certain threshold.
GMSD is used to estimate the global variation of
a gradient-based local quality map for overall image
quality prediction. It is proved in (Xue et al., 2014)
that GMSD achieves highly competitive
quality-prediction accuracy at a low computational
cost.
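As a concrete illustration, the sketch below computes GMSD between two grayscale frames following its definition in (Xue et al., 2014), together with a simple thresholding rule for key-frame selection. The Prewitt kernels and the stability constant c = 170 come from that paper; the threshold value tau and the function names are placeholders, and the 2x average-pooling pre-processing of the original implementation is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(frame_a, frame_b, c=170.0):
    """Gradient Magnitude Similarity Deviation (Xue et al., 2014).

    frame_a, frame_b: grayscale frames as arrays with values in [0, 255].
    c: stability constant from the original paper.
    """
    # 3x3 Prewitt kernels, as used in the GMSD paper.
    hx = np.array([[1, 0, -1]] * 3, dtype=float) / 3.0
    hy = hx.T

    def gradient_magnitude(img):
        gx = convolve(img, hx, mode='nearest')
        gy = convolve(img, hy, mode='nearest')
        return np.sqrt(gx ** 2 + gy ** 2)

    m_a = gradient_magnitude(np.asarray(frame_a, dtype=float))
    m_b = gradient_magnitude(np.asarray(frame_b, dtype=float))
    # Pixel-wise gradient magnitude similarity (GMS) map.
    gms = (2.0 * m_a * m_b + c) / (m_a ** 2 + m_b ** 2 + c)
    # GMSD is the standard deviation of the GMS map: 0 means the two
    # frames have identical gradient structure.
    return float(gms.std())

def select_key_frames(frames, tau=0.15):
    """Flag frame i as a key-frame when its GMSD to frame i-1 exceeds
    the threshold tau (tau is a placeholder value)."""
    return [i for i in range(1, len(frames))
            if gmsd(frames[i - 1], frames[i]) > tau]
```

A larger GMSD between consecutive frames indicates a larger structural change, which is why frames whose GMSD exceeds the threshold are retained as key-frames.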