
 
 
2 RELATED WORK 
Most video copy detection algorithms based on global features extract low-level features from the video frames to represent the video, but these algorithms are sensitive to various copy techniques, so their detection results are not satisfactory. In contrast to global features, local features describe the structure and texture of the neighborhood of an interest point (Joly et al., 2007) and are generally robust to changes in brightness and viewing angle as well as to geometric and affine transformations. Techniques based on local features can be divided into five types: spatial methods, temporal methods, spatial-temporal methods, transform-domain methods and color methods.
On the other hand, video copy detection approaches can be classified into two large groups. The first group includes non-key-frame-based approaches, which use the whole video sequence in the detection process. Jiang et al. (Jiang et al., 2013) proposed a rotation-invariant VCD approach in which each selected frame is partitioned into a number of rings; Histogram of Gradients (HOG) and RMI features are then calculated as the original features. In (Cui et al., 2010), a fast CBCD approach based on the Slice Entropy Scattergraph (SES) is proposed; SES employs video spatio-temporal slices, which greatly decreases the storage and computational complexity. Yeh et al. (Yeh et al., 2009) proposed a frame-level descriptor for large-scale VCD that encodes the internal structure of a video frame by computing the pair-wise correlations between geometrically pre-indexed blocks. In (Wu et al., 2009), Wu et al. introduced a Self-Similarity Matrix (SSM) based video copy detection scheme and a Visual Character-String (VCS) descriptor for SSM matching (see the sketch after this paragraph). Then in (Wu et al., 2009), the authors added a transformation recognition module and used a self-similarity-matrix-based near-duplicate video matching scheme: by detecting the type of transformation, the near-duplicates can be treated with the 'best' feature, which is decided experimentally. In (Ren et al., 2012), Ren et al. proposed a compact video signature representation as a time series for either global or local feature descriptors; it provides fast signature matching through major-incline-based alignment of the time series.
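For concreteness, the following is a minimal sketch of a self-similarity matrix computed over per-frame descriptors; the cosine similarity and the generic feature input are illustrative assumptions, not the exact features or metric of (Wu et al., 2009).

    import numpy as np

    def self_similarity_matrix(features):
        # features: (n_frames, dim) array of per-frame descriptors
        # (hypothetical; any frame-level feature can be plugged in).
        # Entry (i, j) is the cosine similarity of frames i and j.
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        unit = features / np.maximum(norms, 1e-12)
        return unit @ unit.T

Because the matrix depends only on similarities between a video's own frames, it is largely insensitive to photometric changes applied uniformly to the whole clip, which is one reason it is attractive for near-duplicate matching.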
The second group contains key-frame-based techniques. Zhang et al. (Zhang et al., 2010) proposed a CBVCD method based on temporal features of key-frames. Chen et al. (Chen et al., 2011) introduced a new video copy detection method based on the combination of video Y and U spatiotemporal feature curves and key-frames. Tsai et al. (Tsai et al., 2009) developed a practical CBVCD scheme: after locating the visually similar key-frames, Vector Quantization (VQ) and Singular Value Decomposition (SVD) are applied to extract the spatial features of these frames; the shot lengths are then used as temporal features for further matching to achieve a more accurate result. In (Chaisorn et al., 2010), Chaisorn et al. proposed a framework composed of two levels of bitmap indexing: the first level groups videos (key-frames) into clusters and uses them as the first-level index, so the video in question need only be matched against those clusters rather than the entire database. In (Kim and Nam, 2009), Kim and Nam presented a method that uses key-frames with abrupt changes of luminance and extracts compact spatio-temporal features from them; by comparing these features with the preregistered features stored in the video database, the approach determines whether an uploaded video is an illegal copy.
3 PROPOSED VIDEO COPY DETECTION MODEL
As mentioned above, most CBVCD systems consist of three major modules: key-frame extraction, fingerprint (feature vector) extraction, and sequence matching. The fingerprint must fulfill diverging criteria such as discriminating capability and robustness against various signal distortions. The sequence matching module is responsible for devising the matching strategy and verifying the test sequence against likely originals in the database. The architecture of our proposed CBVCD system is shown in Figure 1.
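To make this decomposition concrete, below is a minimal Python sketch of a three-module CBVCD pipeline. All names and the placeholder implementations (uniform sampling, a toy histogram fingerprint, a nearest-neighbour matcher) are hypothetical stand-ins for the GMSD-based key-frame extraction, BSIF-RMI fingerprint and matching scheme described in the following subsections.

    import numpy as np

    def extract_key_frames(frames):
        # Placeholder: uniform sampling; the proposed system instead
        # selects key-frames with GMSD (Section 3.1).
        return frames[::10]

    def compute_fingerprint(frame):
        # Placeholder: 64-bin grayscale histogram as a toy feature
        # vector; the proposed system uses BSIF-RMI descriptors.
        hist, _ = np.histogram(frame, bins=64, range=(0, 255),
                               density=True)
        return hist

    def match_sequence(query_fps, database):
        # Placeholder matcher: return the reference video whose mean
        # fingerprint is closest (L2) to the query's mean fingerprint.
        q = np.mean(query_fps, axis=0)
        return min(database, key=lambda vid: np.linalg.norm(
            q - np.mean(database[vid], axis=0)))

    def detect_copy(query_frames, database):
        key_frames = extract_key_frames(query_frames)       # module 1
        fps = [compute_fingerprint(f) for f in key_frames]  # module 2
        return match_sequence(fps, database)                # module 3

Here database maps a video identifier to the list of fingerprints of its key-frames.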
3.1  Key-Frames Extraction Process 
In this paper, the key-frames extracted from each video shot are based on visual attention and structural similarity. The approach produces a gradient magnitude similarity map from each frame. The similarity of the maps is then measured using a novel signal fidelity measure called Gradient Magnitude Similarity Deviation (GMSD) (Xue et al., 2014). A frame is chosen as a key-frame if this value exceeds a certain threshold, as illustrated in the sketch below.
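The following is a minimal, simplified sketch of this selection rule, assuming grayscale frames with values in [0, 255]; the threshold is a hypothetical illustration, and the original GMSD additionally average-downsamples the images by a factor of 2 before filtering.

    import numpy as np
    from scipy.ndimage import convolve

    def gmsd(ref, dist, c=170.0):
        # Gradient Magnitude Similarity Deviation (Xue et al., 2014):
        # the standard deviation of the gradient magnitude similarity
        # map between two images, computed with Prewitt gradients.
        hx = np.array([[1.0, 0.0, -1.0]] * 3) / 3.0
        hy = hx.T
        def grad_mag(img):
            img = img.astype(float)
            return np.hypot(convolve(img, hx), convolve(img, hy))
        mr, md = grad_mag(ref), grad_mag(dist)
        gms = (2.0 * mr * md + c) / (mr ** 2 + md ** 2 + c)
        return float(gms.std())

    def select_key_frames(frames, threshold=0.1):
        # Keep a frame when its GMSD to the last kept frame exceeds
        # the (hypothetical) threshold, i.e. when the frame structure
        # has changed enough to be worth keeping.
        keys = [frames[0]]
        for frame in frames[1:]:
            if gmsd(keys[-1], frame) > threshold:
                keys.append(frame)
        return keys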
GMSD is used to estimate the global variation of a gradient-based local quality map for overall image quality prediction. It is proved in (Xue et al., 2014)