
algorithm while ensuring robustness.
2  VIDEO COPY DETECTION SYSTEM BASED ON HVS
The architecture of our approach contains two processes: feature extraction and similarity comparison. First, keyframes are extracted; then a visual attention model is employed to extract the region of interest (ROI), followed by Surfgram feature extraction to represent the video content, which effectively reduces the amount of data to process. For video similarity comparison, we use the BID+ (bit-difference) approximate nearest neighbor search algorithm to improve matching speed. The following sections describe these steps in detail.
2.1  Video Feature Extraction 
Because video data carries a very large amount of information, it is critical to decide which features should be used to represent the video content. To improve the speed of the copy detection system, keyframes are used to represent the video content.
2.1.1 Keyframe Extraction 
To reduce data redundancy, we first detect abrupt shot changes using the method of (Hou, 2009), and then extract keyframes uniformly between consecutive abrupt shots; specifically, a keyframe is extracted about every 30 to 100 frames in our experiments.
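The uniform sampling step can be sketched as follows; the function name `uniform_keyframes` and the default step are illustrative placeholders, not the paper's code:

```python
def uniform_keyframes(shot_boundaries, step=65):
    """Pick keyframe indices uniformly inside each shot.

    shot_boundaries: sorted frame indices of abrupt shot changes,
        including frame 0 and the index one past the last frame.
    step: sampling interval; the text takes a keyframe roughly every
        30-100 frames, so the default of 65 is an arbitrary midpoint.
    """
    keyframes = []
    # sample each shot independently so boundaries are respected
    for start, end in zip(shot_boundaries[:-1], shot_boundaries[1:]):
        keyframes.extend(range(start, end, step))
    return keyframes

# two shots: frames 0-199 and 200-449
print(uniform_keyframes([0, 200, 450]))
# → [0, 65, 130, 195, 200, 265, 330, 395]
```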
2.1.2  ROI Extraction
It has been found that the HVS exhibits a certain selectivity: it guides eye movements and forms the focus of attention, i.e., the ROI. Visual attention models (VAMs) are based on the HVS, so a visual attention model can be used to extract the ROI and reduce the amount of information to be processed. The classic VAM is the Itti model proposed by Laurent Itti (Itti, 1998). The Itti model extracts multiple features from the input image, such as intensity, color, and orientation, and forms a conspicuity map for each feature based on center-surround differences. The saliency map is a linear combination of the three conspicuity maps. The region of the saliency map beyond a certain threshold T is regarded as the ROI; we use T = 0.6. After keyframe extraction, the ROI further reduces the amount of data and improves the processing speed.
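Thresholding the saliency map can be sketched as below; the function names are illustrative, and only the threshold T = 0.6 comes from the text:

```python
import numpy as np

def roi_mask(saliency, T=0.6):
    """Binary ROI mask: pixels whose saliency exceeds the threshold T.

    saliency: 2-D saliency map normalized to [0, 1] (e.g. Itti-style).
    T = 0.6 is the threshold used in the text.
    """
    return saliency > T

def roi_bbox(mask):
    """Tight bounding box (top, bottom, left, right) around the ROI mask."""
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# toy 4x4 saliency map with a two-pixel salient region in column 2
s = np.zeros((4, 4))
s[1:3, 2] = 0.9
print(roi_bbox(roi_mask(s)))  # → (1, 2, 2, 2)
```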
2.1.3  Surfgram Feature Extraction 
In this paper, we propose the Surfgram feature to represent the content of the ROI because it is fast and robust. SURF (Speeded Up Robust Features) is several times faster than SIFT and is robust against brightness changes, contrast changes, and Gaussian noise (Bay, 2008). We therefore extract SURF descriptors from the ROI and build a frame-level histogram of visual codewords over them to construct the Surfgram feature.
As an image contains many feature points, raw SURF features amount to a large volume of data. Therefore, the Surfgram feature is adopted to represent only the content of the ROI. The extraction steps are as follows:
- Extract SURF features from the ROI.
- Use k-means clustering to partition the SURF features into 200 clusters.
- Compute the histogram of the features: the number of SURF features falling in each cluster, from class 1 to class n, is counted to obtain the histogram of SURF features, which is the Surfgram feature.
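The steps above can be sketched as follows; the codebook would come from k-means over training descriptors (k = 200 in the text), and the toy 2-D example stands in for real 64-D SURF descriptors:

```python
import numpy as np

def surfgram(descriptors, codebook):
    """Histogram of visual codewords (the Surfgram) for one ROI.

    descriptors: (m, d) array of SURF descriptors from the ROI.
    codebook:    (k, d) cluster centers from k-means (k = 200 in the text).
    Returns a k-bin, L1-normalized codeword histogram.
    """
    # assign each descriptor to its nearest cluster center
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy example: 2 codewords in 2-D instead of 200 in 64-D
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.8, 10.1], [10.2, 9.9], [0.0, 0.1]])
print(surfgram(desc, codebook))  # → [0.5 0.5]
```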
2.2  Surfgram Feature Matching and Scoring
When comparing the Surfgrams of a query video and a reference video, if the similarity between them is greater than a certain threshold, the query video is identified as a near-duplicate of the reference video. To speed up feature comparison, we use the BID+ approximate nearest neighbor search to index all features (Cui, 2005). We define the similarity measure score as follows:
 
score = (1/n) · Σ_{i=1}^{n} q_i / (30 · Q_i · D_i)        (1)
 
where q_i is the i-th matched query frame, n is the total number of matched query frames, Q_i is the total number of keyframes in the reference video, and D_i is the distance between the query frame and its matched frame in the reference video.
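A minimal sketch of computing this score is below, assuming the formula averages q_i / (30 · Q_i · D_i) over the matched frames; the exact grouping of terms in the original typeset equation may differ:

```python
def similarity_score(matches):
    """Similarity score of a query video against one reference video.

    matches: one (q_i, Q_i, D_i) tuple per matched query frame, where
      q_i is the matched query frame index,
      Q_i is the total number of keyframes in the reference video, and
      D_i is the feature distance to the matched reference frame.
    The averaged term q_i / (30 * Q_i * D_i) is an assumed reading of
    Eq. (1), not taken verbatim from the original typesetting.
    """
    n = len(matches)
    return sum(q / (30.0 * Q * D) for q, Q, D in matches) / n

print(round(similarity_score([(30, 10, 0.5), (60, 10, 0.5)]), 3))  # → 0.3
```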
3 EXPERIMENTAL RESULTS 
To evaluate the performance of the proposed 
A Video Copy Detection System based on Human Visual System
793