algorithm while ensuring robustness.
2 VIDEO COPY DETECTION
SYSTEM BASED ON HVS
The architecture of our approach comprises two processes: feature extraction and similarity comparison. First, keyframes are extracted; a visual attention model is then employed to extract the ROI, followed by Surfgram feature extraction to represent the video content, which effectively reduces the amount of data to process. For video similarity comparison, this paper uses the BID+ (bit-difference) approximate nearest neighbor search algorithm to improve matching speed. The following sections describe these preprocessing steps in detail.
2.1 Video Feature Extraction
Because video data carry a very large amount of information, it is critical to decide which features should be used to represent the video content. To improve the speed of the copy detection system, keyframes are used to represent the video content.
2.1.1 Keyframe Extraction
To reduce data redundancy, we first detect abrupt shot changes (Hou, 2009) and then extract keyframes uniformly between consecutive abrupt shots; specifically, one keyframe is extracted from roughly every 30 to 100 frames in our experiments.
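This uniform sampling between shot boundaries can be sketched as follows (a minimal Python sketch with hypothetical `frames` and `shot_boundaries` inputs; the abrupt-shot detector itself follows Hou, 2009 and is not shown, and the default interval of 30 frames is an assumption within the paper's 30-to-100 range):

```python
def extract_keyframes(frames, shot_boundaries, step=30):
    """Uniformly sample keyframes between consecutive abrupt-shot boundaries.

    frames: decoded frames (e.g. a list of HxWx3 arrays).
    shot_boundaries: sorted frame indices where abrupt shot changes occur.
    step: sampling interval in frames (assumed default; the paper samples
          roughly every 30 to 100 frames).
    Returns a list of (frame_index, frame) pairs.
    """
    keyframes = []
    # Treat the video start and end as implicit shot boundaries.
    bounds = [0] + list(shot_boundaries) + [len(frames)]
    for start, end in zip(bounds[:-1], bounds[1:]):
        for idx in range(start, end, step):
            keyframes.append((idx, frames[idx]))
    return keyframes
```

Sampling restarts at every shot boundary, so each shot contributes at least one keyframe regardless of its length.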
2.1.2 ROI Extraction
It has been found that the HVS exhibits a certain selectivity: it guides eye movements and forms the focus of attention, or ROI. The Visual Attention Model (VAM) is based on the HVS, so it can be used to extract the ROI and reduce the amount of information to be processed. The most classic VAM is the Itti model proposed by Laurent Itti (Itti, 1998). The Itti model extracts several features from the input image, such as intensity, color, and orientation, and forms a conspicuity map for each feature; the conspicuity maps are based on center-surround differences, and the saliency map is a linear combination of the three conspicuity maps. The region exceeding a certain threshold T in the saliency map is regarded as the ROI; we use T = 0.6. After keyframe extraction, ROI extraction further reduces the amount of data and improves the processing speed.
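The thresholding step can be sketched as follows (the min-max normalization of the saliency map to [0, 1] is our assumption, so that the threshold T = 0.6 applies regardless of input scale; the Itti-style saliency computation itself is not shown):

```python
import numpy as np

def extract_roi_mask(saliency_map, T=0.6):
    """Threshold a saliency map to obtain a boolean ROI mask.

    saliency_map: 2-D array, e.g. the linear combination of the
    intensity, color and orientation conspicuity maps of an
    Itti-style model (not computed here).
    """
    s = saliency_map.astype(float)
    # Assumed normalization so T = 0.6 is scale-independent.
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)
    return s > T  # True inside the ROI
```

Subsequent feature extraction then only needs to consider pixels where the mask is True.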
2.1.3 Surfgram Feature Extraction
In this paper, we propose the Surfgram feature to represent the content of the ROI because it is fast and robust. SURF (Speeded Up Robust Features) is several times faster than SIFT and is robust against brightness changes, contrast changes, and Gaussian noise (Bay, 2008). Therefore, we extract SURF features from the ROI and generate a frame-level histogram of visual codewords based on them to construct the Surfgram feature.
As an image contains many feature points, SURF features amount to a large volume of data. Therefore, the Surfgram feature is computed only over the ROI. The extraction steps are as follows:
1. Extract SURF features from the ROI.
2. Use the K-means clustering approach to partition the SURF features into 200 clusters.
3. Compute the histogram of the features: the number of SURF features in class 1 to class n is counted to obtain the histogram of SURF features, which is the Surfgram feature.
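The steps above can be sketched as follows, assuming the 64-D SURF descriptors have already been computed over the ROI (e.g. with an OpenCV SURF extractor, not shown) and using scikit-learn's KMeans in place of the paper's unspecified K-means implementation; normalizing the histogram is our addition so that frames with different feature counts remain comparable:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=200, seed=0):
    """Cluster SURF descriptors (one row each) into k visual words.

    The paper uses k = 200; a fixed seed is assumed for reproducibility.
    """
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_descriptors)

def surfgram(descriptors, codebook):
    """Frame-level histogram of visual codewords: the Surfgram feature."""
    words = codebook.predict(descriptors)  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)  # assumed normalization
```

The codebook is built once from descriptors pooled across training frames; each keyframe's ROI descriptors are then quantized against it.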
2.2 Surfgram Feature Matching and Scoring
When comparing the Surfgrams of the query video and a reference video, if the similarity between them is greater than a certain threshold, the query video is identified as a near-duplicate of the reference video. To speed up feature comparison, we use the BID+ approximate nearest neighbor index over all features (Cui, 2005). We define the similarity score as follows:
$$\mathrm{score} = \frac{1}{n}\sum_{i=1}^{n}\frac{q_i}{Q\,(1 + 30\,D_i)} \qquad (1)$$
where q_i is the matched query frame, n is the total number of matched query frames, Q is the total number of keyframes in the reference video, and D_i is the distance between the query frame and its matched frame in the reference video.
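A sketch of this score in Python, under our reading of the (partially garbled) Eq. (1): each matched query frame contributes a term that decays with its feature distance D_i and is normalized by the reference video's keyframe count Q; treating q_i as a per-match weight (1 for a plain match) is our assumption:

```python
def similarity_score(matches, Q):
    """Similarity score between a query video and one reference video.

    matches: list of (q_i, D_i) pairs, one per matched query frame;
             q_i is the match weight (assumed 1) and D_i the feature
             distance to the nearest reference frame.
    Q: total number of keyframes in the reference video.
    The form q_i / (Q * (1 + 30 * D_i)) is a reconstruction of Eq. (1).
    """
    n = len(matches)
    if n == 0:
        return 0.0  # no matched frames: zero similarity
    return sum(q / (Q * (1.0 + 30.0 * d)) for q, d in matches) / n
```

A perfect single-keyframe match (D_i = 0, Q = 1) yields a score of 1; larger distances or longer reference videos shrink each term.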
3 EXPERIMENTAL RESULTS
To evaluate the performance of the proposed