2 RELATED WORK
Most video copy detection algorithms based on
global features extract low-level features from video
frames to represent the video, but these algorithms
are sensitive to various copy techniques, so their
detection results are often unsatisfactory. In contrast
to global features, local features describe the
structure and texture of the neighborhood of an
interest point (Joly et al., 2007) and are generally
robust to brightness, viewing-angle, geometric and
affine transformations. Techniques based on local
features can be divided into five types: spatial
methods, temporal methods, spatial-temporal
methods, transform-domain methods and color
methods.
On the other hand, video copy detection
approaches can be classified into two broad groups.
The first group comprises non-key-frame-based
approaches, which use the whole video sequence in
the detection process. Jiang et al. (Jiang et al., 2013)
proposed a rotation-invariant VCD approach in
which each selected frame is partitioned into rings;
Histograms of Oriented Gradients (HOG) and RMI
are then calculated as the original features. In (Cui
et al., 2010), a fast CBCD approach based on the
Slice Entropy Scattergraph (SES) is proposed; SES
employs video spatio-temporal slices, which greatly
decreases storage and computational complexity.
Yeh et al. (Yeh et al., 2009) proposed a frame-level
descriptor for large-scale VCD that encodes the
internal structure of a video frame by computing the
pair-wise correlations between geometrically
pre-indexed blocks. In (Wu et al., 2009), Wu et al.
introduced a Self-Similarity Matrix (SSM) based
video copy detection scheme and a Visual
Character-String (VCS) descriptor for SSM
matching. In a follow-up work (Wu et al., 2009), the
authors added a transformation-recognition module
and used a self-similarity-matrix-based
near-duplicate video matching scheme; by detecting
the type of transformation, the near-duplicates can
be processed with the 'best' feature, which is chosen
experimentally. In (Ren et al., 2012), Ren et al.
proposed a compact video signature representation
as time series for either global or local feature
descriptors, providing fast signature matching
through major-incline-based alignment of the time
series.
The second group contains key-frame-based
techniques. Zhang et al. (Zhang et al., 2010)
proposed a CBVCD method based on temporal
features of key-frames. Chen et al. (Chen et al.,
2011) introduced a video copy detection method
that combines key-frames with the spatio-temporal
feature curves of the video's Y and U channels. Tsai
et al. (Tsai et al., 2009) developed a practical
CBVCD system: after locating visually similar
key-frames, Vector Quantization (VQ) and Singular
Value Decomposition (SVD) are applied to extract
the spatial features of these frames, and the shot
lengths are then used as temporal features for
further matching to achieve a more accurate result.
In (Chaisorn et al., 2010), Chaisorn et al. proposed a
framework composed of two levels of bitmap
indexing: the first level groups videos (key-frames)
into clusters and uses them as the first-level index,
so a query video only needs to be matched against
those clusters rather than the entire database. In
(Kim and Nam, 2009), Kim et al. presented a
method that selects key-frames with abrupt
luminance changes and extracts a compact
spatio-temporal feature from them; by comparing
this feature with the pre-registered features stored in
the video database, the approach determines
whether an uploaded video is an illegal copy.
3 PROPOSED VIDEO COPY
DETECTION MODEL
As mentioned above, most CBVCD systems
consist of three major modules: key-frame
extraction, fingerprint (feature vector) extraction
and sequence matching. The fingerprint must satisfy
diverging criteria such as discriminative capability
and robustness against various signal distortions.
The sequence-matching module is responsible for
devising the matching strategy and verifying the test
sequence against likely originals in the database.
The architecture of our proposed CBVCD system is
shown in Figure 1.
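To make the data flow between the three modules concrete, the following Python sketch outlines the pipeline at a high level. It is a minimal illustration, not our actual implementation: the module functions are passed in as placeholders, and the database layout and decision threshold are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

def detect_copy(
    query_frames: Sequence,                # decoded frames of the query video
    reference_db: List[Tuple[str, list]],  # (video_id, fingerprint sequence) pairs
    extract_key_frames: Callable,          # module 1: key-frame extraction
    extract_fingerprint: Callable,         # module 2: per-key-frame fingerprint
    match_sequences: Callable,             # module 3: similarity score in [0, 1]
    threshold: float = 0.8,                # assumed decision threshold
):
    """Run the three-module CBVCD pipeline on one query video."""
    # Module 1: select representative key-frames from the query video.
    key_frames = extract_key_frames(query_frames)
    # Module 2: compute a compact, robust fingerprint per key-frame.
    query_fp = [extract_fingerprint(f) for f in key_frames]
    # Module 3: verify the query sequence against likely originals.
    best_id, best_score = None, 0.0
    for video_id, ref_fp in reference_db:
        score = match_sequences(query_fp, ref_fp)
        if score > best_score:
            best_id, best_score = video_id, score
    return best_id if best_score >= threshold else None
```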
3.1 Key-Frame Extraction Process
In this paper, key-frames are extracted from each
video shot based on visual attention and structural
similarity. The approach produces a gradient
magnitude similarity map from each pair of
consecutive frames; the global variation of this map
is then measured using a novel signal fidelity
measure called Gradient Magnitude Similarity
Deviation (GMSD) (Xue et al., 2014). A frame is
chosen as a key-frame if its GMSD value exceeds a
certain threshold.
GMSD is used to estimate the global variation of
a gradient-based local quality map for overall image
quality prediction. It is proved in (Xue et al., 2014)
that GMSD achieves highly competitive
quality-prediction accuracy at a low computational
cost.
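As a concrete illustration, the sketch below computes GMSD between two grayscale frames following its definition in (Xue et al., 2014), together with a simple thresholding rule for key-frame selection. The Prewitt kernels and the stability constant c = 170 come from that paper; the threshold value tau and the function names are placeholders, and the 2x average-pooling pre-processing of the original implementation is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(frame_a, frame_b, c=170.0):
    """Gradient Magnitude Similarity Deviation (Xue et al., 2014).

    frame_a, frame_b: grayscale frames as arrays with values in [0, 255].
    c: stability constant from the original paper.
    """
    # 3x3 Prewitt kernels, as used in the GMSD paper.
    hx = np.array([[1, 0, -1]] * 3, dtype=float) / 3.0
    hy = hx.T

    def gradient_magnitude(img):
        gx = convolve(img, hx, mode='nearest')
        gy = convolve(img, hy, mode='nearest')
        return np.sqrt(gx ** 2 + gy ** 2)

    m_a = gradient_magnitude(np.asarray(frame_a, dtype=float))
    m_b = gradient_magnitude(np.asarray(frame_b, dtype=float))
    # Pixel-wise gradient magnitude similarity (GMS) map.
    gms = (2.0 * m_a * m_b + c) / (m_a ** 2 + m_b ** 2 + c)
    # GMSD is the standard deviation of the GMS map: 0 means the two
    # frames have identical gradient structure.
    return float(gms.std())

def select_key_frames(frames, tau=0.15):
    """Flag frame i as a key-frame when its GMSD to frame i-1 exceeds
    the threshold tau (tau is a placeholder value)."""
    return [i for i in range(1, len(frames))
            if gmsd(frames[i - 1], frames[i]) > tau]
```

A larger GMSD between consecutive frames indicates a larger structural change, which is why frames whose GMSD exceeds the threshold are retained as key-frames.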