Privacy-enhanced Perceptual Hashing of Audio Data

Heiko Knospe

Institute of Communications Engineering, Cologne University of Applied Sciences, 50679 Cologne, Germany

Keywords:

Perceptual Hashing, Audio Hashing, Audio Fingerprinting, Acoustic Fingerprint, Privacy, Security.

Abstract:

Audio hashes are compact and robust representations of audio data and allow the efficient identification of
specific recordings and their transformations. Audio hashing for music identification is well established and
similar algorithms can also be used for speech data. A possible application is the identification of replayed
telephone spam. This contribution investigates the security and privacy issues of perceptual hashes and follows
an information-theoretic approach. The entropy of the hash should be large enough to prevent the exposure of
audio content. We propose a privacy-enhanced randomized audio hash and analyze its entropy as well as its
robustness and discrimination power over a large number of hashes.

1 INTRODUCTION

The increasing amount of multimedia data has led to

a growing interest in fast and reliable identiﬁcation

techniques. Multimedia content can have various rep-

resentations and is subject to transformations which

preserve the perceptual content, but signiﬁcantly al-

ter the underlying data. It is obvious that crypto-

graphic hash functions can not preserve similar con-

tent because of the avalanche effect of these func-

tions. They are hence of limited use for the identi-

ﬁcation of multimedia data. Instead, robust percep-

tual hashes are required which are locality-sensitive

(Slaney and Casey, 2008) and map similar input data

to similar hashes. The hashes are usually represented

by a sequence of binary vectors. The size of the orig-

inal data is substantially reduced and similarity can

be measured in the hash domain. Different copies (in-

cluding their lossy representations) of the same multi-

media document can then be identiﬁed by comparing

their hashes. We note that content recognition (for ex-

ample speech recognition and semantical correspon-

dence) is not intended here and different recordings

with identical or similar content should give different

perceptual hashes.

The problem of audio identification can be considered as largely solved (Kurth and Müller, 2008) with

commercial solutions available for large music collec-

tions (Wang and Smith III, 2008). But optimizations

of the ﬁngerprint are still sensible (Grutzek et al.,

2012), e.g. for speech recordings, for very large repos-

itories, fast searching, good robustness and a very low

rate of false identiﬁcations.

Further aspects concern the security and privacy

of the perceptual hash. Here, security refers in partic-

ular to content integrity and multimedia authentica-

tion. A key-dependent perceptual hash can authenti-

cate the multimedia data: an adversary should not be

able to produce perceptually different data with the

same hash value. Different proposals for secure per-

ceptual hashes exist and we refer to Section 2.2 for

more details.

Privacy requirements for multimedia hashes have

been examined less so far. Privacy is relevant for

personal multimedia data, which is processed by dis-

tributed systems, for example telephone calls. Percep-

tual hashing can be used to identify similar copies,

e.g. replayed spam calls. The main privacy concern

thereby is that the hash may reveal information about

the original content. Since the hash computation in-

volves several reduction steps and the hash size is

usually very small compared to the original data, it

is generally impossible to reconstruct the complete

multimedia content. But even a restricted information

leakage, e.g. single words or characteristic properties

of a speaker, would be critical. Ideally, an adversary

should not be able to distinguish the hash from a ran-

dom sequence.

In this paper, we present a privacy-enhanced per-

ceptual hash for audio data. We are particularly inter-

ested in speech data where privacy is much more im-

portant than for music. The construction of the hash

is based on the well-known work of (Haitsma and

Kalker, 2002) and our contribution (Grutzek et al.,

Knospe H..

Privacy-enhanced Perceptual Hashing of Audio Data.

DOI: 10.5220/0004532605490554

In Proceedings of the 10th International Conference on Security and Cryptography (SECRYPT-2013), pages 549-554

ISBN: 978-989-8565-73-0

 2013 SCITEPRESS (Science and Technology Publications, Lda.)

2012). The hash consists of a set of subhashes which

are derived from spectral audio features and subse-

quently randomized by a cryptographic hash-based

message authentication code (HMAC). We examine

the capabilities of the hash with respect to different

requirements including their robustness, discrimina-

tion performance and privacy properties.

This work is organized as follows: we review per-

ceptual hashes and in particular the existing work on

secure audio hashes in Section 2. The following Sec-

tion 3 contains the privacy requirements for multime-

dia identiﬁcation applications. Then we introduce a

privacy-enhanced perceptual audio hash. Section 4

shows the performance of this hash and the conclu-

sion is provided in Section 5.

2 RELATED WORK

2.1 Audio Fingerprinting Frameworks

Acoustic ﬁngerprints, which are also called audio

ﬁngerprints or audio hashes, have been studied for

some time (Cremer et al., 2001), (Clausen and Kurth,

2004), (Haitsma and Kalker, 2002), (Wang, 2003).

There exists a number of different algorithms but usu-

ally the ﬁngerprint is based on time-frequency fea-

tures of the waveform. In a general framework, the

ﬁngerprint is computed in a number of steps (Cano

et al., 2002): audio preprocessing, normalization, framing with overlap, spectral transformation and feature extraction, quantization and fingerprint modeling.

The main differences of the algorithms are due to

the combination of spectral information (Doets and

Lagendijk, 2008). The resulting ﬁngerprint is usu-

ally a sequence of vectors (subhashes) with one vec-

tor for each time frame. Adjacent frames often have

identical or similar subhashes and redundancies can

be reduced by ﬁngerprint modeling. For an efﬁcient

search of a given ﬁngerprint against a large reposi-

tory of hashes, the comparison of individual ﬁnger-

prints and the computation of their distances have to

be avoided. Index-based search algorithms (Kurth and Müller, 2008) are computationally less expensive.

In the following, we consider audio ﬁngerprints

which preserve the time information, e.g. (Haitsma

and Kalker, 2002) or (Wang, 2003). For each audio sample $A$ and time window (frame) $t$, a subhash $h(A,t) \in V$ is computed, where $V$ is a vector space (e.g. $V = \{0,1\}^n$) equipped with a distance function (metric) $d : V \times V \to \mathbb{R}_{\ge 0}$, e.g. the Hamming distance. The complete audio fingerprint is a collection of temporal positions and their associated subhashes: $h(A) = \{(t_1,v_1),(t_2,v_2),\dots,(t_r,v_r)\}$. Two fingerprints are equivalent if they differ only by a global time shift. There are different possibilities to extend the distance $d$ from the vector space of subhashes to equivalence classes of fingerprints. For example, $d$ can be defined as reciprocal to the maximum number of matches. Two subsets $\{(t_1,v_1),(t_2,v_2),\dots,(t_k,v_k)\}$ and $\{(t'_1,v'_1),(t'_2,v'_2),\dots,(t'_k,v'_k)\}$ with $k$ elements match if the temporal positions $t_j$ and $t'_j$ coincide (after a possible global time shift) and $d(v_j,v'_j) \le \delta$ (for example $\delta = 0$) for all $j = 1,\dots,k$.
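The matching rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: subhashes are stored as integers, frame indices are unique within a fingerprint, and the function names are hypothetical rather than taken from any of the cited systems.

```python
from collections import Counter


def hamming(u: int, v: int) -> int:
    """Hamming distance between two subhashes stored as integers."""
    return bin(u ^ v).count("1")


def max_matches(fp_a, fp_b, delta=0):
    """Largest number of subhash pairs that agree under one global time shift.

    fp_a, fp_b: lists of (frame_index, subhash) pairs with unique frame indices.
    """
    votes = Counter()
    for t_a, v_a in fp_a:
        for t_b, v_b in fp_b:
            if hamming(v_a, v_b) <= delta:
                votes[t_b - t_a] += 1  # this pair supports the shift t_b - t_a
    return max(votes.values(), default=0)


def distance(fp_a, fp_b, delta=0):
    """Distance reciprocal to the number of matches (infinite if there is none)."""
    m = max_matches(fp_a, fp_b, delta)
    return float("inf") if m == 0 else 1.0 / m
```

Voting on the time offset makes the comparison independent of the absolute position of the excerpt; the quadratic loop is for clarity only and would be replaced by an index in any realistic repository.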

Standard requirements are given in (Wang, 2003),

(Cano et al., 2002), (Doets and Lagendijk, 2008):

1. Robustness: perceptually similar audio samples $A \sim A'$ have hash vectors with a small distance $d(h(A),h(A')) \le \varepsilon$, where $\varepsilon \ge 0$ is a threshold which controls the robustness of the algorithm.

2. Discrimination: perceptually different audio samples $A$ and $A'$ yield a large distance $d(h(A),h(A')) > \varepsilon$. The fingerprint must be sufficiently entropic to allow sufficient distinction and to prevent spurious matches.

3. Localization property and translation invariance:

similar audio excerpts (e.g. only a few seconds

long) can be identiﬁed independent of their abso-

lute temporal position.

There exist various ﬁngerprinting systems with

the desired properties; robustness and discrimination

are satisﬁed statistically (with sufﬁciently low error

rate) for randomly chosen audio data. Important ex-

amples are the ﬁngerprints deﬁned in (Haitsma and

Kalker, 2002) and (Wang and Smith III, 2008).

2.2 Secure Audio Fingerprinting

The presence of adversaries who deliberately manip-

ulate the audio data or the hash gives rise to further

requirements, compare (Thiemert et al., 2009):

1. Secure Robustness: it is hard to create perceptually similar audio data $A$ and $A'$ with $d(h(A),h(A')) > \varepsilon$.

2. Second Preimage Resistance: for a given audio sample $A$ and hash value $h(A)$, it is hard to find perceptually different audio data $A'$ with $d(h(A),h(A')) \le \varepsilon$.

3. Collision Resistance: it is hard to create any perceptually different audio documents $A$ and $A'$ with $d(h(A),h(A')) \le \varepsilon$.

The ﬁrst requirement prevents adversaries from

generating speciﬁcally manipulated versions of the

audio content which can not be identiﬁed, e.g. for

SECRYPT2013-InternationalConferenceonSecurityandCryptography

550

[...] attack against the secure robustness of the (Haitsma and

Kalker, 2002) ﬁngerprint is given in (Thiemert et al.,

2009). The quantization properties of the algorithm

are used to ﬂip a number of weak hash bits without

perceptually changing the audio data.

The second and the third requirement prevent that

forged audio content is accepted as authentic. This

is also relevant in connection with watermarking of

audio ﬁles where a robust hash is used to protect the

content integrity.

Since the relation between time-frequency ampli-

tudes and output hash bits is well localized and per-

mits the computation of preimages, the desired prop-

erties can hardly be achieved without additional ran-

domization. Diffusion operations similar to crypto-

graphic hashes would destroy the required robustness.

It is well known that the feature extraction algorithm itself should already be key-dependent (Fridrich and Goljan, 2000), (Swaminathan et al., 2006). Indeed, collisions and forged hashes would persist if the randomization were applied after the feature extraction.

It was observed by (Swaminathan et al., 2006) that

there is a trade-off between security and robustness.

They analyzed several image hash functions and used

the conditional entropy of the hash values for a given

image and an unknown key. The entropy was surpris-

ingly low with values between 6 and 16 bits.

Furthermore, an adversary could try to reveal the

key from the given hashes. (Koval et al., 2008) and

(Koval et al., 2009) analyzed the security of algo-

rithms based on block random projections (Fridrich

and Goljan, 2000) and used the conditional entropy

of the key for a given media ﬁle and hash value. They

discovered that information on the key is leaked, but

the amount of information decreases with the input

block size for the subhash computation.

(Weng and Preneel, 2011) proposed a secure im-

age hash which provides block level protection and

avoids collisions for malicious minor modiﬁcations.

Their hash shows good robustness and discrimination

properties but they did not analyze the security of the

key.

(Zmudzinski and Steinebach, 2009) deﬁned a so-

called rMAC for audio data based on the (Haitsma and

Kalker, 2002) ﬁngerprint. The rMAC can be embed-

ded as a watermark in the audio data. In their ex-

periments, a 128-bit rMAC for audio samples of 7s

length showed sufﬁcient robustness and discrimina-

tion power. Possible open issues are the shift invari-

ance, the entropy of the ﬁngerprint and information

leakage on the key.

In summary, there has been some work on secure

robust hashing, but there exist relevant open issues on

the security of various proposed algorithms. There are

also indications that the required robustness impedes

a high level of resistance against attacks.

3 PRIVACY-ENHANCED HASHES

3.1 Privacy Requirements

The use of ﬁngerprinting techniques for multimedia

identiﬁcation can raise privacy concerns if personal

information is processed. One of the main questions

is whether the ﬁngerprint leaks information on the

original content. It is well known that the properties

of cryptographic hash functions prevent any informa-

tion gain other than the identiﬁcation of exact copies.

But robust hashes may leak partial information about

the original data. For example, audio hashes usu-

ally contain quantized time-frequency features of the

waveform. The compactness of most ﬁngerprints pre-

vents a complete reconstruction but it seems feasi-

ble to reﬁne the probability distribution of the pos-

sible content and therefore gain partial information.

A systematic analysis of the equivocation of fingerprints with respect to the multimedia data is still outstanding. In this situation, telecommunication privacy laws

in many countries would not permit the use of ﬁnger-

prints for telephone data.

We have the following information-theoretic re-

quirements:

1. The entropy H(h(A)) of all hashes should be large

enough to protect against frequency analysis and

dictionary attacks. More speciﬁcally, the entropy

of the subhashes shall be high enough to prevent

the exposure of local audio content.

2. The conditional entropy H(A |h(A)) of audio data

for a given audio hash h(A) shall be large enough

to protect against information leakage. Further-

more, the conditional entropy of local audio data

for a given subhash shall be high enough to pre-

vent the exposure of local audio information.
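For reference, the two quantities can be written out explicitly. The following uses the standard Shannon definitions and assumes, for illustration only, that the audio sample $A$ is modelled as a discrete random variable and that $h$ is a deterministic function (for a keyed hash, with the key held fixed).

```latex
H(h(A)) = -\sum_{y} P\bigl(h(A)=y\bigr)\,\log_2 P\bigl(h(A)=y\bigr),
\qquad
H(A \mid h(A)) = H(A) - H(h(A)).
```

The second identity holds because $h(A)$ is determined by $A$, so that $H(A, h(A)) = H(A)$. Under this model, both requirements can be met at the same time only if the entropy of the audio data is considerably larger than the entropy of the hash.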

Ideally, it should not be possible to distinguish the

hash from random data but this can hardly be achieved

with the current algorithms. In particular, the robustness and the shift invariance require a large overlap

of the audio frames. Hence the subhashes change only

slowly with time and adjacent subhashes are clearly

correlated.

Even a high entropy of the hash would not pre-

vent a partial exposure of audio content if the relation

between the input audio data and the output hash is

easily traceable. It is well known that robust hashes

require a secret key which obscures this relation. We

Privacy-enhancedPerceptualHashingofAudioData

551

remarked above that the current methods, which are based on a randomization of the feature extraction process, may leak information on the key. We therefore propose to randomize the subhashes by applying a hash-based message authentication code (HMAC), which can be used as a pseudorandom function $\mathrm{prf}_K$; see (Bellare et al., 1996), but also (Bellare, 2006). This has several advantages compared to the randomization of features as discussed in Section 2.2 above: the values of $\mathrm{prf}_K$ do not leak information on the key, and the original subhashes cannot be reconstructed even when the key is disclosed (only dictionary attacks remain possible). Furthermore, the entropy of the subhashes is preserved by $\mathrm{prf}_K$ and the application of $\mathrm{prf}_K$ does not generate new collisions.

Since collisions of the original subhashes are preserved by the $\mathrm{prf}_K$ function, this construction is not suitable for audio authentication. In particular, an adversary may produce perceptually different multimedia data with the same hash value. But the privacy is preserved since the $\mathrm{prf}_K$ function is one-way. The overall protection depends on the distribution of subhashes and the number of known subhashes, i.e. information can only be gained if the entropy is low and the adversary has access to a large number of subhashes or is able to generate a large number of them for given audio data.

It would be desirable to extend the randomization

operation beyond the subhashes, to add dependen-

cies between the blocks or to use salt values, but the

required shift-invariance and the localization proper-

ties impede this. But we obtain an additional ran-

domization by dropping the time position of the sub-

hashes, removing repeated entries and ﬁnally permut-

ing them. We remark that this method could also be

combined with a randomization during the feature ex-

traction as described in Section 2.2.

3.2 Implementation

The proposed construction is based on (Haitsma and

Kalker, 2002), our work (Grutzek et al., 2012) and

several privacy-related enhancements.

For identiﬁcation purposes, audio samples of sev-

eral seconds sufﬁce. The audio data is extracted ev-

ery 11.8 ms with overlapping frames of length 370

ms. Silent sections are skipped and only the ﬁrst

100 frames with sufﬁcient energy are processed. A

Fourier transform is applied to the frames and the

spectral coefﬁcients are ﬁltered by a mel ﬁlter bank

in order to determine the energy in each sub-band.

The bands are equally distributed on a logarithmic

frequency axis between 300 Hz and 1800 Hz. For

the privacy-enhanced hash, we extract for each frame

Figure 1: Binary hash matrix of three speech samples. The left and central samples are similar, while the right sample is dissimilar to both. The upper 40 bits correspond to spectral and the lower 20 bits to cepstral coefficients.

41 spectral and 21 cepstral coefﬁcients (so-called

MFCCs). The spectral coefﬁcients are differentiated

in time and frequency direction, and the cepstral co-

efﬁcients only in frequency direction. This informa-

tion is quantized by only considering the sign while

disregarding magnitudes (compare (Grutzek et al.,

2012)). This yields a binary subhash vector of length

40 + 20 = 60 bits for each frame. Other common al-

gorithms use bit-lengths of approximately 32 bits, but

we can show (Section 4) that 60 bits provide addi-

tional entropy while still ensuring sufﬁcient robust-

ness. The hash has the following structure:

$$h(A) = \{(t_1,v_1),(t_2,v_2),\dots,(t_{100},v_{100})\}$$

The subhashes are vectors $v_i \in \{0,1\}^{60}$ and the complete hash can be represented by a binary matrix of size 60 × 100 (see Figure 1).
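The sign quantization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the order of the differences and the sign convention follow the bit derivation of (Haitsma and Kalker, 2002) and are an assumption here, and the function names are hypothetical. The inputs are the 41 sub-band energies of the previous and the current frame and the 21 cepstral coefficients of the current frame.

```python
def sign_bits(values):
    """Quantize to one bit per value: 1 for a positive value, 0 otherwise."""
    return [1 if x > 0 else 0 for x in values]


def subhash(spec_prev, spec_cur, ceps_cur):
    """60-bit subhash of one frame (assumed difference scheme, see lead-in)."""
    if not (len(spec_prev) == len(spec_cur) == 41 and len(ceps_cur) == 21):
        raise ValueError("expected 41 spectral and 21 cepstral coefficients")
    # Spectral part: difference between adjacent bands, then between frames.
    d_prev = [spec_prev[b] - spec_prev[b + 1] for b in range(40)]
    d_cur = [spec_cur[b] - spec_cur[b + 1] for b in range(40)]
    spectral = sign_bits(c - p for c, p in zip(d_cur, d_prev))
    # Cepstral part: difference between adjacent coefficients only.
    cepstral = sign_bits(ceps_cur[c] - ceps_cur[c + 1] for c in range(20))
    return spectral + cepstral  # 40 + 20 = 60 bits
```

Because only signs are kept, moderate amplitude changes (for example from lossy coding) leave most bits unchanged, which is where the robustness of this kind of fingerprint comes from.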

Then a key-dependent pseudorandom function $\mathrm{prf}_K$ is applied to the vectors $v_i$, the time positions are dropped and the resulting randomized hash $h_K(A)$ is a set of binary vectors:

$$h_K(A) = \{\mathrm{prf}_K(v_1), \mathrm{prf}_K(v_2), \dots, \mathrm{prf}_K(v_{100})\}$$

Hence duplicates are removed and the ordering is not relevant. The size of the hash $h_K(A)$ is only approximately 2 kBytes, depending on the keyed hash function used as pseudorandom function.
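A minimal sketch of this randomization step, using Python's standard `hmac` and `hashlib` modules, is given below. The paper does not state which hash function underlies the HMAC; SHA-1 is chosen here purely as an assumption, and the function names and the one-byte-per-bit encoding are illustrative.

```python
import hashlib
import hmac


def prf(key: bytes, subhash_bits) -> bytes:
    """HMAC of a binary subhash vector (SHA-1 is an assumed choice)."""
    msg = bytes(subhash_bits)  # one byte per bit keeps the sketch simple
    return hmac.new(key, msg, hashlib.sha1).digest()


def randomized_hash(key: bytes, subhashes) -> frozenset:
    """Apply the PRF to every subhash.

    Returning a set drops time positions, ordering and duplicates.
    """
    return frozenset(prf(key, v) for v in subhashes)
```

With HMAC-SHA1 each randomized subhash has 20 bytes, so 100 distinct values occupy 2000 bytes, which is in line with the size of approximately 2 kBytes quoted above. Equal subhashes always map to the same value under the same key, so matches between hashes are preserved, while hashes computed under different keys cannot be compared.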

We assume that the randomized hashes $\mathrm{prf}_K(v_i)$ are computationally indistinguishable from random output and do not leak information on the key. Then the security of our hash depends solely on the distribution of subhashes $v_i$. If they have sufficient entropy, then an adversary obtains little information from observing $\mathrm{prf}_K(v_i)$. Ideally, the $v_i$ would be long enough (say more than 100 bits) and uniformly distributed. In practice, it is hard to construct robust audio hashes with such a large binary length and their distribution is biased.

4 ANALYSIS

4.1 Entropy

We analyzed our hash with 5,530 real audio samples

SECRYPT2013-InternationalConferenceonSecurityandCryptography

552

Figure 2: Frequency distribution of 450,000 randomized speech data subhashes (left; occurrence over subhash values in 16-bit blocks) and estimated information rate for different block lengths (right).

(see Section 4.2) and 450,000 randomized subhashes.

Figure 2 shows the approximate distribution and the

information rate (entropy per bit-symbol). The en-

tropy is estimated by counting the number of occur-

rences for blocks of length between 2 and 20 bits, and

the frequency distribution is computed for words of

length 16 bits. There may be dependencies between

the blocks, but for computational reasons it is not

possible to estimate the entropy for the given block length of 60, since this would require a multiple of $2^{60}$ subhashes. Our computations show an information rate of approximately 0.65 for the given audio

data. Additionally, the concatenated ﬁle of binary

subhashes was compressed with different algorithms

and parameters; the ﬁle size could be reduced by at

most 43%. We conclude that the subhashes provide at

least 34 bits of entropy. We therefore expect that any

information gain from the frequency of the random-

ized subhashes requires at least several million sub-

hashes. Changing the key K prevents such an attack,

but only ﬁngerprints which were randomized with the

same key can be identiﬁed.
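The block-counting estimate described above can be illustrated with a simple plug-in estimator. This is a sketch under assumptions: the subhashes are concatenated into a string of '0' and '1' characters and cut into non-overlapping blocks, which may differ in detail from the procedure used for Figure 2; the function name is hypothetical.

```python
import math
from collections import Counter


def information_rate(bitstring: str, block_len: int) -> float:
    """Estimated entropy per bit from the frequencies of block_len-bit blocks."""
    blocks = [bitstring[i:i + block_len]
              for i in range(0, len(bitstring) - block_len + 1, block_len)]
    counts = Counter(blocks)
    total = len(blocks)
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return entropy / block_len
```

For the alternating string "0101..." the estimate is 1 bit per symbol for block length 1 but 0 for block length 2, which shows why longer blocks are needed to capture dependencies between bits. The figure of at least 34 bits quoted above appears consistent with the compression result: 60 × (1 − 0.43) ≈ 34.2 bits.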

4.2 Hypothesis Testing

The performance of the hash is analyzed with respect

to its capability to identify or to discriminate audio

samples. We assume a repository with a large number

of hashes when a new hash arrives. There are two

possible decisions:

• $H_0$: The audio sample is perceptually different from all the given ones.

• $H_1$: The audio sample is perceptually similar to one or more samples in the database.

This can be considered as a hypothesis testing problem where the decision depends on the distance $d(h_K(A),h_K(A'))$ of hashes. In our case, the distance is reciprocal to the number $m$ of matching subhashes. If none of the subhashes match, i.e. $h_K(A) \cap h_K(A') = \emptyset$, then the decision is clearly $H_0$. Otherwise, the decision for either $H_0$ or $H_1$ depends on a threshold $T$. A low threshold provides good robustness but less discriminative power. Higher thresholds deteriorate the robustness but also decrease the number of false identifications.

Figure 3: True positive rate $P_{TP}$ (left; curves for all distortions and codecs and without low bitrate codecs) and false positive rate $P_{FP}$ (right) for different thresholds.
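The decision rule can be sketched directly on the set representation of the randomized hashes. The labels "H0" (no similar sample in the repository) and "H1" (at least one similar sample) follow the two hypotheses above; the function names are hypothetical and a threshold of at least 1 is assumed.

```python
def matching_subhashes(hash_a: set, hash_b: set) -> int:
    """Number m of randomized subhashes that two hashes have in common."""
    return len(hash_a & hash_b)


def decide(new_hash: set, repository: list, threshold: int = 1) -> str:
    """Return 'H1' if any stored hash shares at least `threshold` subhashes."""
    best = max((matching_subhashes(new_hash, h) for h in repository), default=0)
    return "H1" if best >= threshold else "H0"
```

The linear scan over the repository is for illustration only; since the comparison reduces to exact matches of randomized subhashes, an index from subhash value to stored hashes would serve the same purpose for large repositories.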

We analyzed 5,330 different audio samples from the Verbmobil II corpus of German telephone dialogs (Bavarian Archive for Speech Signals, 1998) and 200 additional telephone spam files with perceptually similar copies. These files are based on 20 real telephone spam recordings which were intentionally altered by noise, audio and telephone codecs. The following

types of alterations and distortions were considered

(compare (Grutzek et al., 2012)): MP3-codec at 32

and 96 kbps, GSM fullrate, G.726 codec at 16 and 32

kbps, 5% and 10% packet loss, white and pink noise

with 20dB SNR.

The performance can be characterized by the true positive rate $P_{TP} = P(m \ge T \mid H_1)$ and the false positive rate $P_{FP} = P(m \ge T \mid H_0)$, where $m$ is the number of matching subhashes and $T$ a threshold. For the true positive rate, the hashes of all telephone spam recordings and their distorted versions are compared. $P_{TP}$ is the quotient of the number of positive identifications and the number of expected identifications. With subhashes of length 60 bits, the recognition rate is relatively low (≈ 73% for T = 1) compared to the common 32-bit hashes. But the identification mainly fails for audio samples which are encoded with low bit rate codecs (G.726 at 16 kbit/s and GSM full rate at 13 kbit/s). We observe that the hit rate is much higher (≈ 97% for T = 1) if these two codecs are not included. The true positive rate for various thresholds is depicted in Figure 3.

The false positive rate $P_{FP}$ is computed relative to a given repository of audio hashes. Hence $P_{FP}$ depends on the number of hashes in the repository, but this reflects the error of the first kind in an identification scenario. We used N = 5,330 perceptually different audio samples from the above corpus and performed N(N − 1)/2 pairwise hash comparisons. A hash is considered a false positive if it has at least T common subhashes with any of the other N − 1 hashes. We observed only 12 false positives for T = 1 and not a single false positive for T ≥ 2 (see Figure 3). This

advantageous property is mainly due to the large bit-

length and the entropy of our subhashes. For the usual

32-bit subhashes, random collisions occur much more

often. On the other hand, shorter subhashes provide

Privacy-enhancedPerceptualHashingofAudioData

553

more robustness and better true positive rates.
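The influence of the subhash length on random collisions can be made tangible with a rough estimate. The sketch below assumes, as a simplification that does not hold exactly for real audio, that subhashes are independent and uniformly distributed over 2^bits values, and that every hash contains 100 distinct subhashes; the function name is hypothetical.

```python
def expected_random_matches(n_hashes: int, subhashes_per_hash: int,
                            bits: float) -> float:
    """Expected number of coinciding subhash pairs under a uniform model."""
    hash_pairs = n_hashes * (n_hashes - 1) // 2
    subhash_pairs = hash_pairs * subhashes_per_hash ** 2
    return subhash_pairs / 2 ** bits


for bits in (32, 34, 60):
    print(bits, expected_random_matches(5330, 100, bits))
```

For N = 5,330 this gives roughly 33 chance coincidences for 32 uniform bits, roughly 8 for 34 bits (the entropy bound estimated in Section 4.1) and practically none for 60 uniform bits. The value for 34 bits is of the same order of magnitude as the 12 false positives reported for T = 1, although the two quantities are not identical.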

For the identiﬁcation of telephone spam, a signif-

icant rate of false negatives can be accepted since the

audio data will be replayed a number of times. But

false positive identiﬁcations of telephone spam should

be avoided, even for large hash repositories.

5 CONCLUSIONS

We studied the security and privacy requirements

of audio ﬁngerprints and analyzed the existing ap-

proaches and algorithms. There exist various pow-

erful ﬁngerprinting frameworks which permit an efﬁ-

cient identiﬁcation of audio samples. Some work has

been done on the security of audio hashes, but open

issues remain if the hash is used for multimedia authentication and watermarking. This contribution analyzes the privacy issues which are relevant for speech

data, for example to identify replayed telephone data

(spam calls). The ﬁngerprint should not leak informa-

tion on the original audio data.

By modifying well known audio ﬁngerprinting al-

gorithms and combining them with a cryptographic

message authentication code, we deﬁned a random-

ized audio hash which consists of a set of binary vec-

tors. We estimated the entropy of the subhash values

which is important for the security properties of the

proposed method. Furthermore, we analyzed the per-

formance in terms of robustness and discrimination

power. We showed that the hash has adequate robust-

ness, at least if the audio samples have sufﬁcient audio

quality, and excellent discrimination capabilities. The

hash permits an efﬁcient identiﬁcation of speech sig-

nals in large databases and prevents the exposure of

audio content.

Future work will incorporate additional audio ma-

terial and extend the study of the security properties

of robust keyed hash functions.

REFERENCES

Bavarian Archive for Speech Signals (1998). Verbmobil II.

Bellare, M. (2006). New proofs for NMAC and HMAC:

Security without collision-resistance. Advances in

Cryptology-CRYPTO 2006, pages 602–619.

Bellare, M., Canetti, R., and Krawczyk, H. (1996). Key-

ing hash functions for message authentication. In

Advances in Cryptology—CRYPTO’96, pages 1–15.

Springer.

Cano, P., Batlle, E., Kalker, T., and Haitsma, J. (2002). A

Review of Algorithms for Audio Fingerprinting. In

Multimedia Signal Processing, IEEE Workshop on,

pages 169–173.

Clausen, M. and Kurth, F. (2004). A uniﬁed approach

to content-based and fault-tolerant music recognition.

IEEE Transactions on Multimedia, 6(5):717–731.

Cremer, M., Froba, B., Hellmuth, O., Herre, J., and Alla-

manche, E. (2001). AudioID: Towards Content-Based

Identiﬁcation of Audio Material. In Audio Engineer-

ing Society Convention 110.

Doets, P. J. O. and Lagendijk, R. L. (2008). Distortion Esti-

mation in Compressed Music Using Only Audio Fin-

gerprints. IEEE Transactions on Audio, Speech, and

Language Processing, 16(2).

Fridrich, J. and Goljan, M. (2000). Robust Hash Functions

for Digital Watermarking. In Information Technology:

Coding and Computing, International Conference on,

pages 178–183.

Grutzek, G., Strobl, J., Mainka, B., Kurth, F., Poerschmann,

C., and Knospe, H. (2012). Perceptual hashing for the

identiﬁcation of telephone speech. Speech Commu-

nication; 10. ITG Symposium; Proceedings of, pages

1–4.

Haitsma, J. and Kalker, T. (2002). A highly robust audio ﬁn-

gerprinting system. In Proc. ISMIR, volume 2, pages

13–17.

Koval, O., Voloshynovskiy, S., Bas, P., and Cayre, F. (2009).

On security threats for robust perceptual hashing. In

IS&T/SPIE Electronic Imaging 2009.

Koval, O., Voloshynovskiy, S., Beekhof, F., and Pun, T.

(2008). Security analysis of robust perceptual hash-

ing. In IS&T/SPIE Electronic Imaging 2008.

Kurth, F. and Müller, M. (2008). Efficient Index-Based Audio Matching. IEEE Transactions on Audio, Speech, and Language Processing, 16(2):382–395.

Slaney, M. and Casey, M. (2008). Locality-sensitive hash-

ing for ﬁnding nearest neighbors [lecture notes]. Sig-

nal Processing Magazine, IEEE, 25(2):128–131.

Swaminathan, A., Mao, Y., and Wu, M. (2006). Robust and

Secure Image Hashing. IEEE Transactions on Infor-

mation Forensics and Security, 1(2):215–230.

Thiemert, S., Nurnberger, S., Steinebach, M., and Zmudzin-

ski, S. (2009). Security of robust audio hashes. In

Information Forensics and Security, 2009. First IEEE

International Workshop on, pages 126 –130.

Wang, A. L.-C. (2003). An Industrial-Strength Audio

Search Algorithm. ISMIR 2003, 4th Symposium Con-

ference on Music Information Retrieval, pages 7–13.

Wang, A. L.-C. and Smith III, J. O. (2008). Methods for

recognizing unknown media samples using character-

istics of known media samples.

Weng, L. and Preneel, B. (2011). A secure perceptual hash

algorithm for image content authentication. In Com-

munications and Multimedia Security, pages 108–

121.

Zmudzinski, S. and Steinebach, M. (2009). Perception-

based Authentication Watermarking for Digital Audio

Data. In IS&T/SPIE Electronic Imaging 2009.

SECRYPT2013-InternationalConferenceonSecurityandCryptography

554