On Securing Communication from Profilers
Sandra Díaz-Santiago* and Debrup Chakraborty
Department of Computer Science, CINVESTAV IPN, Av. Instituto Politécnico Nacional No. 2508,
Col. San Pedro Zacatenco, D.F., 07360, Mexico
*Sandra Díaz-Santiago is on academic leave from Escuela Superior de Cómputo (ESCOM-IPN).
Keywords:
Data Encryption, Profiling Adversary, User Profiling, CAPTCHA, Secret Sharing.
Abstract:
A profiling adversary is an adversary which aims to classify messages into pre-defined profiles and thus gain
useful information regarding the sender or receiver of such messages. Usual chosen-plaintext secure encryp-
tion schemes are capable of securing information from profilers, but these schemes provide more security than
required for this purpose. In this paper we study the requirements for an encryption algorithm to be secure
only against profilers, and finally give a precise notion of security for such schemes. We also present a full
protocol for secure (against profiling adversaries) communication, which neither requires a key exchange nor
a public key infrastructure. Our protocol guarantees security against non-human profilers and is constructed
using CAPTCHAs and secret sharing schemes.
1 INTRODUCTION
Informally a spam email is an email which is not of
interest to the receiver. Every day almost every one
of us finds hundreds of such spam emails waiting in
our in-boxes. A spammer (who sends spam emails)
generally has a business motive, and most spam emails
try to advertise a product, a web-page or a service. If
the spam emails can be sent in a directed manner, i.e.,
if a spammer can send a specific advertisement to a
user who would be interested in it, then the motive
of the spammer would be successful to a large extent.
Thus, one of the important objectives of a spammer
would be to know the preferences or interests of the
users to whom it is sending the unsolicited messages.
In today’s connected world we do a lot of com-
munication through emails, and it is not unrealistic
to assume that a collection of email messages which
originate from a specific user U carries information
about the preferences and interests of U. Based on
this assumption a spammer can collect email infor-
mation originating from different users and based on
these emails try to make a profile of each user (based
on their preferences or interests), and later use this
profile for directed spamming.
Here we assume that given a message space an
adversary aims to map each message in the message
space into certain classes of its interest. Using this
classification of messages the adversary can try to con-
clude which user is associated with which class and
this is expected to reveal information regarding the
profile of a given user. Thus, in the scenario of our
interest we consider an adversary that classifies mes-
sages into pre-defined classes. Such an adversary
will henceforth be called a profiler.
Other than directed spamming, there may be other
motives for user profiling. Recently there has been
a paradigm shift in the way products are advertised
on the internet. In the popular new paradigm of
online behavioral advertising (OBA) (Toubiana et al.,
2010), internet advertising companies display adver-
tisements specific to user preferences. This requires
profiling the users. To support this big business of in-
ternet advertising, innovative techniques for user pro-
filing have also been developed. It is known that some in-
ternet service providers perform a procedure called
deep packet inspection on all traffic to detect malware
etc., but this technique has been used to generate user
profiles from the information contents of the packets
received or sent by a user, and this information is
later sold to advertising companies (Toubiana et al.,
2010). This has recently led to many policy-related
debates, and it has been asked whether such practices
should be legally allowed (NYT, 2009).
In the context of emails, a solution to the prob-
lem of profiling attacks would be encrypting the com-
munications so that the contents of the emails are not
available to the profiler. An alternative is to make the
communications anonymous, so that given a message it would
not be possible for a profiler to trace the origin of the
message. In this paper we ask the following question:
What would be the exact security requirements for an
encryption scheme which can protect the communica-
tion from profilers? Intuitively a cipher obtained from
a secure encryption algorithm should not reveal any
information regarding the plaintext which was used to
produce the cipher. Hence, a secure encryption algo-
rithm should surely resist attacks by profilers. But, as
the goal of a profiler is only to classify the messages,
it is possible that an encryption algorithm which pro-
vides security in a weaker sense would be enough to
resist profilers. We explore in this direction and try to
fix the appropriate security definition of an encryption
scheme which would provide security against profil-
ers.
Using any encryption scheme involves the com-
plicated machinery of key exchange (for symmetric
encryption) or a public key infrastructure (for asym-
metric encryption). When the goal is just to protect
information against profilers the heavy machinery of
key exchange or public key infrastructure may be un-
necessary. Keeping in mind security against profil-
ers we propose a new protocol which does not require
explicit key exchange. To do this we use the notion
of CAPTCHAs, which are programs that can distin-
guish between humans and machines by automated
Turing tests which are easy for humans to pass but
difficult for any machine. The use of CAPTCHAs
makes our protocol secure from non-human profilers,
but the protocol is still vulnerable to human adver-
saries. In the context in which we view the activity of
profiling, it is only profitable if a large number of
users can be profiled, and this goal seems infeasible
if human profilers are employed for the task.
To our knowledge the only prior work on the is-
sue of securing email communication from profilers
has been reported in (Golle and Farahat, 2004). In
(Golle and Farahat, 2004) it was pointed out that
an encryption scheme secure against profilers can be
much weaker than normal encryption algorithms, and
thus using a normal encryption algorithm can be an
overkill. The solution in (Golle and Farahat, 2004)
hides the semantic of the plaintext by converting an
English text into another English text with the help
of a key. Their protocol also does not need explicit
key exchange or a public key infrastructure. The
key is derived from the email header by using a hash
function with a specific property. The hash function
they use is a “slow one-way hash function”, which
was first proposed in (Dwork and Naor, 1992). Such
hash functions are difficult to compute, i.e., they may
take a few seconds to compute, and are hard to invert.
This high computational cost for the hash function
prevents a profiler from deriving the key for a large
number of messages. Our method is fundamentally
different from (Golle and Farahat, 2004) in its use of
CAPTCHAs. Slow hash functions, though proposed
long ago, have not seen much use, and their suitability
is not well tested. But CAPTCHAs are ubiquitous
in today's world and have been used successfully
in diverse applications. Also, our work presents a
theoretical analysis of the problem and provides
security definitions which are, to our knowledge, new
to the literature.
The rest of the paper is organized as follows. In
Section 2 we describe basic concepts related to in-
distinguishability, CAPTCHA and secret sharing. In
Section 3 we present a formal definition of a profil-
ing adversary and security against such adversaries.
In Sections 4 and 5 we describe our protocols and ar-
gue regarding their security in terms of the security
notion given in Section 3. We conclude the paper in
Section 6, where we discuss the limitations of
our approach and some future directions.
2 PRELIMINARIES
2.1 Notations
The set of all n-bit strings will be denoted by {0,1}^n.
For a string x, |x| will denote the length of x, and for
a finite set A, |A| will denote the cardinality of A. For
a finite set S, x ←$ S will denote that x is selected
uniformly at random from S. In what follows, by an
adversary we shall mean a probabilistic algorithm
which outputs an integer or a bit. A(x, y) ⇒ b will
denote the fact that an adversary A given inputs x, y
outputs b. In general an adversary may have other
kinds of interaction, possibly with other adversaries
and/or algorithms, before it produces its output; these
will be clear from the context. In what follows,
E : K × M → C will denote an encryption scheme
with K, M, C as the key space, message space and
cipher space respectively. For m ∈ M and k ∈ K we
shall usually write E_k(m) instead of E(k, m).
2.2 Indistinguishability in the Presence
of an Eavesdropper
Security of encryption schemes is best defined in
terms of indistinguishability. Here we consider in-
distinguishability in the presence of an eavesdropping
adversary. This security notion, which we call IND-
EAV security, is defined with the help of an interaction
between two entities called an adversary and a chal-
lenger. The adversary chooses a pair of plaintext
messages and asks the challenger for their
encryption. The challenger pro-
vides the adversary with the encryption of one of the
messages chosen by the adversary. The adversary is
considered to be successful if it can correctly guess
which message of its choice was encrypted. More
formally, to define the security of an encryption al-
gorithm E : K × M C , we consider the interaction
of an adversary A with a challenger in the experiment
below:
Experiment Exp-IND-EAV_A
1. The challenger selects K uniformly at random
   from K.
2. The adversary A selects two messages
   m_0, m_1 ∈ M, such that |m_0| = |m_1|.
3. The challenger selects a bit b uniformly at
   random from {0,1}, and returns c ← E_K(m_b) to A.
4. The adversary A outputs a bit b'.
5. If b = b' output 1, else output 0.
Definition 1. Let E : K × M → C be an encryption
scheme. The IND-EAV advantage of an adversary A
in breaking E is defined as

Adv^{ind-eav}_E(A) = Pr[Exp-IND-EAV_A ⇒ 1] − 1/2.

Moreover, E is (ε,t) IND-EAV secure if for all adver-
saries A running for time at most t, Adv^{ind-eav}_E(A) ≤ ε.
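To make the experiment concrete, the following is a minimal Python sketch (ours, not part of the paper) of Exp-IND-EAV, instantiated with a one-time pad as the encryption scheme; the adversary class and trial count are illustrative.

import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    # One-time pad: XOR the message with an equal-length uniform key.
    return bytes(k ^ m for k, m in zip(key, msg))

def exp_ind_eav(adversary, n: int) -> int:
    """One run of Exp-IND-EAV; returns 1 iff the adversary guesses b."""
    key = secrets.token_bytes(n)            # step 1: K uniform in the key space
    m0, m1 = adversary.choose_messages(n)   # step 2: |m0| = |m1|
    b = secrets.randbelow(2)                # step 3: uniform bit b
    c = otp_encrypt(key, m1 if b else m0)   #         c <- E_K(m_b)
    return int(adversary.guess(c) == b)     # steps 4-5

class GuessingAdversary:
    def choose_messages(self, n):
        return bytes(n), bytes([0xFF]) * n
    def guess(self, c):
        return secrets.randbelow(2)         # against an OTP nothing beats guessing

trials = 20000
wins = sum(exp_ind_eav(GuessingAdversary(), 16) for _ in range(trials))
print("empirical IND-EAV advantage:", wins / trials - 0.5)  # close to 0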
The IND-EAV security as defined above applies
only to one-time encryption, and it is different from
the most used security notion for symmetric encryp-
tion, which is indistinguishability under chosen plain-
text attack (IND-CPA). In an IND-CPA attack the ad-
versary is given access to an encryption oracle and
thus can consult this oracle before it chooses the mes-
sages, and has the option of asking for the encryption
of multiple pairs of messages before it outputs. The
IND-EAV notion is strictly weaker than the IND-CPA
notion: all IND-CPA secure encryption schemes are
also IND-EAV secure.
A related notion of security is that of semantic se-
curity. Informally a symmetric encryption scheme is
called semantically secure if an adversary is unable to
compute any function of the plaintext given a cipher-
text.
Definition 2. Let E : K × M → C be an encryption
scheme. E is called (ε,t) SEM-EAV secure if for all
functions f and for all adversaries A running for time
at most t,

|Pr[A(E_K(x)) ⇒ f(x)] − max_{A'} Pr[A'(·) ⇒ f(x)]| ≤ ε   (1)

where the running time of A' is polynomially related
to t, and x is chosen uniformly at random from M.
Note that in the above definition, by A'(·) we mean
that the adversary is given no input, i.e., A' is trying
to predict f(x) without seeing E_K(x). In the sec-
ond term of Equation (1) the maximum is taken over
all adversaries A' which run for time at most poly(t),
for some polynomial poly(). Thus, if E is SEM-EAV
secure then no adversary can do better in predicting
f(x) from E_K(x) than an adversary who does so with-
out seeing E_K(x). It is well known that IND-EAV
security implies SEM-EAV security (see, for example,
Claim 3.11 in (Katz and Lindell, 2008)).
2.3 CAPTCHA
A CAPTCHA is a computer program designed to dif-
ferentiate a human being from a computer. The fun-
damental ideas for such a program were first pro-
posed in an unpublished paper (Naor, 1997) and
then these ideas were formalized in (von Ahn et al.,
2003), where the name CAPTCHA was first pro-
posed. CAPTCHA stands for Completely Automated
Public Turing test to tell Computers and Humans
Apart. In fact, a CAPTCHA is a test which is easy
to pass by a human user but hard to pass by a ma-
chine. Among the most common CAPTCHAs are dis-
torted images of short strings. For a human it is gen-
erally very easy to recover the original string from the
distorted image, but it is difficult for state-of-the-art
character recognition algorithms to recover the orig-
inal string from the distorted image. Other types of
CAPTCHAs which depend on problems of speech
recognition, object detection, classification etc. have
also been developed.
Recently CAPTCHAs have been used in many
different scenarios for identification of humans, like
in chat rooms, online polls etc. Also they can be used
to prevent dictionary attacks on password-based
systems (Pinkas and Sander, 2002), and more recently
for key establishment (Dziembowski, 2010).
A CAPTCHA is a randomized algorithm G which,
given an input string x from a set of strings STR, pro-
duces the CAPTCHA G(x). A CAPTCHA G is called
(α,β) secure if for any human or legitimate solver S,

Pr[x ←$ STR : S(G(x)) ⇒ x] ≥ α,

and for any efficient machine C,

Pr[x ←$ STR : C(G(x)) ⇒ x] ≤ β.
For a CAPTCHA to be secure it is required that
there is a large gap between α and β. In Section 4,
we will propose an alternative security definition for
CAPTCHAs.
2.4 Secret Sharing Schemes
A secret sharing scheme is a method designed to
share a secret between a group of participants. These
schemes were first proposed by Shamir in 1979
(Shamir, 1979). Although there have been improve-
ments to these kind of schemes, here we will use the
basic construction due to Shamir. In a (u, w) thresh-
old secret sharing scheme a secret K is divided into
w pieces called shares. These w shares are given to
w participants. To recover the secret, at least u ≤ w
of the w shares are required, and it is not possible to
recover the secret with fewer than u shares.
We describe a specific construction proposed by
Shamir. To construct a (u,w) secret sharing scheme
we need a prime p ≥ w + 1, and the operations take
place in the field Z_p. The procedure for splitting a se-
cret K into w parts is depicted in the algorithm below:

SHARE^p_{u,w}(K)
1. Choose w distinct, non-zero elements of Z_p,
   denote them x_i, 1 ≤ i ≤ w.
2. Choose u − 1 elements of Z_p independently at
   random. Denote them a_1, ..., a_{u−1}.
3. Let a(x) = K + Σ_{j=1}^{u−1} a_j x^j mod p,
   and y_i = a(x_i), 1 ≤ i ≤ w.
4. Output S = {(x_1, y_1), ..., (x_w, y_w)} as the
   set of w shares.
The secret K can be easily recovered using any
B ⊆ S such that |B| ≥ u, but if |B| < u then K cannot
be recovered. To see this, observe that the polyno-
mial used in step 3 to compute the y_i's is a degree
u − 1 polynomial. Thus, using u pairs of the type
(x_i, y_i) one can generate u linear equations, each of
the form y_i = K + a_1 x_i + ··· + a_{u−1} x_i^{u−1}. Using
these equations the value of K can be found, and it
can be shown that this set of u equations always has
a unique solution.
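As an illustration, the following Python sketch (ours; the prime and parameters are illustrative) implements SHARE and recovery by Lagrange interpolation at x = 0, which solves the above system of equations implicitly.

import secrets

def share(K: int, u: int, w: int, p: int):
    # SHARE^p_{u,w}(K): evaluate a random degree u-1 polynomial a(x) with
    # a(0) = K at the w distinct non-zero points x = 1, ..., w (mod p).
    coeffs = [K] + [secrets.randbelow(p) for _ in range(u - 1)]
    return [(x, sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p)
            for x in range(1, w + 1)]

def recover(shares, p: int) -> int:
    # Recover K = a(0) from u (or more) shares by Lagrange interpolation;
    # division mod p is done via Fermat inverses, so p must be prime.
    K = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % p
                den = den * (xi - xj) % p
        K = (K + yi * num * pow(den, p - 2, p)) % p
    return K

p = 8191                              # a small prime, illustration only
shares = share(1234, u=2, w=5, p=p)
print(recover(shares[:2], p))         # any 2 of the 5 shares give back 1234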
3 PROFILING ADVERSARIES
Let M be a message space and P = {1, 2, ..., k} be a
set of labels for the different possible profiles. We as-
sume that each message x in M can be labeled by a
unique j ∈ P. Thus, there exists a function f : M → P
which assigns a label to each message in the message
space. In other words, we can assume that the mes-
sage space can be partitioned into disjoint subsets as
M = M_1 ∪ M_2 ∪ ··· ∪ M_k, and for every x ∈ M,
f(x) = i if and only if x ∈ M_i.
We call f the profiling function or a classifier.
Thus, in this setting we are assuming that each mes-
sage in the message space M represents some profile,
and messages in M_i (1 ≤ i ≤ k) correspond to the pro-
file i. The function f is a classifier which, given a mes-
sage, can classify it into one of the profiles. We also
assume that the function f is efficiently computable;
in particular, we assume that for any x ∈ M, f(x) can
be computed in time at most µ, where µ is a constant.
The function f is public; thus, given x ∈ M, any ad-
versary can efficiently compute f(x). We want to de-
fine security for encryption schemes against profiling
adversaries, i.e., we want that when a message from
M is encrypted using the encryption algorithm, no ef-
ficient adversary is able to profile it.
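For concreteness, a profiling function f could look like the following toy Python sketch; the profile labels and keywords are hypothetical, and any efficiently computable classifier that partitions M fits the definition.

# A toy profiling function f : M -> P realized as a keyword classifier.
# The labels and keywords below are hypothetical.
PROFILES = {1: ("football", "league"), 2: ("guitar", "concert"), 3: ("recipe", "bake")}

def f(message: str) -> int:
    words = set(message.lower().split())
    for label, keywords in PROFILES.items():
        if words & set(keywords):
            return label
    return 0  # catch-all profile for messages matching no class

print(f("The league table after the weekend football matches"))  # -> 1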
3.1 PROF-EAV Security
Here we propose a definition for encryption schemes
secure against profiling adversaries.
Definition 3. [PROF-EAV Security]. Let M be a
message space and f : M → P be a profiling func-
tion. Let E : K × M → C be an encryption algo-
rithm. We define the advantage of an adversary A in
the PROF-EAV (read: profiling under eavesdropping)
sense in breaking E as

Adv^{prof-eav}_{E,f}(A) = Pr[A(E_K(x)) ⇒ f(x)]
  − max_{A'} Pr[A'(·) ⇒ f(x)],   (2)

where K ←$ K, x ←$ M and A' is an adversary whose
running time is a polynomial of the running time of
A. An encryption algorithm E : K × M → C is called
(ε,t) PROF-EAV secure for a given profiling function
f if for all adversaries A running in time at most t,
Adv^{prof-eav}_{E,f}(A) ≤ ε.
In the definition above we want to capture the no-
tion that, for a PROF-EAV secure encryption scheme,
an adversary A trying to find the profile of a message
from its ciphertext cannot do much better than the best
adversary A', who tries to guess the profile without
seeing the ciphertext.
This definition is in accordance with the defini-
tion of semantic security as discussed in Section 2.2.
Recall that an encryption scheme is called semanti-
cally secure if no adversary can efficiently compute
any function of the plaintext given its ciphertext. But
in the PROF-EAV definition we are interested only
in a specific function f. Thus, PROF-EAV security
is strictly weaker than semantic security: semantic
security trivially implies PROF-EAV security, but
PROF-EAV security does not imply IND-EAV secu-
rity. We give a concrete example to illustrate this.
Example 1. Let M = {0,1}^n = M_1 ∪ M_2 be a mes-
sage space, where

M_1 = {x ∈ M : the first bit of x is 0},

and M_2 = M \ M_1, and let f be the profiling function
such that f(x) = i iff x ∈ M_i. Let E^{one} be an encryp-
tion scheme which uses a one-bit key k (chosen uni-
formly from {0,1}) and, given a message x ∈ M, xors
k with the first bit of x. It is easy to see that an adver-
sary trying to guess the profile of a message x given
E^{one}_k(x) cannot do better than with probability one
half, and this success probability can be achieved even
without seeing the ciphertext, since here |M_1| = |M_2|.
Hence E^{one} is PROF-EAV secure, but trivially not se-
cure in the IND-EAV sense.
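The scheme E^{one} is easy to simulate; the following sketch (ours) empirically checks both claims: a profiler gains nothing from the ciphertext, while an IND-EAV adversary who picks messages differing in the last bit wins every time.

import secrets

def e_one(k: int, x: str) -> str:
    # E^one: xor the one-bit key with the first bit of x; the rest is in the clear.
    return str(int(x[0]) ^ k) + x[1:]

n, trials = 8, 20000

# PROF-EAV: guess the profile from the ciphertext's first bit. Since that
# bit is masked by a uniform key bit, success stays at about 1/2.
prof_wins = sum(
    e_one(secrets.randbelow(2), x)[0] == x[0]
    for x in (format(secrets.randbits(n), f"0{n}b") for _ in range(trials))
)
print("profiling success:", prof_wins / trials)   # ~0.5, no better than guessing blind

# IND-EAV: choose m0, m1 differing in the LAST bit, which E^one never touches,
# so the adversary reads b directly off the ciphertext.
m0, m1 = "0" * n, "0" * (n - 1) + "1"
ind_wins = 0
for _ in range(trials):
    b = secrets.randbelow(2)
    c = e_one(secrets.randbelow(2), m1 if b else m0)
    ind_wins += int(c[-1] == "1") == b
print("IND-EAV success:", ind_wins / trials)      # 1.0: not IND-EAV secure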
4 ENCRYPTION PROTOCOL
SECURE AGAINST PROFILING
ADVERSARIES
In this section we describe a complete protocol which
would be secure against profiling adversaries. As
mentioned in the introduction here we care about ad-
versaries who are not humans. Our motivation is
to prevent communications getting profiled in large
scale mechanically. The protocol is not secure from
human adversaries, and we do not care much about
that as we hope that it would be economically infea-
sible to employ a human for large scale profiling.
The protocol P consists of the following entities:

- The message space M and the cipher space C.
- The set of profiles P and the profiling function f associated with M.
- A set STR which consists of short strings over a specified alphabet.
- An encryption scheme E : K × M → C.
- A hash function H : STR → K.
- A CAPTCHA generator G which takes inputs from STR.
Protocol P(x)
1. k ←$ STR;
2. k' ← G(k);
3. K ← H(k);
4. c ← E_K(x);
5. return (c, k')

Figure 1: The protocol P.

Given a message x ∈ M, P produces a ciphertext as
shown in Figure 1. In the protocol as described in Fig-
ure 1, k, an element of STR, is hashed to form the key
K, and k is also converted into a CAPTCHA and trans-
mitted along with the ciphertext. The only input to P
is the message; the key generation is embedded in the
protocol. This resembles the scenario of hybrid en-
cryption (Abdalla et al., 2001), which consists of two
mechanisms, called key encapsulation and data encap-
sulation, where an encrypted version of the key is
transmitted along with the cipher. For a human, de-
cryption is easy: given a ciphertext (c, k') a human
user can recover k from k' by solving the CAPTCHA
and thus compute E^{−1}_{H(k)}(c) to decipher.
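A minimal Python sketch of P follows; it is ours, not the paper's implementation. The CAPTCHA generator is a stub, SHA-256 stands in for H, and the cipher E is a simple hash-based stream construction used only as a placeholder for a PROF-EAV secure scheme.

import hashlib, secrets

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")

def G(k: str) -> str:
    # Stub CAPTCHA generator: a real deployment would render a distorted
    # image of k; here we only mark the string as "captcha-wrapped".
    return f"<captcha of {k!r}>"

def H(k: str) -> bytes:
    # The hash H : STR -> K; SHA-256 is our illustrative choice.
    return hashlib.sha256(k.encode()).digest()

def E(K: bytes, x: bytes) -> bytes:
    # Placeholder cipher: XOR with a SHA-256-based keystream. Any
    # PROF-EAV secure scheme can be plugged in here instead.
    stream = b"".join(hashlib.sha256(K + i.to_bytes(4, "big")).digest()
                      for i in range((len(x) + 31) // 32))
    return bytes(a ^ b for a, b in zip(x, stream))

def protocol_P(x: bytes, m: int = 8):
    k = "".join(secrets.choice(ALPHABET) for _ in range(m))   # 1. k <-$ STR
    return E(H(k), x), G(k)                                   # 2-5. (c, k') = (E_{H(k)}(x), G(k))

c, k_prime = protocol_P(b"meet me at noon")
# A human solves k_prime back to k and decrypts; E here is its own inverse:
#   x = E(H(k), c)
print(k_prime, c.hex())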
4.1 Security of P
The security of a protocol P against profilers is de-
fined in the same way as in Definition 3.
Definition 4. The advantage of an adversary A attack-
ing protocol P is defined as

Adv^{prof}_{P,f}(A) = Pr[A(P(x)) ⇒ f(x)] − max_{A'} Pr[A'(·) ⇒ f(x)],

where x ←$ M and A' is an adversary whose run-
ning time is a polynomial of the running time of A.
Additionally, P is called (ε,t) secure in the PROF
sense if for all adversaries A running in time at most
t, Adv^{prof}_{P,f}(A) < ε.
The above definition differs from Definition 3 in that
it does not mention the key explicitly, as key genera-
tion is embedded in the protocol itself.
To prove that P is secure in the PROF sense we need
an assumption regarding the CAPTCHA G and the
hash function H. We state this next.
Definition 5. [The Hash-Captcha Assumption].
Let G be a CAPTCHA generator, let r be a number,
let H : STR → {0,1}^r be a hash function, and let A
be an adversary. We define the advantage of A in vi-
olating the Hash-Captcha assumption as

Adv^{hc}_{G,H}(A) = Pr[x ←$ STR : A(G(x), H(x)) ⇒ 1]
  − Pr[x ←$ STR, z ←$ {0,1}^r : A(G(x), z) ⇒ 1].

Moreover, (G,H) is called (ε,t) HC secure if
for all adversaries A running in time at most t,
Adv^{hc}_{G,H}(A) ≤ ε.
This definition says that the pair formed by a
CAPTCHA generator G and a hash function H is se-
cure if an adversary A is unable to distinguish be-
tween (G(x), H(x)), where x is some string, and
(G(x), z), where z is a random string. This secu-
rity notion for a CAPTCHA, inspired by the notion of
indistinguishability, is quite different from the (α,β)
security notion described in Section 2.3. Here
the adversary has some more information regarding
x through the value H(x). If the adversary can ef-
ficiently solve the CAPTCHA G then it can break
(G,H) in the HC sense irrespective of the hash func-
tion. Even if the CAPTCHA is secure, i.e., no efficient
adversary can find x from G(x), an adversary may still
be able to distinguish H(x) from a string randomly
selected from the range of H.
Consider a keyed family of hash functions
H = {H_ℓ}_{ℓ∈L} such that for every ℓ ∈ L, H_ℓ : D → R
for some sets D and R. Then H is called an entropy
smoothing family if for any efficient adversary it is
difficult to distinguish between (ℓ, H_ℓ(x)) and (ℓ, z),
where ℓ, x, z are selected uniformly at random from
L, D and R respectively. An entropy smoothing hash
along with a secure CAPTCHA can resist HC attacks.
Entropy smoothing hashes can be constructed from
universal hash functions using the leftover hash lemma
(Impagliazzo and Zuckerman, 1989), but the param-
eter sizes required for such provable guarantees can
be prohibitive. We believe that using ad-hoc crypto-
graphic hashes like the ones from the SHA family can
provide the same security. In our definition we do not
use a keyed family of hash functions, but such a fam-
ily can easily be used in the protocol P, in which case
the hash key will also be a part of the ciphertext.
With these discussions we are now ready to state
the theorem about security of P.
Theorem 1. Let P be the protocol in Figure 1 and let
A be an adversary attacking P in the PROF sense. Then
there exist adversaries B and B' such that

Adv^{prof}_{P,f}(A) ≤ Adv^{hc}_{G,H}(B) + Adv^{prof-eav}_{E,f}(B').

Moreover, if A runs for time t, both B and B' run for
time O(t).
Proof. Let A be an adversary attacking the protocol
P in Figure 1. We construct an adversary B attacking
the hash-captcha pair (G,H), using A, as follows.

Adversary B(G(k), z)
1. x ←$ M;
2. Send (E_z(x), G(k)) to A;
3. A returns j;
4. if f(x) = j
5.   return 1;
6. else return 0;
As B is an adversary attacking the hash-captcha
assumption, there are two possibilities regarding the
input (G(k), z) of B: z can either be H(k) or a uniform
random element of K, and the goal of B is to distin-
guish between these two possibilities.

Considering the first possibility, that z is H(k), by
the way the adversary B is defined, A gets a valid en-
cryption of the message x (which is a random element
of the message space) according to the protocol P.
Hence we have

Pr[k ←$ STR : B(G(k), H(k)) ⇒ 1]
  = Pr[k ←$ STR, x ←$ M : A(E_{H(k)}(x), G(k)) ⇒ f(x)]
  = Pr[x ←$ M : A(P(x)) ⇒ f(x)].   (3)
Similarly, for the second possibility, i.e., when the in-
put z to B is an element chosen uniformly at random
from K, we have

Pr[k ←$ STR, K ←$ K : B(G(k), K) ⇒ 1]
  = Pr[x ←$ M : A(E_K(x), G(k)) ⇒ f(x)].   (4)

In Equation (4), k and K are chosen independently
and uniformly at random. Thus the adversary A has as
input E_K(x) and G(k), where k is independent of K,
so G(k) carries no information about K. Hence A can-
not do better than some PROF-EAV adversary B' who
has only E_K(x) as its input and runs for the same time
as A. Thus
Pr[x ←$ M : A(E_K(x), G(k)) ⇒ f(x)]
  ≤ Pr[x ←$ M : B'(E_K(x)) ⇒ f(x)].   (5)
From the definition of the PROF-EAV advantage of
B' we have

Pr[x ←$ M : B'(E_K(x)) ⇒ f(x)]
  = Adv^{prof-eav}_{E,f}(B') + max_{A'} Pr[A'(·) ⇒ f(x)].   (6)
Thus, using Equations (4), (5) and (6) we have

Pr[k ←$ STR, K ←$ K : B(G(k), K) ⇒ 1]
  ≤ Adv^{prof-eav}_{E,f}(B') + max_{A'} Pr[A'(·) ⇒ f(x)].   (7)

Finally, from Equations (3) and (7) and Defini-
tions 5 and 4 we have

Adv^{prof}_{P,f}(A) ≤ Adv^{hc}_{G,H}(B) + Adv^{prof-eav}_{E,f}(B'),
OnSecuringCommunicationfromProfilers
159
as desired. Also, if A runs for time t, then B' runs
for time t and B runs for time t + c for some small
constant c.
Some Remarks about the Security of P: We defined
the security of the protocol P only for a fixed profiling
function f, but note that we can modify the definition
for an arbitrary function f, which would give us a
security definition equivalent to SEM-EAV (discussed
in Section 2.2). If the encryption algorithm E used
within the protocol is SEM-EAV secure, then using the
same proof we can obtain SEM-EAV security for P.
5 A PRACTICAL
INSTANTIATION
A common problem with CAPTCHAs is that some-
times even humans fail to solve them. In the proto-
col P, if a human user fails to solve the CAPTCHA
then (s)he will not be able to decipher, and there is no
way to repeat the test (as is done in normal CAPTCHA
usage); hence this stands as a serious weakness of the
proposed protocol P. A solution to this problem can
be attempted by providing some redundancy in the
CAPTCHAs, so that a valid user has a better chance
of solving them. We propose that the initial string k
chosen by the protocol be broken into w shares such
that any u or more of the shares are enough to regen-
erate k. These w shares are converted into CAPTCHAs
and sent along with the ciphertext. To incorporate this
idea we change the initial protocol P into P'. The pro-
tocol P' is a specific instantiation; thus, before we de-
scribe the protocol, we fix some details of its compo-
nents. In particular, for P' we require an encoding
mechanism ENCD, which we discuss first.
Let AL = {A, B, ..., Z} ∪ {a, b, ..., z} ∪
{0, 1, ..., 9} ∪ {+, /}, thus making |AL| = 64.
We define an arbitrary (but fixed) bijection
ρ : AL → {0, 1, ..., 63}, and for any σ ∈ AL and
n ≥ 6, bin_n(σ) will denote the n-bit binary represen-
tation of ρ(σ). Note that for all σ ∈ AL, at most 6
bits are required to represent ρ(σ). If ψ is a binary
string, then toInt(ψ) is the positive integer corre-
sponding to ψ; similarly, for a positive integer v < 2^n,
toBin_n(v) denotes the n-bit binary representation of
v. We fix a positive integer m and let STR be the
set of all m-character strings over the alphabet AL.
Let p be the smallest prime greater than 2^{6m} and let
d = p − 2^{6m}. Let ENCD : STR × {0, 1, ..., d} → Z_p
be defined as follows:
ENCD(s, λ)
1. Parse s as σ_0 || σ_1 || ··· || σ_{m−1}, where each σ_i ∈ AL;
2. ψ ← bin_6(σ_0) || ··· || bin_6(σ_{m−1});
3. v ← toInt(ψ);
4. return v + λ;

And let ENCD^{−1} : Z_p → STR × {0, 1, ..., d} be
defined as

ENCD^{−1}(y)
1. if y ≥ 2^{6m},
2.   λ ← y − 2^{6m} + 1;
3.   y ← 2^{6m} − 1;
4. else λ ← 0;
5. z ← toBin_{6m}(y);
6. Parse z as z_0 || z_1 || ··· || z_{m−1}, where |z_i| = 6;
7. s ← ρ^{−1}(toInt(z_0)) || ··· || ρ^{−1}(toInt(z_{m−1}));
8. return (s, λ);
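The encoding pair can be sketched in Python as follows (ours; m = 2 and p = 4099 are toy parameters, p being the smallest prime above 2^{6m} = 4096). Note that ENCD^{−1} is a right inverse: ENCD(ENCD^{−1}(y)) = y for every y ∈ Z_p.

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
RHO = {ch: i for i, ch in enumerate(ALPHABET)}  # the bijection rho : AL -> {0,...,63}

m = 2          # toy string length, so that p is easy to state
p = 4099       # smallest prime > 2^{6m} = 4096, hence d = p - 2^{6m} = 3

def encd(s: str, lam: int) -> int:
    # ENCD: read s as a 6m-bit integer via rho and shift by lambda.
    assert len(s) == m and 0 <= lam <= p - 2 ** (6 * m)
    v = 0
    for ch in s:
        v = (v << 6) | RHO[ch]
    return v + lam

def encd_inv(y: int):
    # ENCD^{-1}: map any y in Z_p to a pair (s, lambda) with encd(s, lambda) == y.
    if y >= 2 ** (6 * m):
        lam, y = y - 2 ** (6 * m) + 1, 2 ** (6 * m) - 1
    else:
        lam = 0
    chars = []
    for _ in range(m):
        chars.append(ALPHABET[y & 0x3F])
        y >>= 6
    return "".join(reversed(chars)), lam

s, lam = encd_inv(4098)       # exercises the y >= 2^{6m} branch
print(s, lam, encd(s, lam))   # encd(s, lam) == 4098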
Protocol P'(x)
1. k ←$ STR;
2. k' ← ENCD(k, 0);
3. {(x_1, k_1), ..., (x_w, k_w)} ← SHARE^p_{u,w}(k');
4. for i = 1 to w
5.   (k'_i, λ_i) ← ENCD^{−1}(k_i);
6.   c_i ← G(k'_i);
7. end for
8. K ← H(k);
9. C ← E_K(x);
10. return [C, {(x_1, c_1, λ_1), ..., (x_w, c_w, λ_w)}]

Figure 2: The protocol P', which uses a secret-sharing
scheme.

The modified protocol P' is shown in Figure 2. It
uses the encoding function ENCD and the secret shar-
ing scheme described in Section 2.4. For P' we as-
sume that STR contains all m-character strings over
the alphabet AL and that p is the smallest prime greater
than 2^{6m}; these can be considered the fixed and public
parameters of P'. The encoding mechanism is specif-
ically designed to convert a string in STR into an el-
ement of Z_p so that Shamir's secret sharing can be
suitably used.
To decrypt a cipher produced by P', a human user
must solve at least u of the w CAPTCHAs. Using
these u solutions together with the corresponding x_i's
and λ_i's, k can be recovered. A specific recommen-
dation for SHARE is Shamir's (2,5)-threshold scheme;
the user then has much flexibility in solving the
CAPTCHAs.
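The following sketch (ours, reusing the encd/encd_inv functions above with m = 2 and p = 4099) walks through the sender side of P' and the receiver-side recovery of k from u = 2 solved CAPTCHAs; CAPTCHA rendering and the final encryption are elided.

import secrets  # assumes encd/encd_inv from the previous sketch are in scope

def share(K, u, w, p):
    coeffs = [K] + [secrets.randbelow(p) for _ in range(u - 1)]
    return [(x, sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p)
            for x in range(1, w + 1)]

def interpolate_at_zero(points, p):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num, den = num * -xj % p, den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

# Sender side of P' (CAPTCHA rendering and the encryption of x elided):
k = "Ab"                                        # 1. k <-$ STR, fixed for the demo
shares = share(encd(k, 0), u=2, w=5, p=4099)    # 2-3. split ENCD(k, 0)
sent = [(x, *encd_inv(y)) for x, y in shares]   # 5. (x_i, k'_i, lambda_i); each k'_i -> G(k'_i)

# Receiver side: the human solved any u = 2 of the CAPTCHAs back to the k'_i.
solved = [(x, encd(ki, lam)) for x, ki, lam in sent[:2]]
k_rec, _ = encd_inv(interpolate_at_zero(solved, 4099))
print(k_rec == k)                               # True: k recovered, then K = H(k)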
5.1 Security of P'

The security of P' can be easily proved, in the sense of
Definition 4, in a similar way as we prove Theorem 1, if
SECRYPT2012-InternationalConferenceonSecurityandCryptography
160
we make a new assumption regarding the CAPTCHA
as follows:
Definition 6. [The Hash-MultiCaptcha Assump-
tion]. Let G be a CAPTCHA generator, let r be a num-
ber, let H : STR → {0,1}^r be a hash function, and let
A be an adversary. Also, let x = g(x_1, ..., x_w) be such
that if at least u out of w of x_1, ..., x_w are known then
x can be recovered. We define the advantage of A in
violating the Hash-MultiCaptcha assumption as

Adv^{hmc}_{G,H}(A) = Pr[A(G(x_1), ..., G(x_w), H(x)) ⇒ 1]
  − Pr[z ←$ {0,1}^r : A(G(x_1), ..., G(x_w), z) ⇒ 1],

where x ←$ STR. Moreover, (G,H) is called (ε,t)
HMC secure if for all adversaries A running in time
at most t, Adv^{hmc}_{G,H}(A) ≤ ε.
As in the definition of the Hash-Captcha assumption,
if the adversary can efficiently solve at least u of the
w CAPTCHAs, then it can break (G, H) in the HMC
sense irrespective of the hash function. If this assump-
tion is true, then we can show the security of protocol
P' just as we did for protocol P.
A CAPTCHA is an example of a weakly-
verifiable puzzle (Canetti et al., 2005), since a legit-
imate solver S may not be able to verify the correct-
ness of its answer. For this kind of puzzle, it has been
proved (Impagliazzo et al., 2009) that if it is difficult
for an attacker to solve a weakly-verifiable puzzle,
then trying to solve multiple instances of the puzzle in
parallel is harder. More recently, Jutla found a bet-
ter bound showing how hard it is for an attacker to
solve multiple instances of weakly-verifiable puzzles
(Jutla, 2010). The next theorem is based on Jutla's
main theorem, adapted to CAPTCHAs, which are our
interest in this work.
Theorem 2. Let G be a CAPTCHA generator which
is (α,β) secure. Let k ∈ N, δ = 1 − β, and let γ
(0 < γ < 1) be arbitrary. Let A be an arbitrary poly-
nomial-time adversary which is given as input k CAP-
TCHAs (G(x_1), ..., G(x_k)) and outputs a set X of so-
lutions of the k CAPTCHAs. If InCorr(X) denotes the
number of incorrect solutions in X, then

Pr[InCorr(X) < (1 − γ)δk] < e^{−(1−γ)γ²δk/2}.
This theorem establishes that for any adversary, if
the probability of failure in solving a CAPTCHA is
at least δ, then the probability of failing on fewer than
(1 − γ)δk out of k puzzles is at most e^{−(1−γ)γ²δk/2}.
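For a feel of the numbers, the bound can be evaluated directly; the parameter values in the short sketch below are purely illustrative.

import math

def jutla_bound(delta: float, gamma: float, k: int) -> float:
    # Upper bound from Theorem 2 on Pr[InCorr(X) < (1 - gamma) * delta * k].
    return math.exp(-(1 - gamma) * gamma ** 2 * delta * k / 2)

# If a machine fails a single CAPTCHA with probability at least delta = 0.7,
# mostly-correct solving of k puzzles becomes unlikely quickly:
for k in (5, 20, 100):
    print(k, jutla_bound(delta=0.7, gamma=0.5, k=k))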
Based on this fact, it may be possible to show that
for any arbitrary adversary A attacking the HMC as-
sumption, there exists an HC adversary B such that
Adv^{hmc}_{G,H}(A) < Adv^{hc}_{G,H}(B). This would imply that
the HC assumption implies the HMC assumption.
But, for now, we are not sure whether such a result
holds.
5.2 Discussions
About the Encryption Scheme: In this work
we have not said anything about the encryption
scheme to be used in the protocol. We only
said that we require the encryption scheme to be
PROF-EAV secure, and any IND-EAV secure en-
cryption scheme can provide such security. Thus
most symmetric encryption schemes in common
use, like the CBC and counter modes (which pro-
vide security in the IND-CPA sense), can be used
for the encryption function E in P'. A more effi-
cient scheme which provides security only in the
PROF-EAV sense would be more interesting; we
would like to explore in this direction.
Key Sizes: Another important thing to consider is
that the effective key size for the protocol is dic-
tated by the parameter m, i.e., the size of each
string in STR. This value cannot be made arbi-
trarily large, as solving big CAPTCHAs may be
tiresome for human beings; a usual CAPTCHA
length is five to eight characters. If we use eight-
character strings over the alphabet AL then the ef-
fective size of the key space would be 64^8 = 2^{48}.
Increasing the alphabet size is also not feasible,
as we need unambiguous printable characters to
make CAPTCHAs. Thus the key space is not suf-
ficiently large for a modern cryptographic appli-
cation, but for the application we have in mind
this may be sufficient, as we do not expect that
a profiler would be ready to spend so much com-
putational resource on profiling a single message.
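The arithmetic behind this estimate is simple; the one-line sketch below (ours) tabulates |STR| = 64^m for the usual CAPTCHA lengths.

import math

# |STR| = |AL|^m = 64^m for m-character strings; m = 8 gives the 2^48 above.
for m in (5, 6, 7, 8):
    print(f"m = {m}: |STR| = 64^{m} = 2^{int(math.log2(64 ** m))}")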
6 FINAL REMARKS
In this paper we gave a theoretical analysis of profil-
ing adversaries and described a protocol which is se-
cure against profiling adversaries. Our protocol does
not require any key exchange or public key infra-
structure, and uses CAPTCHAs and secret sharing
schemes in a novel way.
Encryption may not be the only way to protect a
user from profilers, as profilers can use many differ-
ent techniques which cannot be stopped using encryp-
tion. For example, it is possible to track the web
usage of a specific user and profile him/her on that
basis. Here encryption (probably) has no role to play,
or at least cannot be used in the way we propose in
our protocol. Anonymity is probably the correct di-
rection to explore for solving such problems. Also,
user profiling is a big business, and some think that
the free content on the web is only possible due to
online advertisements, so putting a total end to user
profiling may not be desirable. Hence there have been
recent attempts to develop systems which would al-
low targeted advertisements without compromising
user security (Toubiana et al., 2010). These issues are
not covered in our current work.
ACKNOWLEDGEMENTS
The authors thank Francisco Rodríguez Henríquez for
his comments on an early draft of this paper. Debrup
Chakraborty acknowledges the support of CONA-
CYT project 166763.
REFERENCES
Abdalla, M., Bellare, M., and Rogaway, P. (2001). The or-
acle Diffie-Hellman assumptions and an analysis of
DHIES. In Naccache, D., editor, CT-RSA, volume
2020 of Lecture Notes in Computer Science, pages
143–158. Springer.
Canetti, R., Halevi, S., and Steiner, M. (2005). Hardness
amplification of weakly verifiable puzzles. In Kilian,
J., editor, TCC, volume 3378 of Lecture Notes in Com-
puter Science, pages 17–33. Springer.
Dwork, C. and Naor, M. (1992). Pricing via processing
or combatting junk mail. In Brickell, E. F., editor,
CRYPTO, volume 740 of Lecture Notes in Computer
Science, pages 139–147. Springer.
Dziembowski, S. (2010). How to pair with a human. In
Garay, J. A. and Prisco, R. D., editors, SCN, volume
6280 of Lecture Notes in Computer Science, pages
200–218. Springer.
Golle, P. and Farahat, A. (2004). Defending email commu-
nication against profiling attacks. In Atluri, V., Syver-
son, P. F., and di Vimercati, S. D. C., editors, WPES,
pages 39–40. ACM.
Impagliazzo, R., Jaiswal, R., and Kabanets, V. (2009).
Chernoff-type direct product theorems. J. Cryptology,
22(1):75–92.
Impagliazzo, R. and Zuckerman, D. (1989). How to recycle
random bits. In FOCS, pages 248–253. IEEE.
Jutla, C. S. (2010). Almost optimal bounds for direct prod-
uct threshold theorem. In Micciancio, D., editor, TCC,
volume 5978 of Lecture Notes in Computer Science,
pages 37–51. Springer.
Katz, J. and Lindell, Y. (2008). Introduction to Modern
Cryptography. Chapman & Hall/ CRC.
Naor, M. (1997). Verification of a human in the loop or identification via the Turing test. http://www.wisdom.weizmann.ac.il/naor/PAPERS/human.pdf.
NYT (2009). Congress begins deep packet inspection of internet providers. http://bits.blogs.nytimes.com/2009/04/24/congress-begins-deep-packet-inspection-of-internet-providers/.
Pinkas, B. and Sander, T. (2002). Securing passwords
against dictionary attacks. In Atluri, V., editor, ACM
Conference on Computer and Communications Secu-
rity, pages 161–170. ACM.
Shamir, A. (1979). How to share a secret. Commun. ACM,
22(11):612–613.
Toubiana, V., Narayanan, A., Boneh, D., Nissenbaum, H., and Barocas, S. (2010). Privacy preserving targeted advertising. In Proceedings of the Annual Network and Distributed Systems Security Symposium. http://www.isoc.org/isoc/conferences/ndss/10/pdf/05.pdf.
von Ahn, L., Blum, M., Hopper, N. J., and Langford, J.
(2003). CAPTCHA: Using hard AI problems for se-
curity. In Biham, E., editor, EUROCRYPT, volume
2656 of Lecture Notes in Computer Science, pages
294–311. Springer.
SECRYPT2012-InternationalConferenceonSecurityandCryptography
162