On Securing Communication from Profilers
Sandra Díaz-Santiago* and Debrup Chakraborty
Department of Computer Science, CINVESTAV IPN, Av. Instituto Politécnico Nacional No. 2508,
Col. San Pedro Zacatenco, D.F., 07360, Mexico
*Sandra Díaz-Santiago is on academic leave from Escuela Superior de Cómputo (ESCOM-IPN).
Keywords:
Data Encryption, Profiling Adversary, User Profiling, CAPTCHA, Secret Sharing.
Abstract:
A profiling adversary is an adversary which aims to classify messages into pre-defined profiles and thus gain
useful information regarding the sender or receiver of such messages. Usual chosen-plaintext secure encryp-
tion schemes are capable of securing information from profilers, but these schemes provide more security than
required for this purpose. In this paper we study the requirements for an encryption algorithm to be secure
only against profilers, and finally give a precise notion of security for such schemes. We also present a full
protocol for secure (against profiling adversaries) communication, which neither requires a key exchange nor
a public key infrastructure. Our protocol guarantees security against non-human profilers and is constructed
using CAPTCHAs and secret sharing schemes.
1 INTRODUCTION
Informally a spam email is an email which is not of
interest to the receiver. Every day almost every one
of us finds hundreds of such spam emails waiting in
our in-boxes. A spammer (who sends spam emails)
generally has a business motive, and most spam emails
try to advertise a product, a web-page or a service. If
the spam emails can be sent in a directed manner, i.e.,
if a spammer can send a specific advertisement to a
user who would be interested in it, then the motive
of the spammer would be successful to a large extent.
Thus, one of the important objectives of a spammer
would be to know the preferences or interests of the
users to whom it is sending the unsolicited messages.
In today’s connected world we do a lot of com-
munication through emails, and it is not unrealistic
to assume that a collection of email messages which
originate from a specific user U carries information
about the preferences and interests of U. Based on
this assumption a spammer can collect email infor-
mation originating from different users and based on
these emails try to make a profile of each user (based
on their preferences or interests), and later use this
profile for directed spamming.
Here we assume that given a message space an
adversary aims to map each message in the message
space into certain classes of its interest. Using this
classification of messages the adversary can try to con-
clude which user is associated with which class and
this is expected to reveal information regarding the
profile of a given user. Thus, in the scenario of our
interest we consider an adversary that classifies mes-
sages into pre-defined classes. Such an adversary
will henceforth be called a profiler.
Other than directed spamming, there may be other
motives for user profiling. Recently there has been
a paradigm shift in the way products are advertised
on the internet. In the popular new paradigm of
online behavioral advertising (OBA) (Toubiana et al.,
2010), internet advertising companies display adver-
tisements specific to user preferences. This requires
profiling the users. To support this big business of in-
ternet advertising, innovative techniques for user pro-
filing have also been developed. It is known that some in-
ternet service providers perform a procedure called
deep packet inspection on all traffic to detect malware
etc., but this technique has been used to generate user
profiles from the information contents of the packets
received or sent by a user, and this information is
later sold to advertising companies (Toubiana et al.,
2010). This has recently led to many policy-related
debates, and it has been asked whether such practices
should be legally allowed (NYT, 2009).
In the context of emails, a solution to the prob-
lem of profiling attacks would be encrypting the com-
munications so that the contents of the emails are not
available to the profiler. An alternative is to make the
communications anonymous, so that given a message it would
not be possible for a profiler to trace the origin of the
message. In this paper we ask the following question:
What would be the exact security requirements for an
encryption scheme which can protect the communica-
tion from profilers? Intuitively a cipher obtained from
a secure encryption algorithm should not reveal any
information regarding the plaintext which was used to
produce the cipher. Hence, a secure encryption algo-
rithm should surely resist attacks by profilers. But, as
the goal of a profiler is only to classify the messages,
it is possible that an encryption algorithm which pro-
vides security in a weaker sense would be enough to
resist profilers. We explore in this direction and try to
fix the appropriate security definition of an encryption
scheme which would provide security against profil-
ers.
Using any encryption scheme involves the com-
plicated machinery of key exchange (for symmetric
encryption) or a public key infrastructure (for asym-
metric encryption). When the goal is just to protect
information against profilers the heavy machinery of
key exchange or public key infrastructure may be un-
necessary. Keeping in mind security against profil-
ers we propose a new protocol which does not require
explicit key exchange. To do this we use the notion
of CAPTCHAs, which are programs that can distin-
guish between humans and machines by automated
Turing tests which are easy for humans to pass but
difficult for any machine. The use of CAPTCHAs
makes our protocol secure from non-human profilers,
but the protocol is still vulnerable to human adver-
saries. In the context in which we view the activity of
profiling, it is only profitable if a large number of
users can be profiled, and this goal seems infeasible
if human profilers are employed for the task.
To our knowledge the only prior work on the is-
sue of securing email communication from profilers
has been reported in (Golle and Farahat, 2004). In
(Golle and Farahat, 2004) it was pointed out that
an encryption scheme secure against profilers can be
much weaker than normal encryption algorithms, and
thus using a normal encryption algorithm can be an
overkill. The solution in (Golle and Farahat, 2004)
hides the semantic of the plaintext by converting an
English text into another English text with the help
of a key. Their protocol also does not need explicit
key exchange or a public key infrastructure. The
key is derived from the email header by using a hash
function with a specific property. The hash function
they use is a “slow one-way hash function”, which
was first proposed in (Dwork and Naor, 1992). Such
hash functions are difficult to compute, i.e., they may
take a few seconds to compute, and are hard to invert.
This high computational cost for the hash function
prevents a profiler from deriving the key for a large
number of messages. Our method is fundamentally
different from (Golle and Farahat, 2004) in its use of
CAPTCHAs. Slow hash functions, though proposed
long ago, have not seen much use, and their suitability
is not well tested. But CAPTCHAs are ubiquitous
in today's world and have been used successfully
in diverse applications. Also, our work presents a
theoretical analysis of the problem and provides
security definitions which are, to our knowledge, new
to the literature.
The rest of the paper is organized as follows. In
Section 2 we describe basic concepts related to in-
distinguishability, CAPTCHA and secret sharing. In
Section 3 we present a formal definition of a profil-
ing adversary and security against such adversaries.
In Sections 4 and 5 we describe our protocols and ar-
gue regarding their security in terms of the security
notion given in Section 3. We conclude the paper in
Section 6, where we discuss the limitations of
our approach and some future directions.
2 PRELIMINARIES
2.1 Notations
The set of all n-bit strings will be denoted by {0,1}^n.
For a string x, |x| will denote the length of x, and for
a finite set A, |A| will denote the cardinality of A. For
a finite set S, x ←$ S will denote that x is selected
uniformly at random from S. In what follows, by an
adversary we shall mean a probabilistic algorithm
which outputs an integer or a bit. A(x, y) ⇒ b will
denote the fact that an adversary A given inputs x, y
outputs b. In general an adversary may have other
kinds of interaction, possibly with other adversaries
and/or algorithms, before it produces its output; these
will be clear from the context. In what follows,
E : K × M → C will denote an encryption scheme
with K, M, C as the key space, message space and
cipher space respectively. For m ∈ M and k ∈ K we
shall usually write E_k(m) instead of E(k, m).
2.2 Indistinguishability in the Presence
of an Eavesdropper
Security of encryption schemes is best defined in
terms of indistinguishability. Here we consider in-
distinguishability in the presence of an eavesdropping
adversary. This security notion, which we call IND-
EAV security, is defined with the help of an interaction
between two entities called an adversary and a chal-
lenger. The adversary chooses a pair of plaintext
messages and asks the challenger for their
encryption. The challenger pro-
vides the adversary with the encryption of one of the
messages chosen by the adversary. The adversary is
considered to be successful if it can correctly guess
which message of its choice was encrypted. More
formally, to define the security of an encryption al-
gorithm E : K × M C , we consider the interaction
of an adversary A with a challenger in the experiment
below:
Experiment Exp-IND-EAV_A
1. The challenger selects K uniformly at random
   from K.
2. The adversary A selects two messages
   m_0, m_1 ∈ M, such that |m_0| = |m_1|.
3. The challenger selects a bit b uniformly at
   random from {0,1}, and returns c ← E_K(m_b) to A.
4. The adversary A outputs a bit b'.
5. If b = b' output 1, else output 0.
Definition 1. Let E : K × M → C be an encryption
scheme. The IND-EAV advantage of an adversary A
in breaking E is defined as

Adv^{ind-eav}_E(A) = Pr[Exp-IND-EAV_A ⇒ 1] − 1/2.

Moreover, E is (ε,t) IND-EAV secure if for all adver-
saries A running for time at most t, Adv^{ind-eav}_E(A) ≤ ε.
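To make the experiment concrete, the following is a minimal Python sketch (ours, not part of the paper) of Exp-IND-EAV, instantiated with a one-time pad as the encryption scheme; the adversary class and trial count are illustrative.

import secrets

def otp_encrypt(key: bytes, msg: bytes) -> bytes:
    # One-time pad: XOR the message with an equal-length uniform key.
    return bytes(k ^ m for k, m in zip(key, msg))

def exp_ind_eav(adversary, n: int) -> int:
    """One run of Exp-IND-EAV; returns 1 iff the adversary guesses b."""
    key = secrets.token_bytes(n)            # step 1: K uniform in the key space
    m0, m1 = adversary.choose_messages(n)   # step 2: |m0| = |m1|
    b = secrets.randbelow(2)                # step 3: uniform bit b
    c = otp_encrypt(key, m1 if b else m0)   #         c <- E_K(m_b)
    return int(adversary.guess(c) == b)     # steps 4-5

class GuessingAdversary:
    def choose_messages(self, n):
        return bytes(n), bytes([0xFF]) * n
    def guess(self, c):
        return secrets.randbelow(2)         # against an OTP nothing beats guessing

trials = 20000
wins = sum(exp_ind_eav(GuessingAdversary(), 16) for _ in range(trials))
print("empirical IND-EAV advantage:", wins / trials - 0.5)  # close to 0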
The IND-EAV security as defined above applies
only to one-time encryption, and it is different from
the most used security notion for symmetric encryp-
tion, which is indistinguishability under chosen plain-
text attack (IND-CPA). In an IND-CPA attack the ad-
versary is given access to an encryption oracle and
thus can consult this oracle before it chooses the mes-
sages, and has the option of asking for the encryption
of multiple pairs of messages before it outputs. The
IND-EAV notion is strictly weaker than the IND-CPA
notion: all IND-CPA secure encryption schemes are
also IND-EAV secure.
A related notion of security is that of semantic se-
curity. Informally a symmetric encryption scheme is
called semantically secure if an adversary is unable to
compute any function of the plaintext given a cipher-
text.
Definition 2. Let E : K × M → C be an encryption
scheme. E is called (ε,t) SEM-EAV secure if for all
functions f and for all adversaries A running for time
at most t,

|Pr[A(E_K(x)) ⇒ f(x)] − max_{A'} Pr[A'(·) ⇒ f(x)]| ≤ ε   (1)

where the running time of A' is polynomially related
to t, and x is chosen uniformly at random from M.
Note that in the above definition, by A'(·) we mean
that the adversary is given no input, i.e., A' is trying
to predict f(x) without seeing E_K(x). In the sec-
ond term of Equation (1) the maximum is taken over
all adversaries A' which run for time at most poly(t),
for some polynomial poly(). Thus, if E is SEM-EAV
secure then no adversary can do better in predicting
f(x) from E_K(x) than an adversary who does so with-
out seeing E_K(x). It is well known that IND-EAV
security implies SEM-EAV security (see, for example,
Claim 3.11 in (Katz and Lindell, 2008)).
2.3 CAPTCHA
A CAPTCHA is a computer program designed to dif-
ferentiate a human being from a computer. The fun-
damental ideas for such a program were first pro-
posed in an unpublished paper (Naor, 1997) and
then these ideas were formalized in (von Ahn et al.,
2003), where the name CAPTCHA was first pro-
posed. CAPTCHA stands for Completely Automated
Public Turing test to tell Computers and Humans
Apart. In fact, a CAPTCHA is a test which is easy
to pass by a human user but hard to pass by a ma-
chine. Among the most common CAPTCHAs are dis-
torted images of short strings. For a human it is gen-
erally very easy to recover the original string from the
distorted image, but it is difficult for state-of-the-art
character recognition algorithms to recover the orig-
inal string from the distorted image. Other types of
CAPTCHAs which depend on problems of speech
recognition, object detection, classification etc. have
also been developed.
Recently CAPTCHAs have been used in many
different scenarios for identification of humans, like
in chat rooms, online polls etc. Also they can be used
to prevent dictionary attacks on password-based
systems (Pinkas and Sander, 2002), and more recently
for key establishment (Dziembowski, 2010).
A CAPTCHA is a randomized algorithm G which,
given an input string x from a set of strings STR, pro-
duces the CAPTCHA G(x). A CAPTCHA G is called
(α,β) secure if for any human or legitimate solver S,

Pr[x ←$ STR : S(G(x)) ⇒ x] ≥ α,

and for any efficient machine C,

Pr[x ←$ STR : C(G(x)) ⇒ x] ≤ β.
For a CAPTCHA to be secure it is required that
there is a large gap between α and β. In Section 4,
we will propose an alternative security definition for
CAPTCHAs.
2.4 Secret Sharing Schemes
A secret sharing scheme is a method designed to
share a secret between a group of participants. These
schemes were first proposed by Shamir in 1979
(Shamir, 1979). Although there have been improve-
ments to these kind of schemes, here we will use the
basic construction due to Shamir. In a (u, w) thresh-
old secret sharing scheme a secret K is divided into
w pieces called shares. These w shares are given to
w participants. To recover the secret, at least u ≤ w
of the w shares are required, and it is not possible to
recover the secret with fewer than u shares.
We describe a specific construction proposed by
Shamir. To construct a (u,w) secret sharing scheme
we need a prime p ≥ w + 1, and the operations take
place in the field Z_p. The procedure for splitting a se-
cret K into w parts is depicted in the algorithm below:

SHARE^p_{u,w}(K)
1. Choose w distinct, non-zero elements of Z_p,
   denote them x_i, 1 ≤ i ≤ w.
2. Choose u − 1 elements of Z_p independently at
   random. Denote them a_1, ..., a_{u−1}.
3. Let a(x) = K + Σ_{j=1}^{u−1} a_j x^j mod p,
   and y_i = a(x_i), 1 ≤ i ≤ w.
4. Output S = {(x_1, y_1), ..., (x_w, y_w)} as the
   set of w shares.
The secret K can be easily recovered using any
B ⊆ S such that |B| ≥ u, but if |B| < u then K cannot
be recovered. To see this, observe that the polyno-
mial used in step 3 to compute the y_i's is a degree
u − 1 polynomial. Thus, using u pairs of the type
(x_i, y_i) one can generate u linear equations, each of
the form y_i = K + a_1 x_i + ··· + a_{u−1} x_i^{u−1}. Using
these equations the value of K can be found, and it
can be shown that this set of u equations always has
a unique solution.
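As an illustration, the following Python sketch (ours; the prime and parameters are illustrative) implements SHARE and recovery by Lagrange interpolation at x = 0, which solves the above system of equations implicitly.

import secrets

def share(K: int, u: int, w: int, p: int):
    # SHARE^p_{u,w}(K): evaluate a random degree u-1 polynomial a(x) with
    # a(0) = K at the w distinct non-zero points x = 1, ..., w (mod p).
    coeffs = [K] + [secrets.randbelow(p) for _ in range(u - 1)]
    return [(x, sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p)
            for x in range(1, w + 1)]

def recover(shares, p: int) -> int:
    # Recover K = a(0) from u (or more) shares by Lagrange interpolation;
    # division mod p is done via Fermat inverses, so p must be prime.
    K = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % p
                den = den * (xi - xj) % p
        K = (K + yi * num * pow(den, p - 2, p)) % p
    return K

p = 8191                              # a small prime, illustration only
shares = share(1234, u=2, w=5, p=p)
print(recover(shares[:2], p))         # any 2 of the 5 shares give back 1234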
3 PROFILING ADVERSARIES
Let M be a message space and P = {1, 2, ..., k} be a
set of labels for the different possible profiles. We as-
sume that each message x in M can be labeled by a
unique j ∈ P. Thus, there exists a function f : M → P
which assigns a label to each message in the message
space. In other words, we can assume that the mes-
sage space can be partitioned into disjoint subsets as
M = M_1 ∪ M_2 ∪ ··· ∪ M_k, and for every x ∈ M,
f(x) = i if and only if x ∈ M_i.
We call f the profiling function or a classifier.
Thus, in this setting we are assuming that each mes-
sage in the message space M represents some profile,
and messages in M_i (1 ≤ i ≤ k) correspond to the pro-
file i. The function f is a classifier which, given a mes-
sage, can classify it into one of the profiles. We also
assume that the function f is efficiently computable;
in particular, we assume that for any x ∈ M, f(x) can
be computed in time at most µ, where µ is a constant.
The function f is public; thus, given x ∈ M, any ad-
versary can efficiently compute f(x). We want to de-
fine security for encryption schemes against profiling
adversaries, i.e., we want that when a message from
M is encrypted using the encryption algorithm, no ef-
ficient adversary is able to profile it.
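For concreteness, a profiling function f could look like the following toy Python sketch; the profile labels and keywords are hypothetical, and any efficiently computable classifier that partitions M fits the definition.

# A toy profiling function f : M -> P realized as a keyword classifier.
# The labels and keywords below are hypothetical.
PROFILES = {1: ("football", "league"), 2: ("guitar", "concert"), 3: ("recipe", "bake")}

def f(message: str) -> int:
    words = set(message.lower().split())
    for label, keywords in PROFILES.items():
        if words & set(keywords):
            return label
    return 0  # catch-all profile for messages matching no class

print(f("The league table after the weekend football matches"))  # -> 1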
3.1 PROF-EAV Security
Here we propose a definition for encryption schemes
secure against profiling adversaries.
Definition 3. [PROF-EAV Security]. Let M be a
message space and f : M → P be a profiling func-
tion. Let E : K × M → C be an encryption algo-
rithm. We define the advantage of an adversary A in
the PROF-EAV (read: profiling under eavesdropping)
sense in breaking E as

Adv^{prof-eav}_{E,f}(A) = Pr[A(E_K(x)) ⇒ f(x)]
  − max_{A'} Pr[A'(·) ⇒ f(x)],   (2)

where K ←$ K, x ←$ M and A' is an adversary whose
running time is a polynomial of the running time of
A. An encryption algorithm E : K × M → C is called
(ε,t) PROF-EAV secure for a given profiling function
f if for all adversaries A running in time at most t,
Adv^{prof-eav}_{E,f}(A) ≤ ε.
In the definition above we want to capture the no-
tion that, for a PROF-EAV secure encryption scheme,
an adversary A trying to find the profile of a message
from its ciphertext cannot do much better than the best
adversary A', who tries to guess the profile without
seeing the ciphertext.
This definition is in accordance with the defini-
tion of semantic security as discussed in Section 2.2.
Recall that an encryption scheme is called semanti-
cally secure if no adversary can efficiently compute
any function of the plaintext given its ciphertext. But
in the PROF-EAV definition we are interested only
in a specific function f. Thus, PROF-EAV security
is strictly weaker than semantic security: semantic
security trivially implies PROF-EAV security, but
PROF-EAV security does not imply IND-EAV secu-
rity. We give a concrete example to illustrate this.
Example 1. Let M = {0,1}^n = M_1 ∪ M_2 be a mes-
sage space, where

M_1 = {x ∈ M : the first bit of x is 0},

and M_2 = M \ M_1, and let f be the profiling function
such that f(x) = i iff x ∈ M_i. Let E^{one} be an encryp-
tion scheme which uses a one-bit key k (chosen uni-
formly from {0,1}) and, given a message x ∈ M, xors
k with the first bit of x. It is easy to see that an adver-
sary trying to guess the profile of a message x given
E^{one}_k(x) cannot do better than with probability one
half, and this success probability can be achieved even
without seeing the ciphertext, since here |M_1| = |M_2|.
Hence E^{one} is PROF-EAV secure, but trivially not se-
cure in the IND-EAV sense.
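The scheme E^{one} is easy to simulate; the following sketch (ours) empirically checks both claims: a profiler gains nothing from the ciphertext, while an IND-EAV adversary who picks messages differing in the last bit wins every time.

import secrets

def e_one(k: int, x: str) -> str:
    # E^one: xor the one-bit key with the first bit of x; the rest is in the clear.
    return str(int(x[0]) ^ k) + x[1:]

n, trials = 8, 20000

# PROF-EAV: guess the profile from the ciphertext's first bit. Since that
# bit is masked by a uniform key bit, success stays at about 1/2.
prof_wins = sum(
    e_one(secrets.randbelow(2), x)[0] == x[0]
    for x in (format(secrets.randbits(n), f"0{n}b") for _ in range(trials))
)
print("profiling success:", prof_wins / trials)   # ~0.5, no better than guessing blind

# IND-EAV: choose m0, m1 differing in the LAST bit, which E^one never touches,
# so the adversary reads b directly off the ciphertext.
m0, m1 = "0" * n, "0" * (n - 1) + "1"
ind_wins = 0
for _ in range(trials):
    b = secrets.randbelow(2)
    c = e_one(secrets.randbelow(2), m1 if b else m0)
    ind_wins += int(c[-1] == "1") == b
print("IND-EAV success:", ind_wins / trials)      # 1.0: not IND-EAV secure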
4 ENCRYPTION PROTOCOL
SECURE AGAINST PROFILING
ADVERSARIES
In this section we describe a complete protocol which
would be secure against profiling adversaries. As
mentioned in the introduction here we care about ad-
versaries who are not humans. Our motivation is
to prevent communications getting profiled in large
scale mechanically. The protocol is not secure from
human adversaries, and we do not care much about
that as we hope that it would be economically infea-
sible to employ a human for large scale profiling.
The protocol P consists of the following entities:

- The message space M and the cipher space C.
- The set of profiles P and the profiling function f associated with M.
- A set STR which consists of short strings over a specified alphabet.
- An encryption scheme E : K × M → C.
- A hash function H : STR → K.
- A CAPTCHA generator G which takes inputs from STR.
Protocol P(x)
1. k ←$ STR;
2. k' ← G(k);
3. K ← H(k);
4. c ← E_K(x);
5. return (c, k')

Figure 1: The protocol P.

Given a message x ∈ M, P produces a ciphertext as
shown in Figure 1. In the protocol as described in Fig-
ure 1, k, an element of STR, is hashed to form the key
K, and k is also converted into a CAPTCHA and trans-
mitted along with the ciphertext. The only input to P
is the message; the key generation is embedded in the
protocol. This resembles the scenario of hybrid en-
cryption (Abdalla et al., 2001), which consists of two
mechanisms, called key encapsulation and data encap-
sulation, where an encrypted version of the key is
transmitted along with the cipher. For a human, de-
cryption is easy: given a ciphertext (c, k') a human
user can recover k from k' by solving the CAPTCHA
and thus compute E^{−1}_{H(k)}(c) to decipher.
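A minimal Python sketch of P follows; it is ours, not the paper's implementation. The CAPTCHA generator is a stub, SHA-256 stands in for H, and the cipher E is a simple hash-based stream construction used only as a placeholder for a PROF-EAV secure scheme.

import hashlib, secrets

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")

def G(k: str) -> str:
    # Stub CAPTCHA generator: a real deployment would render a distorted
    # image of k; here we only mark the string as "captcha-wrapped".
    return f"<captcha of {k!r}>"

def H(k: str) -> bytes:
    # The hash H : STR -> K; SHA-256 is our illustrative choice.
    return hashlib.sha256(k.encode()).digest()

def E(K: bytes, x: bytes) -> bytes:
    # Placeholder cipher: XOR with a SHA-256-based keystream. Any
    # PROF-EAV secure scheme can be plugged in here instead.
    stream = b"".join(hashlib.sha256(K + i.to_bytes(4, "big")).digest()
                      for i in range((len(x) + 31) // 32))
    return bytes(a ^ b for a, b in zip(x, stream))

def protocol_P(x: bytes, m: int = 8):
    k = "".join(secrets.choice(ALPHABET) for _ in range(m))   # 1. k <-$ STR
    return E(H(k), x), G(k)                                   # 2-5. (c, k') = (E_{H(k)}(x), G(k))

c, k_prime = protocol_P(b"meet me at noon")
# A human solves k_prime back to k and decrypts; E here is its own inverse:
#   x = E(H(k), c)
print(k_prime, c.hex())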
4.1 Security of P
The security of a protocol P against profilers is de-
fined in the same way as in Definition 3.
Definition 4. The advantage of an adversary A attack-
ing protocol P is defined as

Adv^{prof}_{P,f}(A) = Pr[A(P(x)) ⇒ f(x)] − max_{A'} Pr[A'(·) ⇒ f(x)],

where x ←$ M and A' is an adversary whose run-
ning time is a polynomial of the running time of A.
Additionally, P is called (ε,t) secure in the PROF
sense if for all adversaries A running in time at most
t, Adv^{prof}_{P,f}(A) < ε.
The above definition differs from Definition 3 in that
it does not mention the key explicitly, as key genera-
tion is embedded in the protocol itself.
To prove that P is secure in the PROF sense we need
an assumption regarding the CAPTCHA G and the
hash function H. We state this next.
Definition 5. [The Hash-Captcha Assumption].
Let G be a CAPTCHA generator, let r be a number,
let H : STR → {0,1}^r be a hash function, and let A
be an adversary. We define the advantage of A in vi-
olating the Hash-Captcha assumption as

Adv^{hc}_{G,H}(A) = Pr[x ←$ STR : A(G(x), H(x)) ⇒ 1]
  − Pr[x ←$ STR, z ←$ {0,1}^r : A(G(x), z) ⇒ 1].

Moreover, (G,H) is called (ε,t) HC secure if
for all adversaries A running in time at most t,
Adv^{hc}_{G,H}(A) ≤ ε.
This definition says that the pair formed by a
CAPTCHA generator G and a hash function H is se-
cure if an adversary A is unable to distinguish be-
tween (G(x), H(x)), where x is some string, and
(G(x), z), where z is a random string. This secu-
rity notion for a CAPTCHA, inspired by the notion of
indistinguishability, is quite different from the (α,β)
security notion described in Section 2.3. Here
the adversary has some more information regarding
x through the value H(x). If the adversary can ef-
ficiently solve the CAPTCHA G then it can break
(G,H) in the HC sense irrespective of the hash func-
tion. Even if the CAPTCHA is secure, i.e., no efficient
adversary can find x from G(x), an adversary may still
be able to distinguish H(x) from a string randomly
selected from the range of H.
Consider a keyed family of hash functions
H = {H_ℓ}_{ℓ∈L} such that for every ℓ ∈ L, H_ℓ : D → R
for some sets D and R. Then H is called an entropy
smoothing family if for any efficient adversary it is
difficult to distinguish between (ℓ, H_ℓ(x)) and (ℓ, z),
where ℓ, x, z are selected uniformly at random from
L, D and R respectively. An entropy smoothing hash
along with a secure CAPTCHA can resist HC attacks.
Entropy smoothing hashes can be constructed from
universal hash functions using the leftover hash lemma
(Impagliazzo and Zuckerman, 1989), but the param-
eter sizes required for such provable guarantees can
be prohibitive. We believe that using ad-hoc crypto-
graphic hashes like the ones from the SHA family can
provide the same security. In our definition we do not
use a keyed family of hash functions, but such a fam-
ily can easily be used in the protocol P, in which case
the hash key will also be a part of the ciphertext.
With these discussions we are now ready to state
the theorem about security of P.
Theorem 1. Let P be the protocol in Figure 1 and let
A be an adversary attacking P in the PROF sense. Then
there exist adversaries B and B' such that

Adv^{prof}_{P,f}(A) ≤ Adv^{hc}_{G,H}(B) + Adv^{prof-eav}_{E,f}(B').

Moreover, if A runs for time t, both B and B' run for
time O(t).
Proof. Let A be an adversary attacking the protocol
P in Figure 1. We construct an adversary B attacking
the hash-captcha pair (G,H), using A, as follows.

Adversary B(G(k), z)
1. x ←$ M;
2. Send (E_z(x), G(k)) to A;
3. A returns j;
4. if f(x) = j
5.   return 1;
6. else return 0;
As B is an adversary attacking the hash-captcha
assumption, there are two possibilities regarding the
input (G(k), z) of B: z can either be H(k) or a uniform
random element of K, and the goal of B is to distin-
guish between these two possibilities.

Considering the first possibility, that z is H(k), by
the way the adversary B is defined, A gets a valid en-
cryption of the message x (which is a random element
of the message space) according to the protocol P.
Hence we have

Pr[k ←$ STR : B(G(k), H(k)) ⇒ 1]
  = Pr[k ←$ STR, x ←$ M : A(E_{H(k)}(x), G(k)) ⇒ f(x)]
  = Pr[x ←$ M : A(P(x)) ⇒ f(x)].   (3)
Similarly, for the second possibility, i.e., when the in-
put z to B is an element chosen uniformly at random
from K, we have

Pr[k ←$ STR, K ←$ K : B(G(k), K) ⇒ 1]
  = Pr[x ←$ M : A(E_K(x), G(k)) ⇒ f(x)].   (4)

In Equation (4), k and K are chosen independently
and uniformly at random. Thus the adversary A has as
input E_K(x) and G(k), where k is independent of K,
so G(k) carries no information about K. Hence A can-
not do better than some PROF-EAV adversary B' who
has only E_K(x) as its input and runs for the same time
as A. Thus
Pr[x ←$ M : A(E_K(x), G(k)) ⇒ f(x)]
  ≤ Pr[x ←$ M : B'(E_K(x)) ⇒ f(x)].   (5)
From the definition of the PROF-EAV advantage of
B' we have

Pr[x ←$ M : B'(E_K(x)) ⇒ f(x)]
  = Adv^{prof-eav}_{E,f}(B') + max_{A'} Pr[A'(·) ⇒ f(x)].   (6)
Thus, using Equations (4), (5) and (6) we have

Pr[k ←$ STR, K ←$ K : B(G(k), K) ⇒ 1]
  ≤ Adv^{prof-eav}_{E,f}(B') + max_{A'} Pr[A'(·) ⇒ f(x)].   (7)

Finally, from Equations (3) and (7) and Defini-
tions 5 and 4 we have

Adv^{prof}_{P,f}(A) ≤ Adv^{hc}_{G,H}(B) + Adv^{prof-eav}_{E,f}(B'),
OnSecuringCommunicationfromProfilers
159
as desired. Also, if A runs for time t, then B' runs
for time t and B runs for time t + c for some small
constant c.
Some Remarks about the Security of P: We defined
the security of the protocol P only for a fixed profiling
function f, but note that we can modify the definition
for an arbitrary function f, which would give us a
security definition equivalent to SEM-EAV (discussed
in Section 2.2). If the encryption algorithm E used
within the protocol is SEM-EAV secure, then using the
same proof we can obtain SEM-EAV security for P.
5 A PRACTICAL
INSTANTIATION
A common problem with CAPTCHAs is that some-
times even humans fail to solve them. In the proto-
col P, if a human user fails to solve the CAPTCHA
then (s)he will not be able to decipher, and there is no
way to repeat the test (as is done in normal CAPTCHA
usage); hence this stands as a serious weakness of the
proposed protocol P. A solution to this problem can
be attempted by providing some redundancy in the
CAPTCHAs, so that a valid user has a better chance
of solving them. We propose that the initial string k
chosen by the protocol be broken into w shares such
that any u or more of the shares are enough to regen-
erate k. These w shares are converted into CAPTCHAs
and sent along with the ciphertext. To incorporate this
idea we change the initial protocol P into P'. The pro-
tocol P' is a specific instantiation; thus, before we de-
scribe the protocol, we fix some details of its compo-
nents. In particular, for P' we require an encoding
mechanism ENCD, which we discuss first.
Let AL = {A, B, ..., Z} ∪ {a, b, ..., z} ∪
{0, 1, ..., 9} ∪ {+, /}, thus making |AL| = 64.
We define an arbitrary (but fixed) bijection
ρ : AL → {0, 1, ..., 63}, and for any σ ∈ AL and
n ≥ 6, bin_n(σ) will denote the n-bit binary represen-
tation of ρ(σ). Note that for all σ ∈ AL, at most 6
bits are required to represent ρ(σ). If ψ is a binary
string, then toInt(ψ) is the positive integer corre-
sponding to ψ; similarly, for a positive integer v < 2^n,
toBin_n(v) denotes the n-bit binary representation of
v. We fix a positive integer m and let STR be the
set of all m-character strings over the alphabet AL.
Let p be the smallest prime greater than 2^{6m} and let
d = p − 2^{6m}. Let ENCD : STR × {0, 1, ..., d} → Z_p
be defined as follows:
ENCD(s, λ)
1. Parse s as σ_0 || σ_1 || ··· || σ_{m−1}, where each σ_i ∈ AL;
2. ψ ← bin_6(σ_0) || ··· || bin_6(σ_{m−1});
3. v ← toInt(ψ);
4. return v + λ;

And let ENCD^{−1} : Z_p → STR × {0, 1, ..., d} be
defined as

ENCD^{−1}(y)
1. if y ≥ 2^{6m},
2.   λ ← y − 2^{6m} + 1;
3.   y ← 2^{6m} − 1;
4. else λ ← 0;
5. z ← toBin_{6m}(y);
6. Parse z as z_0 || z_1 || ··· || z_{m−1}, where |z_i| = 6;
7. s ← ρ^{−1}(toInt(z_0)) || ··· || ρ^{−1}(toInt(z_{m−1}));
8. return (s, λ);
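The encoding pair can be sketched in Python as follows (ours; m = 2 and p = 4099 are toy parameters, p being the smallest prime above 2^{6m} = 4096). Note that ENCD^{−1} is a right inverse: ENCD(ENCD^{−1}(y)) = y for every y ∈ Z_p.

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
RHO = {ch: i for i, ch in enumerate(ALPHABET)}  # the bijection rho : AL -> {0,...,63}

m = 2          # toy string length, so that p is easy to state
p = 4099       # smallest prime > 2^{6m} = 4096, hence d = p - 2^{6m} = 3

def encd(s: str, lam: int) -> int:
    # ENCD: read s as a 6m-bit integer via rho and shift by lambda.
    assert len(s) == m and 0 <= lam <= p - 2 ** (6 * m)
    v = 0
    for ch in s:
        v = (v << 6) | RHO[ch]
    return v + lam

def encd_inv(y: int):
    # ENCD^{-1}: map any y in Z_p to a pair (s, lambda) with encd(s, lambda) == y.
    if y >= 2 ** (6 * m):
        lam, y = y - 2 ** (6 * m) + 1, 2 ** (6 * m) - 1
    else:
        lam = 0
    chars = []
    for _ in range(m):
        chars.append(ALPHABET[y & 0x3F])
        y >>= 6
    return "".join(reversed(chars)), lam

s, lam = encd_inv(4098)       # exercises the y >= 2^{6m} branch
print(s, lam, encd(s, lam))   # encd(s, lam) == 4098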
Protocol P'(x)
1. k ←$ STR;
2. k' ← ENCD(k, 0);
3. {(x_1, k_1), ..., (x_w, k_w)} ← SHARE^p_{u,w}(k');
4. for i = 1 to w
5.   (k'_i, λ_i) ← ENCD^{−1}(k_i);
6.   c_i ← G(k'_i);
7. end for
8. K ← H(k);
9. C ← E_K(x);
10. return [C, {(x_1, c_1, λ_1), ..., (x_w, c_w, λ_w)}]

Figure 2: The protocol P', which uses a secret-sharing
scheme.

The modified protocol P' is shown in Figure 2. It
uses the encoding function ENCD and the secret shar-
ing scheme described in Section 2.4. For P' we as-
sume that STR contains all m-character strings over
the alphabet AL and that p is the smallest prime greater
than 2^{6m}; these can be considered the fixed and public
parameters of P'. The encoding mechanism is specif-
ically designed to convert a string in STR into an el-
ement of Z_p so that Shamir's secret sharing can be
suitably used.
To decrypt a cipher produced by P', a human user
must solve at least u of the w CAPTCHAs. Using
these u solutions together with the corresponding x_i's
and λ_i's, k can be recovered. A specific recommen-
dation for SHARE is Shamir's (2,5)-threshold scheme;
the user then has much flexibility in solving the
CAPTCHAs.
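The following sketch (ours, reusing the encd/encd_inv functions above with m = 2 and p = 4099) walks through the sender side of P' and the receiver-side recovery of k from u = 2 solved CAPTCHAs; CAPTCHA rendering and the final encryption are elided.

import secrets  # assumes encd/encd_inv from the previous sketch are in scope

def share(K, u, w, p):
    coeffs = [K] + [secrets.randbelow(p) for _ in range(u - 1)]
    return [(x, sum(a * pow(x, j, p) for j, a in enumerate(coeffs)) % p)
            for x in range(1, w + 1)]

def interpolate_at_zero(points, p):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num, den = num * -xj % p, den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

# Sender side of P' (CAPTCHA rendering and the encryption of x elided):
k = "Ab"                                        # 1. k <-$ STR, fixed for the demo
shares = share(encd(k, 0), u=2, w=5, p=4099)    # 2-3. split ENCD(k, 0)
sent = [(x, *encd_inv(y)) for x, y in shares]   # 5. (x_i, k'_i, lambda_i); each k'_i -> G(k'_i)

# Receiver side: the human solved any u = 2 of the CAPTCHAs back to the k'_i.
solved = [(x, encd(ki, lam)) for x, ki, lam in sent[:2]]
k_rec, _ = encd_inv(interpolate_at_zero(solved, 4099))
print(k_rec == k)                               # True: k recovered, then K = H(k)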
5.1 Security of P'

The security of P' can be easily proved, in the sense of
Definition 4, in a similar way as we prove Theorem 1, if
SECRYPT2012-InternationalConferenceonSecurityandCryptography
160
we make a new assumption regarding the CAPTCHA
as follows:
Definition 6. [The Hash-MultiCaptcha Assump-
tion]. Let G be a CAPTCHA generator, let r be a num-
ber, let H : STR → {0,1}^r be a hash function, and let
A be an adversary. Also, let x = g(x_1, ..., x_w) be such
that if at least u out of w of x_1, ..., x_w are known then
x can be recovered. We define the advantage of A in
violating the Hash-MultiCaptcha assumption as

Adv^{hmc}_{G,H}(A) = Pr[A(G(x_1), ..., G(x_w), H(x)) ⇒ 1]
  − Pr[z ←$ {0,1}^r : A(G(x_1), ..., G(x_w), z) ⇒ 1],

where x ←$ STR. Moreover, (G,H) is called (ε,t)
HMC secure if for all adversaries A running in time
at most t, Adv^{hmc}_{G,H}(A) ≤ ε.
As in the definition of the Hash-Captcha assumption,
if the adversary can efficiently solve at least u of the
w CAPTCHAs, then it can break (G, H) in the HMC
sense irrespective of the hash function. If this assump-
tion is true, then we can show the security of protocol
P' just as we did for protocol P.
A CAPTCHA is an example of a weakly-
verifiable puzzle (Canetti et al., 2005), since a legit-
imate solver S may not be able to verify the correct-
ness of its answer. For this kind of puzzle, it has been
proved (Impagliazzo et al., 2009) that if it is difficult
for an attacker to solve a weakly-verifiable puzzle,
then trying to solve multiple instances of the puzzle in
parallel is harder. More recently, Jutla found a bet-
ter bound showing how hard it is for an attacker to
solve multiple instances of weakly-verifiable puzzles
(Jutla, 2010). The next theorem is based on Jutla's
main theorem, adapted to CAPTCHAs, which are our
interest in this work.
Theorem 2. Let G be a CAPTCHA generator which
is (α,β) secure. Let k ∈ N, δ = 1 − β, and let γ
(0 < γ < 1) be arbitrary. Let A be an arbitrary poly-
nomial-time adversary which is given as input k CAP-
TCHAs (G(x_1), ..., G(x_k)) and outputs a set X of so-
lutions of the k CAPTCHAs. If InCorr(X) denotes the
number of incorrect solutions in X, then

Pr[InCorr(X) < (1 − γ)δk] < e^{−(1−γ)γ²δk/2}.
This theorem establishes that for any adversary, if
the probability of failure in solving a CAPTCHA is
at least δ, then the probability of failing on fewer than
(1 − γ)δk out of k puzzles is at most e^{−(1−γ)γ²δk/2}.
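For a feel of the numbers, the bound can be evaluated directly; the parameter values in the short sketch below are purely illustrative.

import math

def jutla_bound(delta: float, gamma: float, k: int) -> float:
    # Upper bound from Theorem 2 on Pr[InCorr(X) < (1 - gamma) * delta * k].
    return math.exp(-(1 - gamma) * gamma ** 2 * delta * k / 2)

# If a machine fails a single CAPTCHA with probability at least delta = 0.7,
# mostly-correct solving of k puzzles becomes unlikely quickly:
for k in (5, 20, 100):
    print(k, jutla_bound(delta=0.7, gamma=0.5, k=k))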
Based on this fact, it may be possible to show that
for any arbitrary adversary A attacking the HMC as-
sumption, there exists an HC adversary B such that
Adv^{hmc}_{G,H}(A) < Adv^{hc}_{G,H}(B). This would imply that
the HC assumption implies the HMC assumption.
But, for now, we are not sure whether such a result
holds.
5.2 Discussions
About the Encryption Scheme: In this work
we have not said anything about the encryption
scheme to be used in the protocol. We only
said that we require the encryption scheme to be
PROF-EAV secure, and any IND-EAV secure en-
cryption scheme can provide such security. Thus
most symmetric encryption schemes in common
use, like the CBC and counter modes (which pro-
vide security in the IND-CPA sense), can be used
for the encryption function E in P'. A more effi-
cient scheme which provides security only in the
PROF-EAV sense would be more interesting; we
would like to explore in this direction.
Key Sizes: Another important thing to consider is
that the effective key size for the protocol is dic-
tated by the parameter m, i.e., the size of each
string in STR. This value cannot be made arbi-
trarily large, as solving big CAPTCHAs may be
tiresome for human beings; a usual CAPTCHA
length is five to eight characters. If we use eight-
character strings over the alphabet AL then the ef-
fective size of the key space would be 64^8 = 2^{48}.
Increasing the alphabet size is also not feasible,
as we need unambiguous printable characters to
make CAPTCHAs. Thus the key space is not suf-
ficiently large for a modern cryptographic appli-
cation, but for the application we have in mind
this may be sufficient, as we do not expect that
a profiler would be ready to spend so much com-
putational resource on profiling a single message.
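The arithmetic behind this estimate is simple; the one-line sketch below (ours) tabulates |STR| = 64^m for the usual CAPTCHA lengths.

import math

# |STR| = |AL|^m = 64^m for m-character strings; m = 8 gives the 2^48 above.
for m in (5, 6, 7, 8):
    print(f"m = {m}: |STR| = 64^{m} = 2^{int(math.log2(64 ** m))}")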
6 FINAL REMARKS
In this paper we gave a theoretical analysis of profil-
ing adversaries and described a protocol which is se-
cure against profiling adversaries. Our protocol does
not require any key exchange or public key infra-
structure, and uses CAPTCHAs and secret sharing
schemes in a novel way.
Encryption may not be the only way to protect a
user from profilers, as profilers can use many differ-
ent techniques which cannot be stopped using encryp-
tion. For example, it is possible to track the web
usage of a specific user and profile him/her on that
basis. Here encryption (probably) has no role to play,
or at least cannot be used in the way we propose in
our protocol. Anonymity is probably the correct di-
rection to explore for solving such problems. Also,
user profiling is a big business, and some think that
the free content on the web is only possible due to
online advertisements, so putting a total end to user
profiling may not be desirable. Hence there have been
recent attempts to develop systems which would al-
low targeted advertisements without compromising
user security (Toubiana et al., 2010). These issues are
not covered in our current work.
ACKNOWLEDGEMENTS
The authors thank Francisco Rodríguez Henríquez for
his comments on an early draft of this paper. Debrup
Chakraborty acknowledges the support of CONA-
CYT project 166763.
REFERENCES
Abdalla, M., Bellare, M., and Rogaway, P. (2001). The or-
acle Diffie-Hellman assumptions and an analysis of
DHIES. In Naccache, D., editor, CT-RSA, volume
2020 of Lecture Notes in Computer Science, pages
143–158. Springer.
Canetti, R., Halevi, S., and Steiner, M. (2005). Hardness
amplification of weakly verifiable puzzles. In Kilian,
J., editor, TCC, volume 3378 of Lecture Notes in Com-
puter Science, pages 17–33. Springer.
Dwork, C. and Naor, M. (1992). Pricing via processing
or combatting junk mail. In Brickell, E. F., editor,
CRYPTO, volume 740 of Lecture Notes in Computer
Science, pages 139–147. Springer.
Dziembowski, S. (2010). How to pair with a human. In
Garay, J. A. and Prisco, R. D., editors, SCN, volume
6280 of Lecture Notes in Computer Science, pages
200–218. Springer.
Golle, P. and Farahat, A. (2004). Defending email commu-
nication against profiling attacks. In Atluri, V., Syver-
son, P. F., and di Vimercati, S. D. C., editors, WPES,
pages 39–40. ACM.
Impagliazzo, R., Jaiswal, R., and Kabanets, V. (2009).
Chernoff-type direct product theorems. J. Cryptology,
22(1):75–92.
Impagliazzo, R. and Zuckerman, D. (1989). How to recycle
random bits. In FOCS, pages 248–253. IEEE.
Jutla, C. S. (2010). Almost optimal bounds for direct prod-
uct threshold theorem. In Micciancio, D., editor, TCC,
volume 5978 of Lecture Notes in Computer Science,
pages 37–51. Springer.
Katz, J. and Lindell, Y. (2008). Introduction to Modern
Cryptography. Chapman & Hall/ CRC.
Naor, M. (1997). Verification of a human in the loop or identification via the Turing test. http://www.wisdom.weizmann.ac.il/naor/PAPERS/human.pdf.
NYT (2009). Congress begins deep packet inspection of internet providers. http://bits.blogs.nytimes.com/2009/04/24/congress-begins-deep-packet-inspection-of-internet-providers/.
Pinkas, B. and Sander, T. (2002). Securing passwords
against dictionary attacks. In Atluri, V., editor, ACM
Conference on Computer and Communications Secu-
rity, pages 161–170. ACM.
Shamir, A. (1979). How to share a secret. Commun. ACM,
22(11):612–613.
Toubiana, V., Narayanan, A., Boneh, D., Nissenbaum, H., and Barocas, S. (2010). Privacy preserving targeted advertising. In Proceedings of the Annual Network and Distributed Systems Security Symposium. http://www.isoc.org/isoc/conferences/ndss/10/pdf/05.pdf.
von Ahn, L., Blum, M., Hopper, N. J., and Langford, J.
(2003). CAPTCHA: Using hard AI problems for se-
curity. In Biham, E., editor, EUROCRYPT, volume
2656 of Lecture Notes in Computer Science, pages
294–311. Springer.
SECRYPT2012-InternationalConferenceonSecurityandCryptography
162