Symmetric Searchable Encryption for Exact Pattern Matching using Directed Acyclic Word Graphs∗

Rolf Haynberg, Jochen Rill, Dirk Achenbach and Jörn Müller-Quade

1&1 Internet AG, Karlsruhe, Germany

Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Keywords: Searchable Encryption, SSE, Exact Pattern Matching, Directed Acyclic Word Graphs.

Abstract: Searchable Encryption schemes allow searching within encrypted data without prior decryption. Various index-based schemes have been proposed in the past, which are only adequate for certain use cases. There is a lack of schemes with exact pattern matching capabilities. We introduce Symmetric Searchable Encryption for Exact Pattern Matching, a new class of searchable encryption schemes. To this end, we define the XPM-SSE primitive and two privacy notions for the new primitive. Our own construction, SEDAWG, is an XPM-SSE scheme which uses Directed Acyclic Word Graphs. We discuss and prove its properties.

1 INTRODUCTION

Cloud computing is one of the most promising trends in the IT industry. In terms of data security, however, cloud computing brings a new threat: Users lose control over their data. Cloud providers can access their customers' data at will.

To challenge this drawback technologically, one

needs to seek methods that enhance the privacy of the

user’s data without negating the advantages of cloud

computing. Notably, computational and storage over-

head should be handled in the cloud, not on the client.

Consider for example the scenario of an out-

sourced e-mail archive: Users wish to employ a cloud

provider to store their e-mails in order to be able to ac-

cess them with a mobile device and without letting the

provider gain knowledge of the archive’s content. To

conserve bandwidth, they want to perform searches

on their archived e-mail online instead of download-

ing each and every message individually and search-

ing locally.

In this paper, we address the problem of secure exact pattern matching: A user encrypts a long string he later wishes to query for the occurrence of certain patterns. This encrypted string is then uploaded to a server, and to perform a search, the user interacts with the server. The server should learn neither the string itself nor the patterns searched for.

∗ This work has been partially funded by the Federal Ministry of Education and Research, Germany (BMBF, Contract No. 16BY1172). The responsibility for the content of this article lies solely with the authors.

1.1 Our Contribution

We introduce a primitive for symmetric searchable encryption for exact pattern matching, together with two security notions for the primitive. To our knowledge, this is the first such primitive. Further, we offer a construction that realizes this primitive. Our construction involves precomputation in order to make the search itself particularly efficient.

1.2 Related Work

In the literature, there are two general approaches

to searchable encrypted data: Symmetric Searchable

Encryption (SSE) (Goh, 2003; Curtmola et al., 2006;

Golle et al., 2004) and Public Key Encryption with

Keyword Search (PEKS) (Chang and Mitzenmacher,

2005; Abdalla et al., 2008). In almost all cases

these approaches are keyword-based. Keyword-based

schemes generally allow arbitrary keywords to be

stored in an index and not only keywords contained

in the actual document. Furthermore, they allow in-

dices to be created for arbitrary documents, not just

strings. On the other hand, keyword-based schemes do not support substring search or exact pattern matching, nor do they return the number of occurrences of a substring. Also, keywords must be defined when constructing the index, whereas our approach only requires the pattern at search time. Song et al. (Song et al., 2000) present a keywordless approach that is based on stream ciphers and supports pattern matching. However, the time required for performing a search scales linearly with the size of the document.

Haynberg R., Rill J., Achenbach D. and Müller-Quade J.
Symmetric Searchable Encryption for Exact Pattern Matching using Directed Acyclic Word Graphs.
DOI: 10.5220/0004530004030410
In Proceedings of the 10th International Conference on Security and Cryptography (SECRYPT-2013), pages 403-410
ISBN: 978-989-8565-73-0
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)

There is a rich body of literature on exact pattern

matching algorithms, going back to the late seven-

ties. The general method for improving performance

is precomputation. Approaches can be separated by

where the precomputation occurs: There are algo-

rithms that perform precomputation on the search pat-

tern (Boyer and Moore, 1977; Knuth et al., 1977;

Karp and Rabin, 1987; Baeza-Yates and Gonnet,

1992) as well as algorithms that perform precompu-

tation on the string (Manber and Myers, 1990; Ukko-

nen, 1995; Blumer et al., 1985; Blumer et al., 1987;

Crochemore and Vérin, 1997). Our scheme can be

assigned to the latter. The aforementioned algorithms

have been engineered for performance and efﬁciency

alone and were not conceived in an adversarial sce-

nario, however. Our scenario involves an honest-but-

curious adversary and we therefore seek to hide as

much information as possible from him. Our goal of

hiding access patterns is very similar to that of Private

Information Retrieval (Chor et al., 1998; Kushilevitz

and Ostrovsky, 1997; Di Crescenzo et al., 2000) and

that of the work on Oblivious RAMs (Goldreich and

Ostrovsky, 1996). In fact, we can use a PIR construc-

tion in our scheme to improve its privacy. Moreover,

an ideal solution to our problem statement can be used

to construct a PIR scheme.

This paper is organized as follows: In Section 2, we define a new primitive, XPM-SSE. We then define the optional property of pattern privacy and show that Private Information Retrieval can be reduced to an XPM-SSE scheme with pattern privacy. In Section 3, we present our construction SEDAWG (Searchable Encryption using Directed Acyclic Word Graphs) and discuss its properties. Section 4 concludes.

2 DEFINITION OF XPM-SSE

In this section we define a searchable encryption scheme that allows full-text search within encrypted data.

Definition 1 (Exact Pattern Matching, XPM). An algorithm $A$ is a technique for exact pattern matching (XPM) over the alphabet $\Sigma$ if, upon input $S \in \Sigma^*$ and $m \in \Sigma^*$, it returns the exact number of occurrences of $m$ in $S$.
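As an illustration of Definition 1, the following minimal sketch counts occurrences by repeated scanning. It is not part of the scheme; it only makes the input/output behaviour of an XPM algorithm concrete. Two points are assumptions made here and are not fixed by the definition: overlapping occurrences are counted separately, and the empty pattern is counted once per position.

```python
def count_occurrences(s: str, m: str) -> int:
    """Return the number of (possibly overlapping) occurrences of m in s."""
    if not m:
        # Assumed convention: the empty pattern matches at every position,
        # including the position after the last character.
        return len(s) + 1
    count = 0
    start = s.find(m)
    while start != -1:
        count += 1
        # Advance by one character only, so that overlaps are counted.
        start = s.find(m, start + 1)
    return count
```

For the string "ananas" used later in the paper, the pattern "ana" occurs twice (the two occurrences overlap) and the pattern "x" does not occur.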

Based on the exact pattern matching algorithm, we define Symmetric Searchable Encryption for Exact Pattern Matching (XPM-SSE). Note that pre-existing notions for encryption schemes do not suffice in this scenario: Assume a protocol in which, to perform a query, the client uploads the decryption key to the server and lets the server decrypt the ciphertext and return the result. Such a protocol does not exhibit data privacy and therefore should not be considered an XPM-SSE scheme. Conversely, it is easy to state a security notion that can only be met by inefficient solutions, such as storing an encrypted file on the server which is downloaded and decrypted prior to searching. Hence, any meaningful notion must also account for the protocol messages that are exchanged in the evaluation of a query.

Definition 2 (XPM-SSE Scheme). Let $S \in \Sigma^*$ be an arbitrary but finite string over the encoding $\Sigma$. A tuple $((\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec}), I = (\mathcal{S}, \mathcal{C}))$ is a Symmetric Searchable Encryption scheme for Exact Pattern Matching (XPM-SSE scheme), if

• $\mathrm{Gen} : 1^k \to \{0,1\}^k$ is a PPT algorithm which, given a security parameter $k$, generates a key $K \leftarrow \mathrm{Gen}(1^k)$.

• $\mathrm{Enc} : \{0,1\}^k \times \Sigma^* \to \{0,1\}^*$ is a PPT algorithm which generates a ciphertext $D \leftarrow \mathrm{Enc}_K(S)$.

• $\mathrm{Dec} : \{0,1\}^* \times \{0,1\}^k \to \Sigma^*$ is a polynomially bounded algorithm which outputs the plaintext $S \leftarrow \mathrm{Dec}_K(\mathrm{Enc}_K(S))$ given a key $K$ and the ciphertext $\mathrm{Enc}_K(S)$.

• $I = (\mathcal{S}, \mathcal{C})$ is a protocol for a pair of machines (server and client) which perform an algorithm for Exact Pattern Matching. The server $\mathcal{S}$ is provided with an encryption of the string, $\mathrm{Enc}_K(S)$; the client $\mathcal{C}$ is supplied with the search pattern $w$ and the encryption key $K$ and outputs the exact number of occurrences of $w$ in $S$. With $\mathrm{view}_K(S, w)$ we denote all state transitions of the server $\mathcal{S}$ and all its received and sent messages.

• It has data privacy, that is, the advantage of the adversary in $\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$ (Security Game 1) is negligible, that is

$$\forall\, \text{PPT } \mathcal{A},\ c \in \mathbb{N}\ \exists\, k \in \mathbb{N} : \quad \Pr[\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k) = 1] \le \frac{1}{2} + k^{-c}$$

The notion of data privacy captures the goal of hiding the string itself from the adversary.

Note that by this definition, any XPM-SSE scheme also has result privacy: If an adversary could learn the result of a query she supplies, she could use that information to distinguish two encryptions of plaintexts, which would contradict the above definition.

Next, we describe Security Game 1 which is used

in Deﬁnition 2.

Security Game 1 ($\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$).

1. The experiment chooses a key $K \leftarrow \mathrm{Gen}(1^k)$ and a random bit $b \leftarrow \{0, 1\}$.

2. The adversary $\mathcal{A}$ is given input $1^k$, oracle access to $\mathrm{Enc}_K(\cdot)$ and to a view oracle $T_K(\cdot, \cdot)$. $T_K(S, x)$ returns the server's view to any query $x$, given a plaintext $S$: $\mathrm{view}_K(S, x)$.

3. $\mathcal{A}$ outputs two plaintexts $m_0$ and $m_1$ of the same length to the experiment.

4. $\mathcal{A}$ is given $\mathrm{Enc}_K(m_b)$.

5. $\mathcal{A}$ outputs a number of queries $x_0, \ldots, x_q$ and an integer $i \in \{0, \ldots, q\}$ to the experiment.

6. The queries $x_0, \ldots, x_q$ are evaluated in that order.

7. $\mathcal{A}$ is given the view on the challenge ciphertext to query $i$: $\mathrm{view}_K(m_b, x_i)$.

8. $\mathcal{A}$ continues to have access to $\mathrm{Enc}_K(\cdot)$ and $T_K(\cdot, \cdot)$.

9. $\mathcal{A}$ submits a guess $b'$ for $b$.

The result of the experiment is defined to be 1 if $b' = b$ and 0 otherwise.

The complementary property to data privacy is

that of pattern privacy: A scheme that has pattern

privacy hides the contents of queries from the server.

We deﬁne pattern privacy as (computational) indistin-

guishability of transcripts.

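Definition 3 (Pattern Privacy). A scheme $((\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec}), (\mathcal{S}, \mathcal{C}))$ has pattern privacy if for all $c \in \mathbb{N}$, PPT algorithms $\mathcal{A}$, search patterns $x, x' \in \Sigma^*$ with $|x| = |x'|$ and $S \in \Sigma^*$ there is a $k \in \mathbb{N}$, so that

$$\left|\Pr[\mathcal{A}(1^k, \mathrm{view}_K(S, x)) \to 1] - \Pr[\mathcal{A}(1^k, \mathrm{view}_K(S, x')) \to 1]\right| \le k^{-c}$$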

If a XPM-SSE scheme has pattern privacy (Deﬁ-

nition 3), we call the scheme XPM-SSE with pattern

privacy. In the remainder of this section, we show

that Private Information Retrieval can be reduced to a

XPM-SSE scheme with pattern privacy.

First we give a deﬁnition of a Private Information

Retrieval scheme here. The deﬁnition follows that of

Kenan (Kenan, 2005) closely, but uses our notation.

Informally, a PIR scheme is a scheme that allows a

client to retrieve bits from a remote database in such a

way that the database doesn’t learn which bit has been

retrieved.

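Definition 4 (PIR). Let $D \in \{0, 1\}^n$ be a database of $n$ bits. The $(i + 1)$th bit shall be denoted as $D[i]$. A scheme for Computational Single-Server Private Information Retrieval (PIR) is a pair of machines $(\mathcal{S}, \mathcal{C})$ for which the following properties hold:

1. Correctness: $\forall x \in \{0, \ldots, n-1\},\ D \in \{0, 1\}^n,\ c \in \mathbb{N}\ \exists k \in \mathbb{N}$:

$$\Pr[\mathcal{C}(1^k, \mathrm{view}(D, x)) = D[x]] \ge 1 - k^{-c}$$

2. User Privacy: $\forall x, x' \in \mathbb{N}_0,\ D \in \{0, 1\}^n,\ \text{PPT } \mathcal{A},\ c \in \mathbb{N}\ \exists k \in \mathbb{N}$:

$$\left|\Pr[\mathcal{A}(1^k, \mathrm{view}(D, x)) = 1] - \Pr[\mathcal{A}(1^k, \mathrm{view}(D, x')) = 1]\right| \le k^{-c}$$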

We can now state our claim.

Proposition 1. Computational Single-Server Private

Information Retrieval can be reduced to XPM-SSE

with pattern privacy.

Proof. To prove the proposition, we give a construc-

tion using a XPM-SSE scheme and show that it

achieves PIR. We show how to construct a string from

a binary database with n entries. We then explain how

to perform a database query and how to interpret the

result.

Table 1: Example database with the corresponding string to be used for a database retrieval with a XPM-SSE scheme.

i      0  1  2  3  4  5
D[i]   1  0  1  0  1  0

⇓

S = !000$!010$!100$

The alphabet for the XPM-SSE construction is

Σ = {0, 1, !, $}. Construct S as follows: Start with

an empty string S. Iterate i through {0, . . . , n − 1},

and, if D[i] = 1, append the binary encoding of i (with

leading zeroes to hide the size of i), enclosed in the

delimiters ! and $, to S. Output S. (See Table 1 for an

example.)

To retrieve bit i from S (stored on the server), run

the search protocol for the binary encoding of i, en-

closed in the delimiters ! and $. If the search is a

success, the retrieved bit is 1 and 0 otherwise.
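The construction can be sketched in a few lines. This is a plaintext illustration only: the names `build_string` and `retrieve_bit` are hypothetical, and the ordinary substring test stands in for a run of the XPM-SSE search protocol, which in the reduction would be carried out on the encrypted string with pattern privacy.

```python
def index_width(n: int) -> int:
    """Number of binary digits used for every index of an n-bit database."""
    return max(1, (n - 1).bit_length())


def build_string(database: list[int]) -> str:
    """Encode the positions of all 1-bits as !<index>$ blocks of equal width."""
    width = index_width(len(database))
    return "".join(
        f"!{i:0{width}b}$" for i, bit in enumerate(database) if bit == 1
    )


def retrieve_bit(s: str, i: int, n: int) -> int:
    """Recover D[i] by searching for the delimited encoding of i.

    The `in` test is a stand-in (assumption) for the XPM-SSE search protocol.
    """
    pattern = f"!{i:0{index_width(n)}b}$"
    return 1 if pattern in s else 0
```

With the database from Table 1, `build_string([1, 0, 1, 0, 1, 0])` yields `!000$!010$!100$`, and querying each index recovers the original bits. The delimiters prevent a match that straddles two adjacent encodings.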

By construction, the delimited binary encoding of i is a substring of S iff D[i] = 1. It follows that the above construction delivers correct results.

In the above construction, the bit ID i that is to be retrieved is transmitted in the form of a search query. This information is kept secret from the adversary because of the required pattern privacy. If the above construction did not provide Private Information Retrieval, it would leak information about i to the adversary, and the XPM-SSE scheme would not have pattern privacy, which contradicts the assumption.

3 SEDAWG: A XPM-SSE CONSTRUCTION

Our Searchable Encryption Scheme is based on the idea to store the encrypted data on the server and to perform the searching on the client. The encryption algorithm uses Directed Acyclic Word Graphs (DAWGs) to prepare the encrypted data for searching. They are described in Section 3.1. We then present our XPM-SSE scheme in Section 3.2. In Section 3.3, we discuss its performance and prove its security.

3.0.1 Notation and Conventions

If not mentioned otherwise, $S$ will be an arbitrary string of length $n$ over the encoding $\Sigma$, that is $S \in \Sigma^*$. The length of a string $S$ will be denoted as $|S|$. $\varepsilon$ is the empty string and has a length of 0. $S[i]$ denotes the $(i+1)$th character of $S$ for $i \in \mathbb{N}_0$ with $i < |S|$. $S[i..j]$ denotes the string $S[i] \cdot \ldots \cdot S[j]$ for $0 \le i \le j < n$. If $i = j$ let $S[i..j] = S[i]$. $S[i..j]$ is called a substring. The set of all substrings of a string $S$ is written as

$$\mathrm{substrings}_S = \{S[i..j] \mid 0 \le i \le j < n\} \cup \{\varepsilon\}$$

3.1 The Underlying Data Structure DAWG

The Directed Acyclic Word Graph (DAWG) is a data

structure derived from a string. It is similar to the

Sufﬁx Tree data structure and reduces the time com-

plexity for several string analysis algorithms, such as

pattern matching, by making use of pre-computation.

DAWGs were ﬁrst introduced by Blumer et al. in

(Blumer et al., 1985). Our deﬁnition follows theirs.

Definition 5 (Endpoint Set). Let $x$ be a substring of a string $S$. Then

$$E_S(x) := \{ j \mid \exists i : x = S[i..j] \}$$

is called the Endpoint Set for $x$ with respect to $S$, i.e. the endpoints of all occurrences of $x$ in $S$.

The substrings that specify the same Endpoint Set

are important for the data structure. An equivalence

class encompasses them.

Definition 6 ($\equiv_S$, $[x]_S$). Two substrings $x$ and $y$ of a string $S$ are equivalent with respect to $\equiv_S$, if they specify the same Endpoint Set:

$$x \equiv_S y \Leftrightarrow E_S(x) = E_S(y)$$

The equivalence class of $x$ with respect to $\equiv_S$ is denoted as $[x]_S$. The representative of an equivalence class is defined as the longest substring of that class and is denoted as $\overleftarrow{x}$.
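To make Definitions 5 and 6 concrete, the sketch below computes endpoint sets by brute force and groups substrings into their equivalence classes. It is a naive illustration under stated assumptions, not an efficient algorithm: indices are 0-based as in the notation above, and the empty string is given the endpoint set {-1, ..., n-1}, a convention chosen here so that it forms a class of its own.

```python
from collections import defaultdict


def substrings(s: str) -> set[str]:
    """All substrings of s, including the empty string."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)} | {""}


def endpoint_set(s: str, x: str) -> frozenset[int]:
    """0-based end positions j of all occurrences x = s[i..j]."""
    return frozenset(
        i + len(x) - 1 for i in range(len(s) - len(x) + 1) if s.startswith(x, i)
    )


def equivalence_classes(s: str) -> dict[frozenset[int], list[str]]:
    """Group substrings by endpoint set; each class is sorted longest first."""
    classes: dict[frozenset[int], list[str]] = defaultdict(list)
    for x in substrings(s):
        classes[endpoint_set(s, x)].append(x)
    for members in classes.values():
        members.sort(key=len, reverse=True)
    return dict(classes)


def representatives(s: str) -> list[str]:
    """The longest member of every class, ordered by length."""
    return sorted((m[0] for m in equivalence_classes(s).values()), key=len)
```

For S = ananas, the substrings "na" and "ana" both end at positions 2 and 4 and therefore fall into the same class, whose representative is "ana". Overall there are seven classes.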

Definition 7 (DAWG(S)). The Directed Acyclic Word Graph of a prefix-free string $S$, in symbols $\mathrm{DAWG}(S)$, is the directed graph $(V, E)$ with

$$V = \{[x]_S \mid x \in \mathrm{substrings}_S\}$$

$$E = \{([x]_S, [xa]_S) \mid a \in \Sigma,\ x, xa \in \mathrm{substrings}_S,\ \overleftarrow{x} \ne \overleftarrow{xa}\}$$

The equivalence class of the empty string, $[\varepsilon]_S$, plays a special role. The corresponding node will be called the root node in the following sections, as it does not have any incoming edges.
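Definition 7 can be turned into code almost literally. The following sketch builds the node and edge sets by enumerating all substrings, which takes far more than linear time and is meant only to illustrate the definition; linear-time constructions exist (Blumer et al., 1985). Nodes are identified by their endpoint sets, and the empty string is again assigned the endpoint set {-1, ..., n-1} as an assumed convention.

```python
def substrings(s: str) -> set[str]:
    """All substrings of s, including the empty string."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)} | {""}


def endpoint_set(s: str, x: str) -> frozenset[int]:
    """0-based end positions of all occurrences of x in s."""
    return frozenset(
        i + len(x) - 1 for i in range(len(s) - len(x) + 1) if s.startswith(x, i)
    )


def naive_dawg(s: str):
    """Return (nodes, edges) of DAWG(s) by direct enumeration.

    nodes: set of endpoint sets, one per equivalence class.
    edges: dict mapping (source node, label) to the target node.
    """
    subs = substrings(s)
    node_of = {x: endpoint_set(s, x) for x in subs}
    nodes = set(node_of.values())
    edges = {}
    for x in subs:
        for a in set(s):
            xa = x + a
            # Edge ([x], [xa]) exists if xa is a substring in a different class.
            if xa in subs and node_of[xa] != node_of[x]:
                edges[(node_of[x], a)] = node_of[xa]
    return nodes, edges
```

For S = ananas this yields seven nodes and ten labeled edges, and no edge leads into the class of the empty string.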

Moreover, a label is associated with every edge of

the graph.

Definition 8 (Edge Label $\mathrm{edgeLabel}(e)$). Let $([x]_S, [xa]_S) = e$ be an edge of the $\mathrm{DAWG}(S)$ data structure. Then $\mathrm{edgeLabel}(e) = a$ is called the edge label of $e$.

The edge labels are deﬁned in such a way that

every path in the graph DAWG(S) corresponds to a

substring of S. The path that corresponds to S it-

self is important for the decryption algorithm of our

scheme. We will refer to the edges of this path as nat-

ural edges.

Definition 9 (Natural Edge). Let $([x]_S, [xa]_S) = e$ be an edge of the $\mathrm{DAWG}(S)$ data structure. If there exists a string $w$ in $[xa]_S$ such that $w$ is a prefix of $S$, $e$ is called a natural edge for this graph.

3.2 Pattern Matching using DAWGs

We now sketch how we utilize $\mathrm{DAWG}(S)$ to decide whether $w \in \mathrm{substrings}_S$ with $O(|w|)$ character comparisons. The algorithm is based on a central property of the DAWG: $w \in \mathrm{substrings}_S$ if and only if there is a path $[\varepsilon]_S, v_1, \ldots, v_{|w|}$ with

$$\mathrm{edgeLabel}([\varepsilon]_S, v_1) \cdot \ldots \cdot \mathrm{edgeLabel}(v_{|w|-1}, v_{|w|}) = w.$$

Since, for any pattern $w$, this path starts at the root node, we can tell if it exists by matching the edge labels of the path along the pattern: We traverse the DAWG by comparing the labels on the outgoing edges of a node to the corresponding character in $w$. The edge that has a matching label leads to the next node on the path. If all characters of the pattern have been matched, $w \in \mathrm{substrings}_S$ and the search ends. Otherwise, $w$ does not occur in $S$.
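The traversal can be sketched as follows. The sketch uses the well-known online suffix-automaton construction, whose states correspond to the endpoint-set classes of the DAWG; this is a swapped-in, standard technique and not necessarily the construction used in our implementation. The field `num_occurs` plays the role of the per-node occurrence count, and all identifiers are illustrative. Non-empty patterns are assumed.

```python
class State:
    """One DAWG node: outgoing edges, suffix link, longest length, count."""

    def __init__(self, length: int = 0):
        self.next: dict[str, int] = {}
        self.link = -1
        self.length = length
        self.num_occurs = 0


def build_dawg(s: str) -> list[State]:
    states = [State()]  # index 0 is the root
    last = 0
    for ch in s:
        cur = len(states)
        states.append(State(states[last].length + 1))
        states[cur].num_occurs = 1  # this state ends at a new position
        p = last
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(states)
                states.append(State(states[p].length + 1))
                states[clone].next = dict(states[q].next)
                states[clone].link = states[q].link
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    # Propagate occurrence counts along suffix links, longest states first.
    order = sorted(range(1, len(states)), key=lambda i: states[i].length, reverse=True)
    for i in order:
        if states[i].link > 0:
            states[states[i].link].num_occurs += states[i].num_occurs
    return states


def count_occurrences(states: list[State], w: str) -> int:
    """Follow the path labeled w from the root; O(|w|) edge lookups."""
    v = 0
    for ch in w:
        if ch not in states[v].next:
            return 0  # no matching outgoing edge: w does not occur
        v = states[v].next[ch]
    return states[v].num_occurs
```

For S = ananas the construction produces seven states, and `count_occurrences` reports two occurrences of "ana" after following three edges from the root.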

In our XPM-SSE construction, we store the graph using adjacency lists. Also, with every node $v$ we store a property v.numOccurs. It represents the number of occurrences of the representative of $v$ in $S$ and allows the client to determine the number of occurrences of the search pattern when reading $v$.

Figure 1: The DAWG for the input string S = ananas. It consists of seven nodes ($v_0$ to $v_6$) which are mapped to four node sets (NS#0 to NS#3). The node representative is written underneath each node. The transition edges are drawn as directed arrows which are labeled with their respective edge labels. Every outgoing path from $v_0$ is a subword of S. [Node representatives shown in the figure: “ε”, “a”, “an”, “ana”, “anan”, “anana”, “ananas”.]

Figure 2: General communication pattern for searching a string $w$ in $S$. We assume that Enc(DAWG(S)) is available on the server. The case of $w \notin \mathrm{substrings}_S$ is not covered here. Also note that for privacy reasons, we explicitly do not employ caching mechanisms.

To store the DAWG, we randomly distribute all of its nodes into a fixed number of disjoint node sets; this number depends on the size of the text. Then, we augment every edge in the DAWG with a reference to the node set that contains the target node. To ensure the indistinguishability of the encryptions of different strings of the same size, we pad the node sets individually to their maximum size. Finally, we create files from the node sets and encrypt them individually using the symmetric encryption scheme (E, D, G). See Figure 1 for a visualization of how the DAWG is stored.
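The distribution and padding steps can be sketched as follows. This is a simplified illustration under several assumptions: node-set capacity is measured in nodes rather than bytes, the root is fixed at position 0 of node set 0 (as in Algorithm 1), padding uses zero bytes, and the final encryption of each padded file with (E, D, G) is omitted. The function names are hypothetical.

```python
import random


def distribute_nodes(num_nodes: int, num_sets: int, capacity: int, rng: random.Random):
    """Randomly assign nodes 0..num_nodes-1 to node sets of bounded size.

    Returns (node_sets, location), where location[v] = (set id, position).
    """
    if num_sets * capacity < num_nodes:
        raise ValueError("node sets too small for the graph")
    node_sets: list[list[int]] = [[] for _ in range(num_sets)]
    location: dict[int, tuple[int, int]] = {0: (0, 0)}
    node_sets[0].append(0)  # the root always lives in node set 0
    for v in range(1, num_nodes):
        # Choose uniformly among the node sets that still have room.
        candidates = [i for i, ns in enumerate(node_sets) if len(ns) < capacity]
        i = rng.choice(candidates)
        location[v] = (i, len(node_sets[i]))
        node_sets[i].append(v)
    return node_sets, location


def pad(blob: bytes, size: int) -> bytes:
    """Extend a serialized node set to exactly `size` bytes."""
    if len(blob) > size:
        raise ValueError("serialized node set exceeds the maximum size")
    return blob + b"\x00" * (size - len(blob))
```

Because every file is padded to the same length before encryption, file sizes reveal nothing beyond the overall size of the text.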

For a schematic description of the search proce-

dure with pseudocode, see Figure 2. The detailed

descriptions of the algorithms for encryption and de-

cryption can be found in Algorithms 1 and 2, respec-

tively.

3.3 Properties of SEDAWG

In this subsection, we discuss the security and perfor-

mance properties of our scheme.

3.3.1 Performance and Space Complexity

The space complexity of the output of $\mathrm{Enc}_K(S)$, i.e. the ciphertext size, is in $O(|S|)$ because the size of $\mathrm{DAWG}(S)$ lies in $O(|S|)$ and the encryption algorithm Enc only adds information whose size is linear in $|S|$ (see Algorithm 1). The time complexity, i.e. the execution time of $\mathrm{Enc}_K(S)$, is also in $O(|S|)$ for similar reasons. The communication complexity of the search protocol $I(\mathrm{Enc}_K(S), w)$ is asymptotically optimal and depends only linearly on $|w|$.

Algorithm 1: Enc_K(S).

Input: String S with encoding Σ, Key K
Output: A set B of files encrypted under K
Data: Set of Nodes V, Set of Edges E, Set of Node Sets N, s⋆

(V, E) = DAWG(S)                                   // Step 1
position(◦) ← 0
nodeset(◦) ← 0
getNodeSetById(0) ← {◦}                            // Add root node to node set 0
foreach v ∈ V do                                   // Step 2
    ns ← random element of {ns ∈ N | s⋆ ≥ size(ns ∪ v)}
    position(v) ← size(ns)
    nodeset(v) ← ID of ns
    ns ← ns ∪ v
end
foreach ns ∈ N do                                  // Step 3
    foreach v ∈ ns do
        foreach (v, w) ∈ edges(v) do
            v[w].targetNodePosition ← position(w)
            v[w].targetNodeSetId ← nodeset(w)
        end
    end
end
foreach ns ∈ N do                                  // Step 4
    b ← ""
    foreach v ∈ ns do
        foreach (v, w) ∈ edges(v) do
            b ← concatenate(b, v[w].isNaturalEdge, v[w].edgeLabel,
                            v[w].targetNodePosition, v[w].targetNodeSetId)
        end
        b.append(#)
    end
    b ← pad(b, s⋆)
    b ← E_K(b)
    B ← B ∪ {b}
end
return B

We performed preliminary benchmarks with an unoptimized implementation of our scheme on Amazon's S3 cloud storage. Results show an advantage of our scheme regarding search times in comparison to a trivial approach (which is: downloading the full encrypted text, decrypting it and searching with the Boyer-Moore algorithm (Boyer and Moore, 1977)) if the bandwidth is limited. In our tests we used a 3G mobile network connection, and the search was up to 10-15 times faster for short words (4 to 8 characters) and 2-4 times faster for long words (32 to 64 characters) using an email archive with an unencrypted size of 60 MiB. The precomputational overhead is impractical for many use cases, however, since the encrypted text was about 100 times larger than the original text and the memory demand for the precomputation grew linearly with the size of the input text with a factor of approximately 1300. This must be improved upon in future work.

Algorithm 2: Dec_K(B).

Input: Set of encrypted files B, Key K
Output: Plaintext S
Data: Set of Nodes V, Set of Edges E, Set of Node Sets N

foreach b ∈ B do                                   // Decrypt all files
    N ← N ∪ D_K(b)
end
nsid ← 0
npos ← 0
while v ← D_K(getFileFromServer(nsid))[npos] do    // For node ...
    foreach AdjacencyListEntry e ∈ v do            // ... find the natural edge
        if e.isNaturalEdge then
            nsid ← e.targetNodeSetId
            npos ← e.targetNodePosition
            S.append(e.edgeLabel)
            break
        end
    end
end
return S
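The idea behind Algorithm 2, recovering S by following the natural edges from the root, can be illustrated on the DAWG of ananas. In the sketch below, the placement of the seven nodes into four node sets is hypothetical (it is not the placement of Figure 1), the files are held unencrypted in a dictionary so that decryption with D_K is omitted, and an edge is flagged as natural exactly when it lies on the path from the root that spells S.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One adjacency-list entry, mirroring the fields written by Algorithm 1."""
    is_natural: bool
    label: str
    target_set: int
    target_pos: int


# Hypothetical node placement for DAWG("ananas"):
# v0=(0,0) v1=(2,0) v2=(0,1) v3=(3,0) v4=(2,1) v5=(1,0) v6=(3,1)
FILES: dict[int, list[list[Entry]]] = {
    0: [
        [Entry(True, "a", 2, 0), Entry(False, "n", 0, 1), Entry(False, "s", 3, 1)],  # v0
        [Entry(True, "a", 3, 0)],                                                    # v2
    ],
    1: [[Entry(True, "s", 3, 1)]],                                                   # v5
    2: [
        [Entry(True, "n", 0, 1), Entry(False, "s", 3, 1)],                           # v1
        [Entry(True, "a", 1, 0)],                                                    # v4
    ],
    3: [
        [Entry(True, "n", 2, 1), Entry(False, "s", 3, 1)],                           # v3
        [],                                                                          # v6 (sink)
    ],
}


def decrypt(files: dict[int, list[list[Entry]]]) -> str:
    """Follow natural edges from the root until a node has none."""
    out: list[str] = []
    set_id, pos = 0, 0  # the root is stored at position 0 of node set 0
    while True:
        node = files[set_id][pos]
        natural = next((e for e in node if e.is_natural), None)
        if natural is None:
            break
        out.append(natural.label)
        set_id, pos = natural.target_set, natural.target_pos
    return "".join(out)
```

Each step reads one node and appends one character, so the walk visits |S| + 1 nodes in total.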

3.3.2 Security

We can now present our result that SEDAWG is a

XPM-SSE scheme. Given the ability to hide access

patterns from the adversary, it also exhibits pattern

privacy. With Proposition 1 from Section 2, this im-

plies that pattern privacy and PIR are equivalent.

Proposition 2. SEDAWG is an XPM-SSE scheme if (E, D, G) is an IND-CPA secure scheme.

Proof. We must prove that the data privacy property holds for our construction. Suppose (E, D, G) is an IND-CPA secure encryption scheme and $\mathcal{A}$ is a PPT adversary who can win game $\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$ (Security Game 1) with nonnegligible probability. In $\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$, $\mathcal{A}$ receives one encryption $\mathrm{Enc}_K(m_b)$ and one server's view to a protocol run, $\mathrm{view}_K(m_b, x_i)$. She then outputs a guess $b'$ for $b$.

The server's view of a protocol run consists of the IDs of the requested files and the files themselves. Consider a modification of the game, $\mathrm{PrivK}'^{\,\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$: Instead of $\mathrm{Enc}_K(m_b)$ and $\mathrm{view}_K(m_b, x_i)$, the adversary is sent the encryption of a zero string with the same length as $m_b$: $\mathrm{Enc}_K(0^{|m_b|})$. Also, $\mathrm{view}_K(m_b, x_i)$ is altered in such a way that the transmitted files are taken from $\mathrm{Enc}_K(0^{|m_b|})$ instead of $\mathrm{Enc}_K(m_b)$ (“so the story fits”). Call $\mathcal{A}$'s output from the modified game $b''$. Because (E, D, G) is IND-CPA secure, $\mathcal{A}$ cannot distinguish $\mathrm{Enc}_K(m_b)$ from $\mathrm{Enc}_K(0^{|m_b|})$. Hence, $b''$ is statistically close to $b'$.

In a second modification $\mathrm{PrivK}''^{\,\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$, we replace the IDs from the server's view with random IDs (chosen uniformly at random from the set of available file IDs, without replacement), except for the first request: the file ID that is first requested is always 0. Call $\mathcal{A}$'s output from this modified game $b'''$. Because the original file IDs have been chosen in the same manner as the IDs in our modified game, and the adversary is only supplied with one view, her output $b'''$ is again statistically close to $b''$.

Following our argument, if the adversary's output in $\mathrm{PrivK}^{\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$ is correlated to $b$, her output in $\mathrm{PrivK}''^{\,\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$ is also correlated to $b$. But in $\mathrm{PrivK}''^{\,\mathrm{cppa}}_{\mathcal{A},\mathrm{Enc}}(k)$, $\mathcal{A}$ receives no input that correlates with $b$. This is a contradiction.

In Proposition 1 we showed that XPM-SSE with pattern privacy is at least as hard as PIR, since PIR can be reduced to it. We now show that they are in fact equivalent. Considering that no practical scheme for Single-Server PIR is known, this implies that achieving XPM-SSE with pattern privacy is a difficult task.

Proposition 3. SEDAWG with PIR is an XPM-SSE scheme with pattern privacy if E is an IND-CPA secure encryption.

Proof. Assume an adversary $\mathcal{A}$ who can distinguish two search patterns from their transcripts of SEDAWG with nonnegligible probability. Requested file IDs depend on the search pattern in a deterministic manner. Because, by definition, $\mathcal{A}$ cannot learn any information from $\mathrm{Enc}_K(S)$, she solely uses the requested IDs for the distinction. Hence, she can distinguish two series of server requests and violate the PIR assumption.

4 SUMMARY AND CONCLUSIONS

In this paper, we introduced Symmetric Searchable

Encryption for Exact Pattern Matching, a new class

of searchable encryption schemes, with which a client

can privately search an encrypted string stored on a

server. We deﬁned the new primitive XPM-SSE and

two security notions for this primitive, data privacy

and pattern privacy. Data privacy captures the idea

that the data stored on the server should be kept hid-

den from the server. Pattern privacy ensures the server

can learn nothing from search logs except the pattern

length. We showed that pattern privacy is equivalent

to Computational Single-Server Private Information

Retrieval.

We provided our construction SEDAWG for XPM-SSE. It uses directed acyclic word graphs (DAWGs) to ensure good search performance at the cost of precomputational overhead. During precomputation, the DAWG for the string is computed and split into files. These files are then encrypted with a symmetric IND-CPA secure cipher. The search protocol navigates the DAWG, successively downloading required files.

There is a preliminary implementation that shows the practicality of our approach. However, while the search operation performs very efficiently, the precomputation is memory-intensive. Algorithm engineering might improve the overall performance of our implementation.

Further research can be directed at extending the

scheme to allow modiﬁcations or extensions of the

encrypted text without the need for a complete re-

encryption.

REFERENCES

Abdalla, M., Bellare, M., Catalano, D., Kiltz, E., Kohno, T., Lange, T., Malone-Lee, J., Neven, G., Paillier, P., and Shi, H. (2008). Searchable encryption revisited: Consistency properties, relation to anonymous IBE, and extensions. Journal of Cryptology, 21:350–391.

Baeza-Yates, R. and Gonnet, G. H. (1992). A new approach to text searching. Commun. ACM, 35(10):74–82.

Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M. T., and Seiferas, J. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40:31–55. Eleventh International Colloquium on Automata, Languages and Programming.

Blumer, A., Blumer, J., Haussler, D., McConnell, R., and Ehrenfeucht, A. (1987). Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578–595.

Boyer, R. S. and Moore, J. S. (1977). A fast string searching algorithm. Commun. ACM, 20(10):762–772.

Chang, Y.-C. and Mitzenmacher, M. (2005). Privacy preserving keyword searches on remote encrypted data. In Ioannidis, J., Keromytis, A., and Yung, M., editors, Applied Cryptography and Network Security, volume 3531 of Lecture Notes in Computer Science, pages 442–455. Springer Berlin / Heidelberg.

Chor, B., Kushilevitz, E., Goldreich, O., and Sudan, M. (1998). Private information retrieval. J. ACM, 45(6):965–981.

Crochemore, M. and Vérin, R. (1997). On compact directed acyclic word graphs. In Structures in Logic and Computer Science, A Selection of Essays in Honor of Andrzej Ehrenfeucht, pages 192–211, London, UK. Springer-Verlag.

Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R. (2006). Searchable symmetric encryption: Improved definitions and efficient constructions. In CCS '06: Proceedings of the 13th ACM conference on Computer and communications security, pages 79–88, New York, NY, USA. ACM.

Di Crescenzo, G., Malkin, T., and Ostrovsky, R. (2000). Single database private information retrieval implies oblivious transfer. In Preneel, B., editor, Advances in Cryptology EUROCRYPT 2000, volume 1807 of Lecture Notes in Computer Science, pages 122–138. Springer Berlin / Heidelberg.

Goh, E.-J. (2003). Secure indexes. http://eprint.iacr.org/2003/216/.

Goldreich, O. and Ostrovsky, R. (1996). Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473.

Golle, P., Staddon, J., and Waters, B. (2004). Secure conjunctive keyword search over encrypted data.

Karp, R. M. and Rabin, M. O. (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260.

Kenan, K. (2005). Cryptography in the Database: The Last Line of Defense. Addison-Wesley, Upper Saddle River, NJ.

Knuth, D. E., Morris, Jr., J. H., and Pratt, V. R. (1977). Fast pattern matching in strings. SIAM Journal on Computing, 6(2):323–350.

Kushilevitz, E. and Ostrovsky, R. (1997). Replication is not needed: Single database, computationally-private information retrieval. In FOCS '97: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, page 364, Washington, DC, USA. IEEE Computer Society.

Manber, U. and Myers, G. (1990). Suffix arrays: A new method for on-line string searches. In SODA '90: Proceedings of the first annual ACM-SIAM symposium on discrete algorithms, pages 319–327, Philadelphia, PA, USA. Society for Industrial and Applied Mathematics.

Song, D. X., Wagner, D., and Perrig, A. (2000). Practical techniques for searches on encrypted data. IEEE Symposium on Security and Privacy, pages 44–55. http://citeseer.nj.nec.com/song00practical.html.

Ukkonen, E. (1995). On-line construction of suffix trees. Algorithmica, 14(3):249–260.