Deﬁnition and Efﬁcient Construction of Encrypted k–anonymization

Scheme

Masayuki Yoshino

, Takayuki Suzuki

, Ken Naganuma

1,2

and Hisayoshi Sato

Hitachi, Ltd. Research & Development Group, Center for Technology Innovation-Systems Engineering, Kanagawa, Japan

The University of Tokyo, Chiba, Japan

Keywords:

Encrypted Database, k–anonymization, Searchable Symmetric Encryption, Domain Generalization Hierarchy.

Abstract:

In this paper, we propose an encrypted k–anonymization scheme (EAS) to k–anonymize an encrypted database

using a domain generalized hierarchy while maintaining the encryption state. Preparation of the domain gen-

eralized hierarchy is optional; the proposed EAS can generate domain generalized hierarchies using a Huff-

man code tree from a database encrypted with searchable encryption. As a result, the user can delegate

k–anonymization processing to a third party organization such as the cloud while retaining the conﬁdential-

ity of the database without preparing a generalized hierarchy. In addition, third-party organizations that are

entrusted also have the advantage to eliminate possible of misconduct such as information leakage. In a stan-

dard computer experiment, we performed a generalization process, which is the major procedure for our EAS.

The generalization process takes around 168 seconds only to achieve k–anonymity with k = 3 on 1, 000, 000

records consisting of 4 attributes. As a consequence, this high-speed performance means our EAS is applicable

to not only batch processing but also real-time processing.

1 INTRODUCTION

Numerous studies have proposed technology to make

individual identiﬁcation hard, i.e., anonymize per-

sonal information. In these studies, k–anonymity is

known as a representative index that quantiﬁes the

speciﬁc difﬁculty of identifying an individual (Sama-

rati and Sweeney, 1998; Samarati, 2001; Sweeney,

2002b). k–anonymity means that “the value of the

record must be converted so that there are more than

(k − 1) records that all have the same attribute val-

ues” . Finding an optimal solution that satisﬁes k–

anonymity is known to be a computationally hard

problem, and some of computationally hard prob-

lems are even proven to be NP Hard (Meyerson and

Williams, 2004). Therefore, a k–anonymization tech-

nique that gives up ﬁnding an optimal solution and

obtains an approximate solution to work in polyno-

mial time is widely used in practice (Sweeney, 2002a;

LeFevre et al., 2005; LeFevre et al., 2006; Wang et al.,

2004).

On the other hand, due to the progress of In-

ternet of Things (IoT) technology in recent years,

accumulated data has increased, and single systems

have nearly reached their limit to store and manage

this data. Therefore, in collaboration with external

systems such as clouds that have abundant compu-

tational resources, more data management is being

outsourced. One of the key primitives is search-

able encryption, which realizes an encrypted data

management system such as an encrypted database

and encrypted ﬁle storage to protect sensitive infor-

mation including personal data from being seen by

not only system intruders but also cloud administra-

tors. Searchable encryption is a cryptographic tech-

nique that enables matching of two kinds of encrypted

data while maintaining encryption. Although encryp-

tion requires a secret key, the matching process does

not. Several searchable encryption methods have

been proposed (Boneh et al., 2004; Boneh and Wa-

ters, 2007; Curtmola et al., 2006; Song et al., 2000;

Yoshino et al., 2011) that are classiﬁed into secret key

cryptosystems and public key cryptosystems. In this

paper, we target large-scale data and adopt the sym-

metric key cryptosystem (Yoshino et al., 2011), which

is superior to high-speed processing.

Methods have been proposed to have external or-

ganizations conduct data anonymization processing

while protecting privacy and utilizing data. For ex-

ample, in the methods of Gentry (Gentry, 2009) and

Ducas and Micciancio (Ducas and Micciancio, 2015),

arbitrary arithmetic processing can be executed while

Yoshino, M., Suzuki, T., Naganuma, K. and Sato, H.

Deﬁnition and Efﬁcient Construction of Encrypted k–anonymization Scheme.

DOI: 10.5220/0007354702930300

In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 293-300

ISBN: 978-989-758-359-9

293

encrypting data so that anonymization processing can

be delegated safely. However, in these methods, the

overhead required for arithmetic processing in the en-

crypted state is still far from the practical level, so

even anonymization of small databases is not realis-

tic

1.1 Our Contribution

We point out that outsourcing the anonymization pro-

cess may lead to information leakage, thus we pro-

pose an encrypted k–anonymization scheme (EAS).

Our contributions are brieﬂy described as follows.

• We deﬁne the EAS played by a user and a server

that can k–anonymize given encrypted data with-

out a secret key, and deﬁne a semantic security

model for EAS; an honest-but-curious server will

not learn any useful information about the given

encrypted database.

• We propose a construction of EAS and prove its

security. We design EAS using domain gener-

alization hierarchies, however the user does not

need to prepare them. By combining genera-

tion technique for domain generalization hierar-

chy from database (Harada et al., 2012), and

searchable symmetric encryption technique for an

encrypted database (Kamara and Lauter, 2010;

Yoshino et al., 2011; Popa et al., 2012), our con-

struction is equipped with a method to generate

domain generalization hierarchies from search-

able encrypted database. Furthermore, our con-

struction is proved to be secure under the security

model.

• We implemented the proposed EAS on a general-

purpose PC and carried out experiments, where a

generalization technique achieving k–anonymity

with k = 3 takes 168 seconds on 1, 000, 000

records consisting of 4 attributes. Thanks to the

high-speed processing, the proposed EAS is ap-

plicable to not only batch processing but also real-

time processing.

2 PRELIMINARY

2.1 Table Notation

First, we deﬁne a plaintext table P T in a database to

be k–anonymized.

To execute the 1-NAND operation on a general-

purpose computer, Gentry’s method takes about 30 minutes

whereas Ducas and Micciancio’s method takes about 1 sec-

ond.

• Let table P T be a combination of (A , C) where

A is an array of n attributes (a

, . . . a

) and C is an

array of n columns (C

, . . . C

• Each attribute a

contains a word w called as

quasi-identiﬁer, which is selected from a dictio-

nary D

: w ∈ D

• Each column C

consists of m cells (c

1,i

, . . . c

m,i

Each cell c

i, j

contains a word w, which is selected

from a dictionary D

: w ∈ D

Let an encrypted table ET be a same structure as P T

except that each attribute a

∈ A and each cell c

i, j

∈ C

contains an encrypted word ew.

2.2 k-anonymization Techniques

k–anonymization is a de-identiﬁcation technique to

achieve k–anonymity, which is an index to quan-

tify the difﬁculty of individual identiﬁcation proposed

by Samarati and Sweeney in 1998 (Samarati and

Sweeney, 1998). To satisfy k–anonymity, the value

of the record must be converted so that there are

more than (k − 1) records that all have the same at-

tribute values. This conversion process is called re-

coding and can be roughly divided into a local re-

coding method and a global recoding method. Since

the local recoding method calculates the distance be-

tween records for grouping, many calculations are re-

quired. Although precise recoding is performed, due

to the high calculation volume, usage tends to be lim-

ited to use cases with a small number of records.

On the other hand,many global recoding methods use

auxiliary information called a generalized hierarchy

do not calculate distance, and regularly perform re-

coding. High speed is an advantage and is suitable

for k–anonymization targeting large-scale data. Since

this paper deals with large-scale data, we use a global

recoding method with high speed.

In the global recording method, each attribute to

be anonymized is associated with a domain gener-

alized hierarchy (DGH) from which the values can

be generalized to form a group of at least k tuples

with identical values (Sweeney, 2002a). Examples

of DGH and k–anonymized tables are given in Fig-

ures 1 and 2, respectively. The lowest values of DGH

are called leaf nodes, and the highest node of DGH

is called the root node. Relationships are deﬁned be-

tween nodes from leaf nodes to root nodes. The upper

node holds the generalized value of the lower node.

Figure 1 shows the leaf node at the lowest level, which

is the nationality unit {(Japan, China), (Russia, Eng-

land, Germany) }, the more generalized regional unit

There is k–anonymization technology without a gener-

alized hierarchy such as (LeFevre et al., 2006).

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

294

{Asia, Europe }, and the top root, which is located

in the most generalized unit {*}. Figure 2 shows the

two attributes (occupation and nationality) associate

with their DGH. Figure 2 shows P T where the com-

binations of four attributes are 2-anonymity, i.e., there

are at least 2 identical records for every four attribute

combinations. We refer to the k–anonymized P T as

kP T and the k–anonymized ET as kE T .

Japan

China Russia England Germany

Asia Europe

Researcher

Student Sales Engineer

Researcher or Student

Sales or Engineer

NationalityOccupation

Figure 1: Example of DGH.

Researcher or Student Asia * 19**

Sales or Engineer Europe * 20**

Researcher or Student Asia * 19**

Sales or Engineer Europe * 20**

Nationality Gender Birth dateOccupation

Figure 2: Example of kP T with k = 2.

2.3 Searchable Symmetric Encryption

In this paper, we use searchable symmetric encryp-

tion SSE as cryptographic primitives. An SSE scheme

has two phases called as a store phase, which is per-

formed once, and a search phase, which is performed

a polynomial number of times. In the store phase, an

user encrypts all data and stores them on a server. In

the search phase, the user sends a trapdoor of a word

w, the server returns the encrypted word ew, which is

matched by a comparison function for a trapdoor of a

word and an encrypted word.

Formally, the searchable symmetric encryption

SSE consists of ﬁve polynomial-time algorithms

SSE = (Gen, Enc, Trpdr, Cmpr, Dec) as follows (Curt-

mola et al., 2011):

• (sk, pp) ← Gen(1

): is a probabilistic algorithm

that takes security parameter λ as input and out-

puts a public parameter pp and a secret key sk.

• ew ← Enc(sk, w): is a probabilistic algorithm that

takes a secret key sk and word w as input and out-

puts an encrypted word ew. We sometimes write

ew ← Enc(sk, w) where w is a set of w and ew is

a set of ew.

• td(w

′

) ← Trpdr(sk, w

′

): is a deterministic algo-

rithm that takes a secret key sk and a word w

′

input and outputs a trapdoor td(w

′

). We some-

times write td(w

′

) ← Trpdr(sk, w

′

) where w

′

is a

set of w

′

and td(w

′

) is a set of td(w

′

• 0 or 1 ← Cmpr(ew, td(w

′

)): is a deterministic al-

gorithm that takes encrypted word ew and a trap-

door td(w

′

) as input and outputs either 1 (if w =

′

) or 0 (if w ̸= w

′

) with provability 1 − ε(λ) and

ε is a negligible function.

• w ← Dec(sk, ew): is a deterministic algorithm that

takes a secret key sk and encrypted word ew as

input and outputs a word w.

In SSE, the server should learn almost no informa-

tion on w and w

′

. Let L

(w) denote the information

that the server can learn in the store phase, and let

(w, w

′

) denote that in the search phase.

Most SSE reveals L

(w) = (. . . , |w

|, . . .) and

a dictionary D such that w

∈ D, and L

(w, w

′

)

consists of search result {(i, j)|w

= w

′

, w

∈

w, w

′

∈ w

′

} and the search pattern such that

searchpattern((w

′

, . . . w

′

), w

′

) = (b

, . . . b

) where

= 1 if (w

′

= w

′

) and b

= 0 if (w

′

̸= w

′

). The

search pattern caused by deterministic Trpdr algo-

rithm.

We introduce deﬁnitions of semantic security

against an adversary in (Curtmola et al., 2006). The

security is deﬁned by using two games: Real

SSE,A

played by an adversary A and a challenger C, and

Ideal

SSE,A,S

is played by A, C and a simulator S.

Deﬁnition 1.

((Curtmola et al., 2006)) Let SSE =

(Gen, Enc, Trpdr, Cmpr, Dec) be a searchable sym-

metric encryption, and consider the following prob-

abilistic experiments where A is an adversary, C is a

challenger, S is a simulator and L

and L

are leak-

age algorithms:

Real

SSE,A

(λ) : The adversary A chooses a set of

word w and sends it to the challenger C. C be-

gins by running Gen(1

) to generate a secret key

sk and a public parameter pp and send them to

A. A outputs a set of word w and receives ew

from C by ew ← Enc(sk, w). A gives a polyno-

mial number of words w to C, and receives trap-

doors td(w

′

) ← Trpdr(sk, w

′

) from C. Finally, A

returns a bit b, which is output by the experiment.

Ideal

SSE,A,S

(λ) : The adversary A chooses a set of

word w and sends it to the challenger C. C sends

(w) to the simulator S. S generates ew from

(w), and send it to C. C relays ew to A. A

gives a polynomial number of words w to C. C

Deﬁnition and Efﬁcient Construction of Encrypted k–anonymization Scheme

295

sends L

(w, w

′

) to S. S generates a set of trap-

door td(w

′

) from L

(w, w

′

) and sends them to C.

C relays them to A. Finally, A returns a bit b,

which is output by the experiment.

We say that SSE is (L

, L

)-secure against chosen-

keyword attacks if for all PPT adversaries A, there

exists a PPT simulator S such that

Pr[Real

SSE,A

(λ) = 1]−Pr[Ideal

SSE,A,S

(λ) = 1] ≤ negl.

3 DEFINITION

We begin by reviewing the formal deﬁnition of an en-

crypted k–anonymization scheme. The participants

include a user that wants to store sensitive informa-

tion P T on an honest-but-curious server in such a

way that (1) the server will not learn any useful in-

formation about P T and (2) the server is given the

ability to k–anonymize E T and return kE T to the

user.

3.1 Encrypted k –anonymization Scheme

In this subsection, we deﬁne an encrypted k–

anonymization scheme.

Deﬁnition 2. EAS consists of ﬁve polynomial time al-

gorithms.

EAS = (Gen, Enc, Trpdr, Annmz, Dec)

such that

• K ← Gen(1

): is a probabilistic algorithm, which

is run by the user. It takes security parameter λ as

input and outputs a secret key K.

• ET ← Enc(K, P T ): is a probabilistic algorithm

run by the user. It takes a secret key K and a table

P T as input and outputs an encrypted table E T .

• td(w

′

) ← Trpdr(K, w

′

): is a deterministic algo-

rithm run by the user. It takes a secret key K and

a set of word w

′

and outputs a set of trapdoor

td(w

′

• kE T or ⊥ ← Annmz(ET , td(w

′

), kv): is a de-

terministic algorithm run by the server to k–

anonymize an encrypted table E T with selected

attributes. It takes an encrypted table E T , a set

of trapdoor td(w

′

), an integer kv. If it is not pos-

sible to generate an encrypted k–anonymized ta-

ble kET , then output is ⊥. Otherwise, it outputs

kE T with provability 1 − ε(λ) and ε is a negligi-

ble function.

• kP T ← Dec(K, kE T ): is a deterministic algo-

rithm run by the user. It takes a secret key K and

an encrypted k–anonymized table k E T and out-

puts a k–anonymized table kP T .

We say that an EAS satisﬁes correctness if for any

K output by Gen(1

), any P T , any set of word w

′

P T , and any positive integer kv,

kE T or ⊥

= Annmz(Enc(K, P T ), Trpdr(K, w

′

), kv) (1)

and

kP T = Dec(K, kET ), (2)

where Equation (1) holds with provability 1 − ε(λ)

and ε is a negligible function, and Equation (2) holds

with probability 1.

Similar with SSE, EAS has two phases called as

a store phase and which is performed once, and an

anonymization phase, which is performed a polyno-

mial number of times. In the store phase, an user

generates a plain table, encrypts it and stores it on a

server. In the search phase, the user selects attributes

on the table, and sends a trapdoor and an integer k

for k–anonymization. The server k–anonymized the

encrypted table and send it to the user.

Deﬁnition 3. EAS played by a user U and a server S

consists of two phases:

• Store phase

1. U generates a secret key K ← Gen(1

2. U encrypts a plain table P T by ET ←

Enc(K, P T ).

3. U gives an encrypted table E T to S .

• Anonymization phase

1. U selects s attributes from A.

2. U generates a trapdoor set td(w

′

) ←

Trpdr(K, w

′

) and send them and an inte-

ger kv to S .

3. S receives td(w

′

) and kv, gener-

ates an encrypted k–anonymized table

kE T ← Annmz(E T , td(w

′

), kv) and give kE T

to S .

U gets a k–anonymized table kP T by P T ←

Dec(K, kET ).

3.2 Security Deﬁnition

Let L

(w| all w

∈ w are stored in attributes or cells

of P T ) denote the information that the server can

learn in the store phase, and we abbreviate this leak-

age information L

(w) as for simplicity. Further-

more, let L

(w| all w

∈ w are stored in attributes or

cells of P T , w

′

, kv) denote that in the anonymization

phase, and we abbreviate this leakage information as

(w, w

′

, kv) for simplicity. Using these leakage in-

formation, we next deﬁne the semantic security of

EAS.

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

296

Deﬁnition 4. Let EAS =

(Gen, Enc, Trpdr, Annmz, Dec) be an EAS, and

consider the following probabilistic experiments

where A is an adversary, C is a challenger, S is a

simulator and L

and L

are leakage algorithms:

Real

EAS,A

(λ) : The adversary A chooses a plain ta-

ble P T and sends it to the challenger C. C begins

by running Gen(1

) to generate a secret key K and

send it to A. A outputs a plain table P T and re-

ceives ET from C by E T ← Enc(K, P T ). A re-

peats the following Step 1–3 polynomial times.

1. A selects attributes of P T .

2. A gives a polynomial number of words w

′

for

the attributes to C.

3. A receives trapdoors td(w

′

) ← Trpdr(K, w

′

)

from C.

Finally, A returns a bit b, which is output by the

experiment.

Ideal

EAS,A,S

(λ) : The adversary A chooses a plain

table P T and sends it to the challenger C. C

sends L

(w) to the simulator S. S generates E T

from L

(w), and send it to C. C relays E T to

A. A repeats the following step 1–3 polynomial

times.

- A selects attributes of P T .

- A sends a polynomial number of words w

′

for the attributes to C. C sends L

(w, w

′

, kv)

to S. S generates trapdoors td(w

′

) from

(w, w

′

, kv) and sends them to C.

- A receives them from C.

Finally, A returns a bit b, which is output by the

experiment.

We say that EAS is (L

, L

)-secure against chosen-

keyword attacks if for all PPT adversaries A, there

exists a PPT simulator S such that

Pr[Real

EAS,A

(λ) = 1]−Pr[Ideal

EAS,A,S

(λ) = 1] ≤ negl.

4 EAS CONSTRUCTION

k–anonymization techniques are roughly classiﬁed

into methods that use a generalized hierarchy, often

called global recoding, and methods that do not, of-

ten called local recoding. If the purpose of anony-

mous data is clear, it is more advantageous to use a

generalized hierarchy that can clarify the policies of

anonymization. Section 4.1 describes a concrete EAS.

In practice, generalized hierarchies are troublesome to

create. Section 4.3 describes an algorithm to generate

a domain generalization hierarchy from E T .

4.1 EAS Construction using SSE

We now present our EAS construction.

• K ← Gen(1

): generate a secret key

sk ←SSE.Gen(1

) and output K = sk.

• ET ← Enc(K, P T )

1. encrypt each word in all attributes and cells of

P T .

- ew ← SSE.Enc(K, w) for all w ∈ A ∪ C

2. output the table consisting of ew as E T .

• td(w

′

) ← Trpdr(K, w

′

1. select attributes from A. Without losing gen-

erality, we assume that (a

, . . . a

) are selected

from A for simplicity.

2. generate trapdoors of each word in the selected

attributes and the corresponding columns.

- td(w

′

) ← SSE.Trpdr(K, w

′

) for all w

′

∈ a

∪

. . . a

∪ D

∪ . . . D

- ew

′

← SSE.Enc(K, w

′

) for all w

′

∈ D

∪

. . . D

3. output the set of td(w

′

) and ew

′

as td(w

′

• kE T or ⊥ ← Annmz(td(w

′

), E T , kv):

1. for all j ∈ [1, s]

(a) ﬁnd an attribute a

such that

SSE.Cmpr(ew

, td(w

′

)) = 1 for i ∈ [1, n]

where ew

is stored in an attribute a

. If the a

is not found, output ⊥ and stop.

(b) replace ew stored in all cells in C

with ew

′

Cmpr(ew, td(w

′

)) = 1 holds for all w

′

∈ D

c j

′

) of each trapdoor such

that f (w

′

) =

∑

ew∈C

Cmpr(ew, td(w

′

)) for all

′

∈ D

c j

and all ew stored in C

, and gener-

ate a domain generalization hierarchy by Al-

gorithm 1 described in Section 4.3.

2. this generalization step depends on selected

generalization algorithms. Here we give it

based on the algorithm proposed by Wang et

al. (Wang et al., 2004).

(repeat the following step (a)–(c) until k-

anonymity is satisﬁed or one node in every do-

main generation hierarchy is left)

(a) select two leaf nodes among

DGH

, . . . DGH

with the smallest infor-

mation entropy lost by generalization to

the nearest parent node, i.e. two leaf nodes

containing ew

′

or ew

′

are selected by using

the measure of information entropy and the

parent node contains (ew

′

(b) replace a value on the cells to that of the

parent node, i.e. ew

′

and ew

′

in the cells are

replaced with (ew

′

Deﬁnition and Efﬁcient Construction of Encrypted k–anonymization Scheme

297

lected at Step 2 .(a) and set the parent node as

a new leaf node of the DGH.

3. if the table is satisﬁed with k–anonymity for

the selected attributes, shufﬂe record order ran-

domly, and output it as k E T . Otherwise, out-

put ⊥.

• kP T ← Dec(K, k E T ):

1. decrypt each word in all attributes and cells of

kE T .

- w ← SSE.Dec(K, ew) for all ew ∈ A ∪ C

2. output the table as kP T .

Step 2.(a)–(c) applies the greedy method proposed

by Wang et al. (Wang et al., 2004). The method se-

quentially selects optimal anonymization target data

in accordance with a given generalized hierarchy.

Through this sophisticated anonymization process, it

is expected to output higher quality anonymous data

than the method of LeFerve et al.(LeFevre et al.,

2005), which is a representative global re-encoding

method and performs rough anonymization process-

ing.

4.2 Security

We deﬁne w

′

[s]

consists of words for queries of k–

anonymization with s attributes: w

′

[s]

= {w

′

| all w

′

stored in (a

, . . . a

) and w

′

∈ D

∪ . . . D

Theorem 1. If SSE is (L

, L

)-secure against chosen-

keyword attacks, then the proposed EAS construc-

tion is (L

, L

)-secure against chosen-keyword at-

tacks such that L

(w) = L

(w), and L

(w, w

′

[s]

, kv) =

′

[s]

), L

(w, w

′

[s]

) and kv.

Proof. Let S

′

be a simulator of the (L

, L

)-secure

SSE scheme. We construct a simulator S of EAS,

which achieves (L

, L

)-secure as follows.

(Store phase) In Ideal

EAS,A,S

, S takes L

(w) = L

(w)

as input. S runs S

′

(w)) and gets its output ew

from S

′

. S constructs E T from ew and sends it to C.

(Anonymization phase) In Ideal

EAS,A,S

, S takes

(w, w

′

[s]

, kv) as input. S runs S

′

(w, w

′

stored

in a

)), S

′

(w, w

′

| all w

′

∈ D

)) for i = 1, . . . s,

and get its trapdoor td(w

′

) respectively. S runs

′

| all w

′

∈ D

)) for i = 1, . . . s, and get its

trapdoor ew

′

respectively. S constructs kET by

EAS.Annmz(td(w

′

), E T , kv), and send kE T to

4.3 Generation of Domain

Generalization Hierarchy

We design Algorithm 1, which generate domain

generalization hierarchies from searchable encrypted

database. This work is inspired by the research of

Harada et al., which use a data compression rule such

as Huffman code (Huffman, 1952) to create a domain

generalization hierarchy from a table P T (Harada

et al., 2012).

Algorithm 1: Generation of Domain Generalization

Hierarchy.

INPUT: trapdoor set td(w

′

), attribute a

, column C

frequency set

(

′

)

;

OUTPUT: domain generalization hierarchy DGH

;

1. extract all ew

′

from td(w

′

) and set each ew

′

as a value

of leaf nodes of DGH

2. store each combination of ew

′

and a frequency

f (w

′

) in a list Q in ascending order of f (w

′

), i.e.

(ew

′

, f (w

′

)), (ew

′

, f (w

′

)), . . . with f (w

′

) ≤ f (w

′

) ≤

. . . .

3. (repeat the following (a)–(d) until the number of nodes

in the list Q is 1)

(a) retrieve two nodes with the lowest and the 2nd low-

est frequencies such as ew

′

, ew

′

, and delete them

from the list Q.

(b) create a new parent node of children ew

′

and ew

′

such as (ew

′

)

frequencies of the child nodes. e.g. frequency of the

node (ew

′

) is f (w

′

) + f (w

′

(d)

add the parent node to the list Q and sort it again.

4. the last attribute value in list Q is taken as the root node.

Output this tree as the domain generalization hierarchy

DGH

On the basis of the results of aggregation, up-

per nodes of the generalized hierarchy are gradually

formed from nodes with low appearance frequency.

The appearance frequency of the upper node is the

sum of the appearance frequencies of the lower nodes,

and so that the degree of generalization is not too

strong, Huffman code or the like is used. Also, an

upper node generalizing lower nodes is represented

by a logical sum (or) of values of lower nodes.

In this paper, for simplicity, we ignore the order

relation of attributes and use Huffman code for gen-

eralized hierarchy. In the case of handling numeric

column such as age and geographical information, in-

stead of Huffman code and SSE, one can apply Hu-

Tucker code (Hu and Tucker, 1971) and order pre-

serving encryption to maintain the order relation.

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

298

5 EXPERIMENTAL

PERFORMANCE EVALUATION

We implement the proposed scheme to utilize 256-

bit AES and SHA-256 for SSE (Yoshino et al., 2011).

The performance is evaluated on a conventional com-

puter, which is equipped with 3.4 GHz Core i7 6700

CPU, 32GB memory, Cent OS 7.4, and JVM 1.8.0.

To measure the mounting performance for data scal-

ability, we generated dummy data, which is shown in

Table 1. Three attributes (occupation, gender, and ad-

dress) are randomly selected in these value ranges,

and the last attribute (birth date) is sampled from

statistics based on Japanese population estimates (Bu-

reau, 2016).

Table 1: Test Dataset Consisting of Attributes and Possible

Values.

attribute possible value

occupation integer in [1, 24]

gender female or male

address 5000 types

birth date DD/MM/YYYY

Performance of the Enc, Trpdr, or Dec function

of the proposed EAS depends on SSE.Enc, SSE.Trpdr,

or SSE.Dec, respectively, and the running time in-

creases linearly to data size of P T , E T , or message

space. Thus, we measure the last function Annmz of

the proposed EAS, in which running time is not eas-

ily estimated. The generalization process consisting

of Step 2.(a)–(c) is the major process for Annmz. Fig-

ure 3 shows performance of the generalization pro-

cess on the dummy data with 4 attributes and k = 3.

It takes about 440 milliseconds for 10

records, about

6 seconds for 10

records, about 80 seconds for 10

records, and about 168 seconds for 10

records. Com-

pared with the case of the plaintext (Wang et al.,

2004), the generalization step of the proposed EAS

is comparable and almost equivalent. Annmz has a

branch process: if domain generation hierarchies are

not given, then Step 2 generates them. Figure 4

shows the generation time of the generalized hierar-

chy: about 77 milliseconds to 229 milliseconds for

to 10

records. That is negligible compared with

the generalization time.

6 CONCLUSION

We pointed out that outsourcing k–anonymization

processes may lead to information leakage, thus we

deﬁned an encrypted k–anonymization scheme (EAS)

and a semantic security model of EAS. Furthermore,

433.877

6006.994

71086.054

150111.391

440.767

5976.579

79593.499

168447.332

20000

40000

60000

80000

100000

120000

140000

160000

180000

1000 10000 100000 1000000

plain data encrypted data

[records]

[ms]

Figure 3: Performance on 3–anonymization

77.305

157.574

186.757

201.908

79.419

173.886

226.306

228.667

100

150

200

250

1000 10000 100000 1000000

plain data encrypted data

[records]

[ms]

Figure 4: Performance on DGH Generation

we give a construction of EAS and prove the security

under the semantic security model. Finally, we im-

plemented the proposal on a general-purpose PC and

demonstrated its efﬁciency. As a consequence, our

high-speed EAS makes it feasible not only to prevent

information leakage from database but also to gain the

advantage that even the server can prevent unintended

observation of the given database.

REFERENCES

Boneh, D., Di Crescenzo, G., Ostrovsky, R., and Per-

siano, G. (2004). Public key encryption with keyword

search. In Advances in Cryptology – EUROCRYPT,

pages 506–522. Springer.

Boneh, D. and Waters, B. (2007). Conjunctive, subset, and

range queries on encrypted data. In Theory of Cryp-

tography Conference, pages 535–554. Springer.

Bureau, S. (2016). Ministry of internal affairs and commu-

nications. Government of Japan.

Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R.

(2006). Searchable symmetric encryption: Improved

deﬁnitions and efﬁcient constructions. In Proceedings

Deﬁnition and Efﬁcient Construction of Encrypted k–anonymization Scheme

299

of the 13th ACM Conference on Computer and Com-

munications Security, CCS, pages 79–88, New York,

NY, USA. ACM.

Curtmola, R., Garay, J., Kamara, S., and Ostrovsky, R.

(2011). Searchable symmetric encryption: improved

deﬁnitions and efﬁcient constructions. Journal of

Computer Security, 19(5):895–934.

Ducas, L. and Micciancio, D. (2015). FHEW: bootstrapping

homomorphic encryption in less than a second. In

Advances in Cryptology – EUROCRYPT, pages 617–

640.

Gentry, C. (2009). Fully homomorphic encryption using

ideal lattices. In Proceedings of the Annual ACM Sym-

posium on Theory of Computing, STOC, pages 169–

178, New York, NY, USA. ACM.

Harada, K., Sato, Y., and Togashi, Y. (2012). Reducing

amount of information loss in k-anonymization for

secondary use of collected personal information. In

SRII Annual Global Conference, pages 61–69. IEEE.

Hu, T. C. and Tucker, A. C. (1971). Optimal computer

search trees and variable-length alphabetical codes.

SIAM Journal on Applied Mathematics, 21(4):514–

532.

Huffman, D. A. (1952). A method for the construction of

minimum-redundancy codes. Proceedings of the IRE,

40(9):1098–1101.

Kamara, S. and Lauter, K. (2010). Cryptographic cloud

storage. In International Conference on Financial

Cryptography and Data Security, pages 136–149.

Springer.

LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. (2005).

Incognito: Efﬁcient full-domain k-anonymity. In Pro-

ceedings of the ACM SIGMOD International Confer-

ence on Management of Data , pages 49–60. ACM.

LeFevre, K., DeWitt, D. J., and Ramakrishnan, R. (2006).

Mondrian multidimensional k-anonymity. In Proceed-

ings of the International Conference on Data Engi-

neering (ICDE), pages 25–25. IEEE.

Meyerson, A. and Williams, R. (2004). On the complexity

of optimal k-anonymity. In Proceedings of the ACM

SIGMOD-SIGACT-SIGART Symposium on Principles

of Database Systems, pages 223–228. ACM.

Popa, R. A., Redﬁeld, C. M. S., Zeldovich, N., and Bal-

akrishnan, H. (2012). Cryptdb: Processing queries

on an encrypted database. ACM Communications,

55(9):103–111.

Samarati, P. (2001). Protecting respondents identities in mi-

crodata release. IEEE Transactions on Knowledge and

Data Engineering, 13(6):1010–1027.

Samarati, P. and Sweeney, L. (1998). Protecting privacy

when disclosing information: k-anonymity and its

enforcement through generalization and suppression.

Technical report, technical report, SRI International.

Song, D. X., Wagner, D., and Perrig, A. (2000). Practical

techniques for searches on encrypted data. In Pro-

ceedings of the IEEE Symposium on Security and Pri-

vacy, pages 44–55. IEEE.

Sweeney, L. (2002a). Achieving k-anonymity privacy pro-

tection using generalization and suppression. In-

ternational Journal of Uncertainty, Fuzziness and

Knowledge-Based Systems, 10(05):571–588.

Sweeney, L. (2002b). k-anonymity: A model for protecting

privacy. International Journal of Uncertainty, Fuzzi-

ness and Knowledge-Based Systems, 10(05):557–570.

Wang, K., Philip, S. Y., and Chakraborty, S. (2004).

Bottom-up generalization: A data mining solution to

privacy protection. In Proceedings of the IEEE Inter-

national Conference on Data Mining.

Yoshino, M., Naganuma, K., and Satoh, H. (2011). Sym-

metric searchable encryption for database applica-

tions. In International Conference on Network-Based

Information Systems (NBiS), pages 657–662. IEEE.

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy

300