encrypting data so that anonymization processing can
be delegated safely. However, in these methods, the
overhead required for arithmetic processing in the
encrypted state is still far from the practical level, so
even anonymization of small databases is not
realistic (footnote 1).
1.1 Our Contribution
We point out that outsourcing the anonymization process
may lead to information leakage, and we therefore
propose an encrypted k–anonymization scheme (EAS).
Our contributions are summarized as follows.
• We define EAS as a protocol between a user and a
server in which the server can k–anonymize given
encrypted data without the secret key, and we define a
semantic security model for EAS under which an
honest-but-curious server learns no useful information
about the given encrypted database.
• We propose a construction of EAS and prove its
security. We design EAS using domain generalization
hierarchies; however, the user does not need to prepare
them. By combining a technique for generating domain
generalization hierarchies from a database (Harada et
al., 2012) with searchable symmetric encryption
techniques for encrypted databases (Kamara and Lauter,
2010; Yoshino et al., 2011; Popa et al., 2012), our
construction can generate domain generalization
hierarchies from a searchable encrypted database.
Furthermore, our construction is proved to be secure
under the security model.
• We implemented the proposed EAS on a general-purpose
PC and carried out experiments, in which a
generalization technique achieving k–anonymity with
k = 3 took 168 seconds on 1,000,000 records consisting
of 4 attributes. Thanks to this high-speed processing,
the proposed EAS is applicable not only to batch
processing but also to real-time processing.
2 PRELIMINARY
2.1 Table Notation
First, we define a plaintext table $PT$ in a database to
be k–anonymized.
Footnote 1: To execute a single NAND operation on a
general-purpose computer, Gentry's method takes about 30
minutes, whereas Ducas and Micciancio's method takes
about 1 second.
• Let table $PT$ be a combination $(A, C)$, where $A$ is
an array of $n$ attributes $(a_1, \ldots, a_n)$ and $C$
is an array of $n$ columns $(C_1, \ldots, C_n)$.
• Each attribute $a_i$ contains a word $w$, called a
quasi-identifier, which is selected from a dictionary
$D_a$: $w \in D_a$.
• Each column $C_i$ consists of $m$ cells
$(c_{1,i}, \ldots, c_{m,i})$. Each cell $c_{i,j}$
contains a word $w$, which is selected from a dictionary
$D_{c_i}$: $w \in D_{c_i}$.
Let an encrypted table $ET$ have the same structure as
$PT$, except that each attribute $a_i \in A$ and each
cell $c_{i,j} \in C_j$ contains an encrypted word $ew$.
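
The notation above can be sketched as a small data structure. This is only an illustration, not the construction of this paper: the class name `Table`, the helper `encrypt_table`, and the keyed function `toy_token` are hypothetical, and `toy_token` is an HMAC-based stand-in that is neither decryptable nor the searchable encryption used by EAS. It only shows that ET keeps the shape of PT while every word is replaced.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Table:
    """A table (A, C): n attribute words and n columns of m cells each."""
    attributes: List[str]       # A = (a_1, ..., a_n)
    columns: List[List[str]]    # C = (C_1, ..., C_n)

    def __post_init__(self) -> None:
        if len(self.attributes) != len(self.columns):
            raise ValueError("need exactly one column per attribute")
        if len({len(col) for col in self.columns}) > 1:
            raise ValueError("all columns must have the same number of cells")

    def record(self, row: int) -> Tuple[str, ...]:
        """Return one record (0-based row index) across all columns."""
        return tuple(col[row] for col in self.columns)


def encrypt_table(pt: Table, enc: Callable[[str], str]) -> Table:
    """Build ET from PT by applying enc to every attribute and cell."""
    return Table(
        attributes=[enc(a) for a in pt.attributes],
        columns=[[enc(c) for c in col] for col in pt.columns],
    )


def toy_token(word: str, key: bytes = b"demo-key") -> str:
    """Hypothetical stand-in for encryption (assumption, not the paper's scheme)."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


pt = Table(
    attributes=["nationality", "age"],
    columns=[["Japan", "China", "Russia"], ["23", "31", "45"]],
)
et = encrypt_table(pt, toy_token)
```

Here `et` has the same number of attributes, columns, and cells as `pt`, but none of the plaintext words appear in it.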
2.2 k-anonymization Techniques
k–anonymization is a de-identification technique for
achieving k–anonymity, an index proposed by Samarati and
Sweeney that quantifies the difficulty of identifying
individuals (Samarati and Sweeney, 1998). To satisfy
k–anonymity, the values of each record must be converted
so that more than (k − 1) records, that is, at least k
records, all have the same attribute values. This
conversion process is called recoding and can be roughly
divided into local recoding methods and global recoding
methods. Because a local recoding method calculates the
distance between records in order to group them, it
requires many calculations. Although it recodes
precisely, its high computational cost tends to limit
its use to cases with a small number of records.
On the other hand, many global recoding methods use
auxiliary information called a generalization hierarchy
(footnote 2), do not calculate distances, and perform
recoding in a regular manner. Their advantage is high
speed, which makes them suitable for k–anonymization of
large-scale data. Since this paper deals with
large-scale data, we use a fast global recoding method.
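
As a minimal sketch of the k–anonymity condition described above, the following check counts how many records share each combination of attribute values and requires every group to contain at least k records. The function names and the sample values are illustrative assumptions and do not come from the paper's implementation.

```python
from collections import Counter
from typing import Iterable, Sequence


def smallest_group(records: Iterable[Sequence[str]]) -> int:
    """Size of the smallest group of records with identical values (0 if empty)."""
    counts = Counter(tuple(r) for r in records)
    return min(counts.values()) if counts else 0


def is_k_anonymous(records: Iterable[Sequence[str]], k: int) -> bool:
    """True if every combination of values occurs in at least k records."""
    counts = Counter(tuple(r) for r in records)
    return all(n >= k for n in counts.values())


# Illustrative, already generalized records (hypothetical values).
generalized = [("Asia", "20-29")] * 3 + [("Europe", "30-39")] * 4
raw = [("Japan", "23"), ("China", "25"), ("Japan", "23")]
```

With these sample values, `is_k_anonymous(generalized, 3)` holds because the two groups have sizes 3 and 4, whereas `raw` fails even for k = 2 because the record `("China", "25")` is unique.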
In the global recoding method, each attribute to be
anonymized is associated with a domain generalization
hierarchy (DGH), from which the values can be
generalized to form a group of at least k tuples with
identical values (Sweeney, 2002a). Examples of a DGH and
a k–anonymized table are given in Figures 1 and 2,
respectively. The lowest nodes of a DGH are called leaf
nodes, and the highest node is called the root node.
Relationships are defined between nodes along the path
from the leaf nodes to the root node. An upper node
holds the generalized value of its lower nodes.
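
The idea of global recoding with a DGH can be sketched for a single attribute as follows. This is a simplified plaintext illustration under stated assumptions, not the encrypted procedure of EAS: the child-to-parent dictionary `DGH`, the intermediate labels "Asia" and "Europe", the root label "*", and the function names are all hypothetical. Every value in the column is lifted by the same number of levels until each resulting value occurs at least k times.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical DGH as a child -> parent map; node labels are assumptions.
DGH: Dict[str, str] = {
    "Japan": "Asia", "China": "Asia",
    "Russia": "Europe", "England": "Europe", "Germany": "Europe",
    "Asia": "*", "Europe": "*",
}


def generalize(value: str, dgh: Dict[str, str], levels: int) -> str:
    """Walk up the hierarchy by `levels` steps; the root maps to itself."""
    for _ in range(levels):
        value = dgh.get(value, value)
    return value


def global_recode(column: List[str], dgh: Dict[str, str], k: int,
                  max_levels: int) -> Tuple[int, List[str]]:
    """Find the lowest uniform generalization level giving groups of size >= k."""
    if not column:
        return 0, []
    for level in range(max_levels + 1):
        recoded = [generalize(v, dgh, level) for v in column]
        if min(Counter(recoded).values()) >= k:
            return level, recoded
    raise ValueError("k-anonymity not reachable within max_levels")


column = ["Japan", "China", "Japan", "Russia", "England", "Germany"]
level, recoded = global_recode(column, DGH, k=3, max_levels=2)
```

For this sample column and k = 3, the leaf level fails because "China" occurs only once, and one level of generalization yields three "Asia" and three "Europe" values. Applying the same level to every value is what distinguishes this global approach from local recoding, which groups records individually.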
Figure 1 shows the leaf node at the lowest level, which
is the nationality unit {(Japan, China), (Russia, Eng-
land, Germany) }, the more generalized regional unit
Footnote 2: Some k–anonymization techniques do not use a
generalization hierarchy, such as (LeFevre et al., 2006).
ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy