anonymity (Truta and Vinay, 2006) is a property similar to l-diversity, and it shares similar shortcomings. See (Domingo-Ferrer, 2008) for a summary of criticisms of l-diversity and p-sensitive k-anonymity.
t-Closeness (Li et al., 2007) is another extension
of k-anonymity which also tries to solve the attribute
disclosure problem. A data set is said to satisfy t-
closeness if, for each group of records sharing a com-
bination of quasi-identifier attribute values, the dis-
tance between the empirical distribution of each con-
fidential attribute within the group and the empirical
distribution of the same confidential attribute in the
whole data set is no more than a threshold t. This property clearly addresses the attribute disclosure vulnerability, although the original t-closeness paper neither proposed a computational procedure to achieve the property nor mentioned the large utility loss that it is likely to inflict on the original data.
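To make the property concrete, the following sketch checks t-closeness for one categorical confidential attribute. The original paper (Li et al., 2007) measures distance with the Earth Mover's Distance; for simplicity this sketch uses the total variation distance instead, and all function names are ours, not the paper's:

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(records, qi_key, conf_attr, t):
    """Check t-closeness of one confidential attribute.

    records: list of dicts; qi_key(r) returns the record's
    quasi-identifier combination; conf_attr names the confidential
    attribute. Every group's distribution must be within distance t
    of the whole data set's distribution.
    """
    global_dist = distribution([r[conf_attr] for r in records])
    groups = {}
    for r in records:
        groups.setdefault(qi_key(r), []).append(r[conf_attr])
    return all(tvd(distribution(vals), global_dist) <= t
               for vals in groups.values())
```

For instance, with two quasi-identifier groups whose disease distributions each deviate from the global one by 0.25, the data set satisfies 0.25-closeness but not 0.2-closeness under this distance.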
Differential privacy, as originally proposed for
interactive databases, assumes that an anonymiza-
tion mechanism mediates between the user submit-
ting queries and the database. In this way, instead of
getting responses to a query function f computed on
the database, the user gets responses to a randomized
query function κ. This randomized κ is said to satisfy
ε-differential privacy if, for all data sets D_1, D_2 such that one can be obtained from the other by modifying a single record, and for all subsets S of the range of κ, it holds that

Pr(κ(D_1) ∈ S) ≤ exp(ε) × Pr(κ(D_2) ∈ S). (1)
In plain words, Expression (1) means that the influ-
ence of any single record on the returned value of
κ is negligible. The computational procedure origi-
nally proposed to reach ε-differential privacy is to ob-
tain κ by adding Laplace noise to the query function
f (Dwork, 2006).
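As an illustrative sketch of that procedure, the Laplace mechanism returns f plus zero-mean Laplace noise whose scale is the L1-sensitivity of f divided by ε (the helper functions below are ours, not code from the paper):

```python
import random

def laplace_noise(scale):
    """Zero-mean Laplace sample: exponential magnitude, random sign."""
    magnitude = random.expovariate(1.0 / scale)
    return magnitude if random.random() < 0.5 else -magnitude

def laplace_mechanism(f, dataset, sensitivity, epsilon):
    """ε-differentially private answer to a numerical query f:
    add Laplace noise with scale = (L1-sensitivity of f) / ε."""
    return f(dataset) + laplace_noise(sensitivity / epsilon)
```

Smaller ε means stronger privacy and hence larger noise, since the scale grows as 1/ε.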
We have recently shown in (Soria-Comas et al.,
2013) that microaggregation-based k-anonymity can
be used as a prior step towards achieving ε-differential
privacy of a data set. The advantage of doing so is
that much less Laplace noise addition is thereafter
needed to attain ε-differential privacy, in such a way
that the utility of the resulting differentially private
data is substantially higher.
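The intuition behind that prior step can be sketched with a simple univariate microaggregation (the algorithms used in practice, such as MDAV, are multivariate; this fixed-size version is an illustrative simplification of ours):

```python
def microaggregate(values, k):
    """Univariate microaggregation: sort the values, partition them
    into groups of at least k consecutive elements, and replace each
    value by its group centroid (mean). The result is k-anonymous
    with respect to this attribute, and within-group variability is
    removed, which is what reduces the Laplace noise needed later."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    i = 0
    while i < n:
        # The last group absorbs the remainder so no group is smaller than k.
        j = n if n - i < 2 * k else i + k
        group = order[i:j]
        centroid = sum(values[idx] for idx in group) / len(group)
        for idx in group:
            out[idx] = centroid
        i = j
    return out
```

For example, with k = 3 the values {1, 2, 3} collapse to their centroid 2 and {10, 11, 12} to 11, regardless of the order in which the records appear.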
1.1 Contribution and Plan of this Paper
In the same spirit as (Soria-Comas et al., 2013), which seeks connections between k-anonymity-based models and differential privacy, we explore here how t-closeness and ε-differential privacy relate to each other regarding the anonymization of data sets.
We highlight the formal similarities between t-
closeness and ε-differential privacy in Section 2.
In the same section, we give a lemma showing
that k-anonymity for the quasi-identifiers combined
with differential privacy for the confidential attributes
yields t-closeness in expectation. Section 3 is a con-
clusion.
2 FROM DIFFERENTIAL
PRIVACY TO (EXPECTED)
T-CLOSENESS
Let X be a data set with quasi-identifier attributes Q_1, ···, Q_m and confidential attributes C_1, ···, C_n. Let N be the number of records of X. Further, let I_r(·) be the function that returns all the attribute values contained in record r ∈ X; let IC_r(·) be the function that returns the values of the confidential attributes in record r ∈ X.
Consider the multivariate query (I_1(X), ···, I_N(X)); the answer to that query returns the entire data set X. Further, let (Y_1(X), ···, Y_N(X)) be the noise that needs to be added to the answer to that query to achieve ε-differential privacy. A differentially private version of the data set X can be obtained as:

(I_1(X), ···, I_N(X)) + (Y_1(X), ···, Y_N(X)).
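In code, this construction amounts to adding independent noise to every attribute value of every record (a sketch of ours for numerical attributes; calibrating `scale` is exactly the hard part, since the identity query has very high sensitivity, which is why the resulting noise tends to be large):

```python
import random

def dp_dataset(X, scale):
    """Noisy copy of a numerical data set X (a list of records):
    each attribute value gets independent zero-mean Laplace noise.
    For the result to be ε-differentially private, `scale` must be
    calibrated to the sensitivity of the identity query divided by ε."""
    def lap():
        magnitude = random.expovariate(1.0 / scale)
        return magnitude if random.random() < 0.5 else -magnitude
    return [[v + lap() for v in record] for record in X]
```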
From the definition of ε-differential privacy (Expression (1)), it holds that

Pr((I_1(X_1), ···, I_N(X_1)) + (Y_1(X_1), ···, Y_N(X_1)) ∈ S^N)
≤ exp(ε) × Pr((I_1(X_2), ···, I_N(X_2)) + (Y_1(X_2), ···, Y_N(X_2)) ∈ S^N) (2)
for any pair of data sets X_1, X_2 such that one can be obtained from the other by suppressing/modifying a single record, and for all S ⊂ Range(I_i(·) + Y_i(·)), where we assume this range to be the same for all i = 1, ···, N.
Let us now introduce expected t-closeness. This means t-closeness in expectation, that is, at the level of the distributions of the noise used to generate the anonymized confidential attributes, respectively within each group of records sharing a combination of quasi-identifier attribute values and in the overall data set. Actual t-closeness (Li et al., 2007), in contrast, is defined in terms of the actual values obtained for the confidential attributes.
Definition 1 (Expected t-closeness). Let X′ be an anonymized data set with N records obtained from an original data set X by k-anonymizing quasi-identifiers and adding random noise to the projection of X on its confidential attributes. Call the latter projection C