incorporate tamper-resistant hardware (e.g., smartcard, secure chip, secure USB token) that protects data and code against attackers and against misuse by users. Despite the diversity of existing tamper-resistant devices, a TDS can be abstracted as (1) a Trusted Execution Environment and (2) a mass storage area, potentially untrusted but cryptographically protected, where the personal data resides. The key assumption is that the TDS code is executed by the secure device hosting it and therefore cannot be tampered with, even by the TDS holder herself.
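To make the storage abstraction concrete, the following Python sketch shows one way a secure device could keep personal data in untrusted mass storage. Every record is encrypted and authenticated with keys that never leave the device, so the storage learns nothing and tampering is detected on read. The functions `seal`/`unseal` and the SHA-256 counter-mode keystream are illustrative assumptions, not part of the TDS design described here. The keystream is a toy built only from the standard library; a real device would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import os

# Hypothetical sketch: keys live only inside the secure device (TEE); the
# untrusted mass storage only ever sees sealed blobs: nonce || ciphertext || tag.
# The SHA-256 counter-mode keystream below is a TOY cipher for illustration.

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from (key, nonce, counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC a record before it is written to untrusted storage."""
    nonce = os.urandom(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check integrity, then decrypt; raise ValueError if the blob was altered."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("storage blob failed integrity check")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))

enc_key, mac_key = os.urandom(32), os.urandom(32)  # kept inside the secure device
untrusted_storage = {}  # stands in for the TDS mass storage
untrusted_storage["record-1"] = seal(enc_key, mac_key, b"age=42;city=Paris")
print(unseal(enc_key, mac_key, untrusted_storage["record-1"]))  # b'age=42;city=Paris'
```

Encrypt-then-MAC is used so that a modified blob is rejected before any decryption takes place.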
By construction, secure hardware exhibits limited storage and computing resources, and TDSs inherit these restrictions. Moreover, TDSs are not always connected, since their owners can disconnect them at will. A second party, hereafter called the Supporting Server Infrastructure (SSI), is thus required to manage the communications between TDSs, run the distributed query protocol, and store the intermediate results produced by this protocol. Because the SSI is implemented on regular server(s), e.g., in the Cloud, it exhibits the same low level of trustworthiness as any such server.
The resulting computing architecture is said to be asymmetric in the sense that it is composed of a very large number of low-power, weakly connected but highly secure TDSs and of a powerful, highly available but untrusted SSI.
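The SSI's role as an always-available relay for intermittently connected TDSs can be sketched as a store-and-forward mailbox. The class `SupportingServer` and its `post`/`fetch` methods are hypothetical. The only point illustrated is that the SSI stores and forwards opaque ciphertexts, so its lack of trustworthiness does not expose their content.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass
class SupportingServer:
    """Hypothetical SSI mailbox for intermittently connected TDSs."""
    # One FIFO queue of opaque (encrypted) messages per recipient TDS.
    mailboxes: defaultdict = field(default_factory=lambda: defaultdict(deque))

    def post(self, recipient: str, ciphertext: bytes) -> None:
        """Store a message, whether or not the recipient is currently online."""
        self.mailboxes[recipient].append(ciphertext)

    def fetch(self, recipient: str) -> list:
        """Called by a TDS when it (re)connects: drain its pending messages."""
        box = self.mailboxes[recipient]
        pending = list(box)
        box.clear()
        return pending

ssi = SupportingServer()
ssi.post("tds-42", b"<ciphertext>")  # tds-42 may be offline; the SSI keeps it
print(ssi.fetch("tds-42"))  # [b'<ciphertext>']
print(ssi.fetch("tds-42"))  # []
```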
2.3 Reference Query Processing Protocol
By avoiding the delegation of personal data storage to untrusted cloud providers, Trusted Cells is key to achieving user empowerment: each individual keeps her data in her hands and can control its disclosure. However, the decentralized nature of the Trusted Cells architecture must not hinder global computations and queries, as this would impede the development of services of great interest to the community. SQL/AA (SQL Asymmetric Architecture) is a protocol for executing standard SQL queries on the Trusted Cells architecture (To et al., 2014; To et al., 2016). It was designed precisely to tackle this issue, that is, executing global queries on a set of TDSs without recentralizing microdata and without leaking any information.
The protocol, illustrated in Figure 2, works as follows. Once an SQL query is issued by a querier (e.g., a statistical institute), it is computed in three phases. First, in the collection phase, the querier broadcasts the query to all TDSs; each TDS decides whether or not to participate in the computation (a TDS that declines sends dummy tuples to hide its refusal), evaluates the WHERE clause, and returns its own encrypted data to the SSI. Second, in the aggregation phase, the SSI forms partitions of encrypted tuples and sends them back to TDSs; each TDS participating in this phase decrypts its input partition, removes the dummy tuples and computes the aggregate function (e.g., AVG, COUNT). Finally, in the filtering phase, TDSs produce the final result by applying the HAVING clause (i.e., filtering out the groups that do not satisfy it) and send the result to the querier.
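A simplified, single-process simulation of the three phases may help fix ideas. It is a sketch under strong assumptions. Encryption is abstracted away, and the SSI is simply assumed not to read the tuples it shuffles. The example query is `SELECT city, AVG(age) FROM Person WHERE age >= 18 GROUP BY city HAVING COUNT(*) >= 2`. All function and variable names are hypothetical rather than taken from SQL/AA.

```python
import random
from collections import defaultdict

DUMMY = None  # marker for a fake tuple, indistinguishable once encrypted

def collection_phase(tds_data, participates):
    """Each TDS evaluates the WHERE clause (age >= 18) on its own data."""
    collected = []
    for tds_id, (city, age) in tds_data.items():
        if participates[tds_id] and age >= 18:
            collected.append((city, age))
        else:
            # A refusing TDS sends a dummy to hide its denial; this sketch
            # also sends one on a WHERE mismatch (an assumption).
            collected.append(DUMMY)
    return collected

def aggregation_phase(collected, partition_size, rng):
    """SSI shuffles and partitions the (encrypted) tuples; each partition is
    handled by a TDS that drops dummies and emits (group, sum, count)."""
    tuples = list(collected)
    rng.shuffle(tuples)
    partials = []
    for start in range(0, len(tuples), partition_size):
        per_group = defaultdict(lambda: [0, 0])
        for t in tuples[start:start + partition_size]:
            if t is DUMMY:
                continue
            city, age = t
            per_group[city][0] += age
            per_group[city][1] += 1
        partials.extend((city, s, c) for city, (s, c) in per_group.items())
    # A single extra merge round so that each group ends fully aggregated.
    merged = defaultdict(lambda: [0, 0])
    for city, s, c in partials:
        merged[city][0] += s
        merged[city][1] += c
    return {city: (s, c) for city, (s, c) in merged.items()}

def filtering_phase(aggregates, min_count):
    """Apply HAVING COUNT(*) >= min_count, then compute AVG(age)."""
    return {city: s / c for city, (s, c) in aggregates.items() if c >= min_count}

tds_data = {1: ("Paris", 30), 2: ("Paris", 50), 3: ("Lyon", 40),
            4: ("Paris", 16), 5: ("Lyon", 20)}
participates = {1: True, 2: True, 3: True, 4: True, 5: False}
collected = collection_phase(tds_data, participates)
aggregates = aggregation_phase(collected, partition_size=2, rng=random.Random(0))
result = filtering_phase(aggregates, min_count=2)
print(result)  # {'Paris': 40.0}
```

Carrying AVG as a (sum, count) pair is what allows partial results computed on different partitions to be combined exactly.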
Note that the TDSs participating in each phase can be different. Indeed, TDSs contributing to the collection phase act as data producers, while TDSs participating in the aggregation and filtering phases act as trusted computing nodes. The tamper resistance of TDSs is key to this protocol, since a given TDS belonging to individual $i_1$ is likely to decrypt and aggregate tuples issued by TDSs of other individuals $i_2, \ldots, i_n$.
Finally, note that the aggregation phase is recursive and runs until all tuples belonging to the same group have actually been aggregated. We refer the interested reader to (To et al., 2014; To et al., 2016) for a more detailed presentation of the SQL/AA protocol.
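The recursive aggregation can be sketched as repeated rounds. In each round, partial aggregates of the same group are merged in batches whose size is bounded by what one TDS can process. This sketch assumes that the SSI can route partial aggregates by an opaque group tag, which it can compare without reading the group values. It also assumes that AVG is carried as a (sum, count) pair. `merge_round`, `recursive_aggregation` and `fan_in` are hypothetical names.

```python
from collections import defaultdict

def merge_round(partials, fan_in):
    """One round: partials of the same group (routed by an opaque tag, an
    assumption) are split into batches of at most fan_in; each batch is
    merged by one TDS into a single (group, sum, count) partial."""
    by_group = defaultdict(list)
    for group, s, c in partials:
        by_group[group].append((s, c))
    merged = []
    for group, items in by_group.items():
        for start in range(0, len(items), fan_in):
            batch = items[start:start + fan_in]
            merged.append((group, sum(s for s, _ in batch), sum(c for _, c in batch)))
    return merged

def recursive_aggregation(partials, fan_in=2):
    """Repeat rounds until each group is represented by a single partial."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 for the recursion to terminate")
    rounds = 0
    while len({g for g, _, _ in partials}) < len(partials):
        partials = merge_round(partials, fan_in)
        rounds += 1
    return {g: (s, c) for g, s, c in partials}, rounds

partials = [("Paris", 30, 1), ("Paris", 50, 1), ("Lyon", 40, 1),
            ("Paris", 20, 1), ("Paris", 60, 1), ("Paris", 40, 1)]
final, rounds = recursive_aggregation(partials, fan_in=2)
print(final, rounds)  # {'Paris': (200, 5), 'Lyon': (40, 1)} 3
print({g: s / c for g, (s, c) in final.items()})  # {'Paris': 40.0, 'Lyon': 40.0}
```

With a fan-in of $f \geq 2$, a group spread over $m$ partials needs about $\lceil \log_f m \rceil$ rounds, which is why the recursion terminates.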
2.4 Problem Statement
In order to protect the privacy of users, queries must respect a certain degree of anonymity. Our primary objective is to push personalized privacy guarantees into the processing of regular statistical queries, so that individuals can disclose different amounts of information (i.e., data at different levels of accuracy) depending on their own perception of the risk. To the best of our knowledge, no existing work has addressed this issue. For the sake of simplicity, we consider SQL as the reference language to express statistical/aggregate queries because of its widespread usage. Similarly, we consider personalized privacy guarantees derived from the k-anonymity and ℓ-diversity models because (1) they are the most used in practice, (2) they are recommended by the European Union (European Union, 2014) and (3) they can be easily understood by individuals⁶. The next step in our research agenda is to extend our approach to other query languages and privacy guarantees, but this ambitious goal exceeds the scope and expectations of this paper.
Hence, the problem addressed in this paper is to propose a (SQL) query paradigm that incorporates personalized (k-anonymity and ℓ-diversity) privacy guarantees and enforces these individual guarantees throughout query processing without any possible leakage.
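As an illustration of what personalized guarantees mean for a released group, the sketch below checks whether an equivalence class satisfies every member's own k-anonymity and ℓ-diversity requirement. An equivalence class here is a set of individuals who are indistinguishable in the result. The check uses the simple distinct-values notion of ℓ-diversity. It is not the enforcement mechanism proposed in this paper; the `Individual` type and the `class_respects_everyone` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Individual:
    """Hypothetical per-individual privacy preferences."""
    k: int          # minimum size of any equivalence class she appears in
    ell: int        # minimum number of distinct sensitive values in that class
    sensitive: str  # her sensitive value, e.g. a diagnosis

def class_respects_everyone(members: list) -> bool:
    """True iff the class meets each member's personal k-anonymity and
    distinct l-diversity requirement."""
    size = len(members)
    distinct = len({m.sensitive for m in members})
    return all(size >= m.k and distinct >= m.ell for m in members)

equivalence_class = [
    Individual(k=2, ell=2, sensitive="flu"),
    Individual(k=3, ell=1, sensitive="asthma"),
    Individual(k=3, ell=2, sensitive="flu"),
]
print(class_respects_everyone(equivalence_class))  # True: size 3, 2 distinct values

demanding = equivalence_class + [Individual(k=5, ell=2, sensitive="flu")]
print(class_respects_everyone(demanding))  # False: size 4 < k=5
```

Because every member's constraints must hold, a class is acceptable exactly when its size reaches the largest k and its number of distinct sensitive values reaches the largest ℓ among its members. A single demanding individual therefore constrains the whole group.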
⁶ The EU Article 29 Working Party mentions these characteristics as strong incentives explaining why these models are effectively used in practice or tested by several European countries (e.g., by the Dutch and French statistical institutes).
DATA 2017 - 6th International Conference on Data Science, Technology and Applications