each partition $P_i$ is bounded by $\|cmp\| \cdot |P_i|$, $\forall P_i$, regardless of the number of computations involving $o \in P_i$. Consequently, the Data set leakage over large numbers $n$ of computations is also bounded: $L^{f}_{n \to +\infty}(O) \le \sum_i \|cmp\| \cdot |P_i| = \|cmp\| \cdot |O|$. Regarding Object leakage, for any $P_i$, the attacker has the liberty to choose the distribution of the $|P_i|$ leak fragments among the objects in $P_i$. At the extreme, all $|P_i|$ fragments can concern a single object in $P_i$. For any object $o \in P_i$, the Object leakage is thus bounded by $L^{f}_{n \to +\infty}(o) \le \min(\|cmp\| \cdot k, \|o\|)$.
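As a minimal sketch of the two bounds above, the following Python functions compute them for a given partitioning. The function names and the example sizes are illustrative assumptions, and k is read here as the maximum partition cardinality.

```python
from typing import List


def dataset_leakage_bound(cmp_bits: int, partition_sizes: List[int]) -> int:
    """Bound on Data set leakage: sum over partitions of ||cmp|| * |P_i|."""
    return sum(cmp_bits * size for size in partition_sizes)


def object_leakage_bound(cmp_bits: int, k: int, object_bits: int) -> int:
    """Bound on Object leakage: min(||cmp|| * k, ||o||)."""
    return min(cmp_bits * k, object_bits)


# Hypothetical example: 10 objects split into partitions of sizes 4, 4 and 2.
sizes = [4, 4, 2]
k = max(sizes)  # assumption: k is the largest partition cardinality
print(dataset_leakage_bound(1, sizes))  # 10, i.e. ||cmp|| * |O|
print(object_leakage_bound(1, k, 600))  # 4
```

In this formula the Data set bound equals ||cmp|| · |O| whatever the partition sizes, whereas the Object bound grows with k until it saturates at ||o||.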
Minimal Leakage. From the above formulas, a decomposed Data Task execution of $f = agg \circ cmp$ is optimal in terms of limiting the potential data leakage, with both minimum Data set and Object leakages, when the maximum degree of decomposition is chosen, i.e., a partition at the object level, fixing $k = 1$ as the leakage factor.
'Green bonus' Leakage Analysis. Let us assume that $\|cmp\| = 1$ (i.e., 1 bit to indicate whether a GPS trace is a bike commute or not), $\|agg\| = 6$ (i.e., 6 bits to count the number of monthly bike commutes, with a maximum admitted value of 60) and $\|o\| = 600$ (i.e., each GPS trace is encoded with 600 bits of information). Without any countermeasure on $f$, an attacker needs 100 queries to obtain an object $o$ (600 bits at up to 6 bits per query). With a stateless $f$, the number of queries to obtain $o$ is (much) higher due to random leakage, and $o$ has to be contained in the input of each query. With a stateless deterministic $f$, the number of queries to obtain $o$ is at least the same as with a stateless $f$, but each query must have a different input while containing $o$. Finally, with a fully decomposed execution of $f$ (i.e., $k = 1$), only $\|cmp\| = 1$ bit of $o$ can be leaked, regardless of the number of queries.
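The figures of this example can be checked with a few lines of Python. This is only a sketch: the constant and function names are illustrative, and it assumes that an unprotected query leaks at most ||agg|| bits, which is consistent with the 100 queries stated above.

```python
import math

CMP_BITS = 1       # ||cmp||: is the GPS trace a bike commute?
AGG_BITS = 6       # ||agg||: monthly commute counter
OBJECT_BITS = 600  # ||o||: size of one GPS trace
MAX_COUNT = 60     # maximum admitted counter value


def queries_to_extract(object_bits: int, bits_per_query: int) -> int:
    """Fewest queries needed to leak a whole object at a fixed rate per query."""
    return math.ceil(object_bits / bits_per_query)


def leaked_bits_fully_decomposed(cmp_bits: int, object_bits: int) -> int:
    """Object leakage bound min(||cmp|| * k, ||o||) evaluated at k = 1."""
    return min(cmp_bits * 1, object_bits)


print(MAX_COUNT.bit_length())                               # 6
print(queries_to_extract(OBJECT_BITS, AGG_BITS))            # 100
print(leaked_bits_fully_decomposed(CMP_BITS, OBJECT_BITS))  # 1
```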
6 RESEARCH CHALLENGES
The introduced building blocks allow an evaluation of $f$ with low and bounded leakage. However, their straightforward implementation may lead to prohibitive execution cost with large data sets, mainly because (i) too many Data Tasks must be allocated at execution (up to one per object $o \in O_\sigma$ to reach the minimal leakage) and (ii) many unnecessary computations are needed (objects $o' \notin O_\sigma$ must be processed, given the 'static' partitioning, if they belong to any part containing an object $o \in O_\sigma$). Hence, a first major research challenge is to devise evaluation strategies with reasonable cost that still achieve low data leakage.
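The trade-off between points (i) and (ii) can be illustrated with a small sketch. It assumes, for illustration only, a static partitioning into consecutive chunks of at most k objects and one Data Task per partition touched by the selection; the function names and the object identifiers are hypothetical.

```python
from typing import List, Set, Tuple


def static_partition(object_ids: List[int], k: int) -> List[List[int]]:
    """Split the objects into consecutive parts of at most k objects each."""
    return [object_ids[i:i + k] for i in range(0, len(object_ids), k)]


def execution_cost(partitions: List[List[int]], selected: Set[int]) -> Tuple[int, int]:
    """Return (Data Tasks allocated, objects processed although not selected)."""
    # A part must be processed as soon as it contains one selected object.
    touched = [p for p in partitions if any(o in selected for o in p)]
    unnecessary = sum(1 for p in touched for o in p if o not in selected)
    return len(touched), unnecessary


objects = list(range(12))  # hypothetical object identifiers
selected = {1, 2, 9}       # hypothetical selection result
for k in (1, 4, 12):
    print(k, execution_cost(static_partition(objects, k), selected))
# 1 (3, 0)
# 4 (2, 5)
# 12 (1, 9)
```

With k = 1 no unnecessary object is processed but one Data Task is needed per selected object; larger parts reduce the number of Data Tasks at the price of more unnecessary computations and a larger leakage factor.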
The security guarantees of our strategies are based on the hypothesis that the Core is able to evaluate the $\sigma$ selection predicates of the App. This is a reasonable assumption if basic predicates are considered over some object metadata (e.g., temporal, object type or size, tags). However, because of the Core minimality, it is not reasonable to assume support for more complex selections within the Core (e.g., spatial search, content-based image search). Advanced selection would require specific data indexing and should be implemented as Data Tasks, which calls for revisiting the threat model and related solutions.
Another issue is that our study considers a single cmp function for a given App. For Apps requiring several computations, our leakage analysis still applies to each cmp, but the total leakage can accumulate across the set of functions. Also, the considered cmp does not accept parameters from the App (e.g., cmp is a similarity function for time series or images that also takes an input parameter sent by the App). Parameters may introduce an additional attack channel allowing the attacker to increase the data leakage.
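One way to read the accumulation across functions, stated here as an assumption rather than a result of the study, is to sum the per-function bounds and cap the total at the object size. For an App using m functions cmp_1, ..., cmp_m over the same partitioning (the notation on the left-hand side is introduced only for this illustration):

```latex
L^{f_1,\dots,f_m}_{n \to +\infty}(o) \;\le\; \min\Big(k \cdot \sum_{j=1}^{m} \|cmp_j\|,\; \|o\|\Big)
```

With m = 1 this reduces to the single-function bound min(||cmp|| · k, ||o||).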
This study does not discuss data updates. Personal historical data (mails, photos, energy consumption, trips) is append-only (with deletes) and is rarely modified. Since an object update can be seen as the deletion and reinsertion of the modified object, at each reinsertion, the object is exposed to some leakage. Hence, with frequently updated and queried objects, new security building blocks may be required.
To reduce the potential data leakage, complementary security mechanisms can be employed for some Apps, e.g., imposing a query budget or limiting the $\sigma$ predicates. Defining such restrictions and incorporating them into App manifests is a natural direction for future work. Also, aggregate computations are generally basic and could be computed by the Core. The computation of agg by the Core introduces an additional trust assumption, which could help to further reduce the potential data leakage.
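A query budget could, for instance, be enforced by a per-App counter checked before each execution. The sketch below is purely illustrative: the `QueryBudget` class, its fields, and the idea that the limit is read from the App manifest are assumptions, not part of the described architecture.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QueryBudget:
    """Hypothetical per-App query counter with a fixed upper limit."""
    max_queries: int  # assumed to be declared in the App manifest
    used: Dict[str, int] = field(default_factory=dict)

    def try_consume(self, app_id: str) -> bool:
        """Record one query for app_id; refuse it once the budget is spent."""
        count = self.used.get(app_id, 0)
        if count >= self.max_queries:
            return False
        self.used[app_id] = count + 1
        return True


budget = QueryBudget(max_queries=2)
print([budget.try_consume("green-bonus") for _ in range(3)])  # [True, True, False]
```

Such a counter caps the number of computations n per App, which complements the bounds that hold regardless of n.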
DATA 2022 - 11th International Conference on Data Science, Technology and Applications