[R8] Erasure: According to the right to be forgotten, a privacy mechanism must also ensure that certain data can be erased. For ML models, however, this also means that they must be able to “forget” certain aspects if these were learned from the erased data.
[R9] Security: In addition to privacy issues, a privacy mechanism must also cover security issues, i. e., data must be protected against unauthorized access. To this end, raw data have to be isolated from data consumers and any communication between these two parties has to be secured. In addition, raw data must be protected against manipulation.
[R10] Data Protection by Design: In order to minimize the organizational burden on users and still guarantee full data protection, a privacy mechanism should adopt a Privacy by Design approach, i. e., it has to be integrated seamlessly into the ML system.
It is evident that both Global Privacy (a global view on all available data sources is required for, e. g., [R4], [R5], and [R6]) and Local Privacy (individual privacy rules defined by the user are required for, e. g., [R1] and [R2]) are necessary to meet these requirements.
4 RELATED WORK
In ML systems, three task domains can be identified that are relevant to privacy. In the following, we discuss selected representatives of these domains.
Data Preparation.
The earliest stage in which an ML privacy mechanism can be applied is data preparation. That is, similar to Local Privacy, data are masked before they are actually processed, which in the case of ML means before models are learned. AVARE (Alpers et al.,
2018) is a privacy system for smart devices. Various
privacy filters are applied to available data sources.
These filters enable users to specify which data are
available to an application. The filters are adapted to
the respective kind of data. For instance, the accuracy
of location data can be reduced, or certain details of
contact data can be hidden. However, AVARE consid-
ers each smart device separately. As a result, its privacy
settings are unnecessarily restrictive for ML, similar
to Local Privacy. PSSST! (Stach et al., 2019) there-
fore introduces a central control instance that knows
all available data sources and applies privacy mea-
sures according to the privacy requirements of the user.
PSSST! conceals privacy-relevant patterns, i. e., se-
quences in the data from which confidential knowledge
can be derived. This pattern-based approach signifi-
cantly reduces the amount of data masking. To this
end, PSSST! needs to know the intended use of the
data. In the ML context, however, it cannot be assumed
that the usage of the models is known in advance.
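To give an impression of the kind of masking applied at this stage, consider the following sketch, which reduces the accuracy of location data before it is released to a data consumer. It is a minimal example in the spirit of AVARE's location filter, not its actual implementation; the Location class and the chosen rounding granularity are assumptions made purely for illustration.

from dataclasses import dataclass

@dataclass
class Location:
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees

def coarsen_location(loc: Location, decimals: int = 2) -> Location:
    # Reduce the accuracy of a location by rounding its coordinates.
    # Two decimal places correspond to roughly 1 km of uncertainty, so a
    # data consumer only learns the approximate area, not the exact spot.
    return Location(round(loc.lat, decimals), round(loc.lon, decimals))

raw = Location(48.745646, 9.106677)         # exact position (sensitive)
masked = coarsen_location(raw, decimals=2)  # released to the data consumer
print(masked)                               # Location(lat=48.75, lon=9.11)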
Model Management.
A privacy mechanism can also
come into play when dealing with the ML models
(i. e., during learning or provisioning). As in most use
cases where statistical information is shared with third
parties, differential privacy (Dwork, 2006) is often ap-
plied in ML. Abadi et al. (Abadi et al., 2016) apply
differential privacy to the ML database to ensure that
the models do not reveal any sensitive information.
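As an intuition for the underlying principle, the following sketch shows the classic Laplace mechanism applied to a simple counting query over training data. It only illustrates the differential privacy idea; it is not the gradient-level mechanism Abadi et al. actually use, and the data and query are hypothetical.

import numpy as np

def dp_count(values, predicate, epsilon=1.0):
    # Release a counting query with Laplace noise calibrated to the query's
    # sensitivity, so that any single individual's presence or absence changes
    # the output distribution by at most a factor of e^epsilon.
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one individual changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 60, 19]            # hypothetical data attribute
print(dp_count(ages, lambda a: a >= 40, 0.5))  # noisy answer instead of the exact count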
As this approach excludes users, Shokri and Shmatikov (Shokri
and Shmatikov, 2015) enable users to select a subset
of data to be made available for learning. Data re-
main on the user’s system until they are needed for
learning. Bonawitz et al. (Bonawitz et al., 2017) ex-
tend this approach by enabling users to provide only
aggregated submodels to update a deep neural net-
work. This ensures that the ML system never has full
data access. However, the result can be biased if each user contributes only the data that suit him or her best, so certain aspects can be lost completely. Moreover, the models are then available to any application.
Hüffmeyer et al. (Hüffmeyer et al., 2018) introduce an
attribute-based authorization method for querying data
from a service platform. This way, applications can be granted access to models only if they currently have the appropriate attributes (e. g., if they are hosted in the right execution environment).
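The following sketch illustrates the general idea behind such approaches: clients compute model updates locally on their private data, and only an aggregate of these updates reaches the central ML system. It is a minimal federated-averaging example for a linear model, not the protocol of the cited works; in particular, it omits the secure aggregation of Bonawitz et al. as well as any authorization checks, and all names and hyperparameters are assumptions made for illustration.

import numpy as np

def local_update(weights, X, y, lr=0.1):
    # One gradient step of linear regression on a single client's private data.
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, clients):
    # Each client computes an update locally; only the average is returned,
    # so the central ML system never sees the raw data.
    updates = [local_update(weights, X, y) for X, y in clients]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
# Hypothetical setup: five clients, each holding its own (X, y) data.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
weights = np.zeros(3)
for _ in range(50):
    weights = federated_round(weights, clients)
print(weights)  # global model learned without central access to raw data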
Explain Models.
The explainability of models is also relevant for privacy. Alzantot et al. (Alzantot et al.,
2019) introduce a technique to explain ML models
for image recognition. They define masks to cover
image areas that are irrelevant for decision-making.
This enables users to comprehend why a decision
is made. Yet, this approach is restricted to image
recognition. Ribeiro et al. (Ribeiro et al., 2016)
introduce a framework which contains various
description techniques. Yet, for each ML algorithm a
dedicated plugin is required. Thus, it cannot be used
universally. Rudin (Rudin, 2019) therefore suggests using explainable types of models instead of trying to describe ML models. However,
users cannot choose the types of models freely as
they are restricted to the types supported by an ML
system. Powell et al. (Powell et al., 2019) propose to
combine ex ante (i. e., explain the model) with ex post
approaches (i. e., explain the decision). However, they only consider the logic of a model, but not the input data that lead to a particular decision when the model is applied.
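To give an intuition for mask-based explanation techniques, the following sketch measures how much the prediction of an arbitrary image classifier changes when individual image regions are occluded; regions whose occlusion barely changes the output are irrelevant for the decision and can be masked out. This is a generic occlusion-sensitivity sketch, not the method of Alzantot et al.; the predict function and the patch size are assumptions.

import numpy as np

def occlusion_map(image, predict, patch=8):
    # Relevance of each image patch: how much the classifier's confidence in
    # its decision drops when that patch is covered. Patches with little
    # influence are irrelevant for the decision and can be masked out in an
    # explanation shown to the user.
    probs = predict(image)
    target = int(np.argmax(probs))  # class the model decided on
    h, w = image.shape[:2]
    relevance = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            relevance[i // patch, j // patch] = probs[target] - predict(occluded)[target]
    return relevance

def toy_predict(x):
    # Stand-in classifier whose decision depends only on the top-left region.
    score = float(x[:16, :16].mean())
    return np.array([score, 1.0 - score])

image = np.random.default_rng(0).random((32, 32))
print(occlusion_map(image, toy_predict))  # only top-left patches get high relevance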
In addition to the identified shortcomings of the individual approaches, there is also no holistic approach in related work that provides all required privacy features. Therefore, we next introduce our own ML privacy approach, called AMNESIA.