bilistically k-anonymized data in the context of mixed datasets. We then carried out an in-depth analysis of the utility and privacy aspects of probabilistic k-anonymity with respect to PPDP, and trained a variety of ML classifiers on probabilistically k-anonymized data to evaluate model utility. When applied with high values of the privacy parameter k or with a high number of QIDs, probabilistic k-anonymity has an adverse impact on ML utility. However, compared to the other syntactic privacy models (i.e., k-anonymity, l-diversity, and t-closeness), probabilistic k-anonymity achieved better ML utility. In conclusion, probabilistic k-anonymity obtains relatively high utility for ML while providing data controllers with numerous advantages, such as high flexibility for sensitive data analysis under the GDPR, a means for PPDP with low attribute disclosure risk, and easy adaptation into the ML context without additional data pre-processing or post-processing requirements. Future work could explore whether these classification accuracies can be improved further via the noise correction and sample selection methods presented in the ML literature for learning on noisy data.
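The claim that probabilistically k-anonymized data needs no extra pre-processing before ML can be illustrated with a small sketch. It assumes one common way of achieving probabilistic k-anonymity: group records into clusters of at least k rows and randomly permute the QID tuples within each cluster, so that QID values keep their original domain while their link to the rest of the record is broken. This is an assumption for illustration, not necessarily the exact procedure evaluated in this paper. The function name `probabilistic_k_anonymize` and the naive sort-based clustering are hypothetical; a proper microaggregation algorithm would normally replace the sorting step.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def probabilistic_k_anonymize(
    records: Sequence[Record],
    qids: Sequence[str],
    k: int,
    seed: int = 0,
) -> Tuple[List[Record], List[List[int]]]:
    """Hypothetical sketch: cluster rows into groups of >= k, then randomly
    permute whole QID tuples inside each group.

    Returns the anonymized rows and the clusters (lists of row indices).
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    rng = random.Random(seed)

    def qid_tuple(row: Record) -> Tuple[Any, ...]:
        return tuple(row[q] for q in qids)

    # Naive clustering stand-in (assumption, not microaggregation proper):
    # sort rows by their QID tuple so similar rows are adjacent. Each QID column
    # must hold mutually comparable values (all numbers or all strings), which
    # is enough for simple mixed numeric/categorical datasets.
    order = sorted(range(len(records)), key=lambda i: qid_tuple(records[i]))
    n_clusters = len(order) // k
    clusters = [order[c * k:(c + 1) * k] for c in range(n_clusters)]
    # Leftover rows (fewer than k) join the last cluster, so sizes lie in [k, 2k).
    clusters[-1].extend(order[n_clusters * k:])

    # Copy every row; non-QID columns (e.g. the class label) are never modified.
    anonymized = [dict(row) for row in records]
    for cluster in clusters:
        values = [qid_tuple(records[i]) for i in cluster]
        rng.shuffle(values)  # uniformly random permutation within the cluster
        for i, new_values in zip(cluster, values):
            for q, v in zip(qids, new_values):
                anonymized[i][q] = v
    return anonymized, clusters
```

Because every output QID value is an original value reassigned within its cluster, the anonymized table keeps the schema and value domains of the input and can be passed to a classifier unchanged, unlike generalized intervals or suppressed cells. Assuming distinct QID tuples within a cluster, an attacker who links an individual's QIDs to a row picks that individual's own record with probability 1/|cluster| ≤ 1/k. In this sketch, a larger k also means larger clusters, so QID values are shuffled among less similar rows.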
This work is supported by the Vetenskapsrådet project “Disclosure risk and transparency in big data privacy” (VR 2016-03346, 2017-2020).
Systematic Evaluation of Probabilistic k-Anonymity for Privacy Preserving Micro-data Publishing and Analysis