Authors:
Joshua Stock¹; Lucas Lange²; Erhard Rahm² and Hannes Federrath¹
Affiliations:
¹ Security in Distributed Systems, Universität Hamburg, Germany
² Database Group, Universität Leipzig, Germany
Keyword(s):
Machine Learning, Privacy Attacks, Property Inference, Defense Mechanisms, Adversarial Training.
Abstract:
In contrast to privacy attacks focussing on individuals in a training dataset (e.g., membership inference), Property Inference Attacks (PIAs) aim to extract population-level properties from trained Machine Learning (ML) models. These sensitive properties are often based on ratios, such as the ratio of male to female records in a dataset. If a company has trained an ML model on customer data, a PIA could, for example, reveal the demographics of its customer base to a competitor, compromising a potential trade secret. For ratio-based properties, inferring over a continuous range using regression is more natural than classification. We therefore extend previous white-box and black-box attacks by modelling property inference as a regression problem. For the black-box attack, we further reduce prior assumptions by using an arbitrary attack dataset that is independent of the target model's training data. We conduct experiments on three datasets for both white-box and black-box scenarios, showing promising adversary performance in each scenario, with a test R² between 0.6 and 0.86. We then present a new defense mechanism based on adversarial training that successfully inhibits our black-box attacks: it reduces the adversary's R² from 0.63 to 0.07 while inducing practically no utility loss, with the accuracy of target models dropping by no more than 0.2 percentage points.
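The abstract frames black-box property inference as a regression task over ratio-based properties, learned by querying models with an attack dataset that is independent of the target's training data. The sketch below illustrates one plausible shadow-model realisation of that idea, assuming synthetic data whose distribution shifts with the sensitive ratio; the data generator, the logistic-regression shadow/target models, the random-forest meta-regressor, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a black-box, regression-based property inference attack.
# Assumption: the attacker trains shadow models on datasets with known
# female/male ratios, queries each on a fixed attack dataset, and fits a
# meta-regressor mapping output probabilities to the ratio.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def make_dataset(female_ratio, n=2000, d=8):
    """Synthetic binary-classification data whose feature distribution
    depends on the sensitive ratio (the property to be inferred)."""
    sex = rng.random(n) < female_ratio
    X = rng.normal(size=(n, d)) + 0.5 * sex[:, None]
    y = (X[:, :2].sum(axis=1) + 0.3 * sex > 0).astype(int)
    return X, y

# Fixed attack dataset, independent of any target model's training data.
X_attack, _ = make_dataset(female_ratio=0.5, n=200)

def black_box_features(model):
    """Query a model on the attack dataset and use its output
    probabilities as the meta-regressor's feature vector."""
    return model.predict_proba(X_attack)[:, 1]

# Train shadow models on datasets with known ratios (meta training set).
meta_X, meta_y = [], []
for r in rng.uniform(0.1, 0.9, size=100):
    X, y = make_dataset(r)
    shadow = LogisticRegression(max_iter=1000).fit(X, y)
    meta_X.append(black_box_features(shadow))
    meta_y.append(r)

meta_model = RandomForestRegressor(n_estimators=200, random_state=0)
meta_model.fit(np.array(meta_X), np.array(meta_y))

# Evaluate on fresh "target" models with held-out ratios.
test_ratios = rng.uniform(0.1, 0.9, size=20)
preds = []
for r in test_ratios:
    X, y = make_dataset(r)
    target = LogisticRegression(max_iter=1000).fit(X, y)
    preds.append(meta_model.predict([black_box_features(target)])[0])

print("test R^2:", r2_score(test_ratios, preds))
```

In this toy setup, the meta-regressor recovers the training-data ratio from black-box queries alone, which is the kind of leakage the paper's adversarial-training defense is designed to suppress.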