Property Inference as a Regression Problem: Attacks and Defense
Joshua Stock¹, Lucas Lange², Erhard Rahm² and Hannes Federrath¹
¹Security in Distributed Systems, Universität Hamburg, Germany
²Database Group, Universität Leipzig, Germany
Keywords: Machine Learning, Privacy Attacks, Property Inference, Defense Mechanisms, Adversarial Training.
Abstract: In contrast to privacy attacks focusing on individuals in a training dataset (e.g., membership inference), Property Inference Attacks (PIAs) are aimed at extracting population-level properties from trained Machine Learning (ML) models. These sensitive properties are often based on ratios, such as the ratio of male to female records in a dataset. If a company has trained an ML model on customer data, a PIA could for example reveal the demographics of their customer base to a competitor, compromising a potential trade secret. For ratio-based properties, inferring over a continuous range using regression is more natural than classification. We therefore extend previous white-box and black-box attacks by modelling property inference as a regression problem. For the black-box attack we further reduce prior assumptions by using an arbitrary attack dataset, independent from a target model's training data. We conduct experiments on three datasets for both white-box and black-box scenarios, indicating promising adversary performances in each scenario with a test R² between 0.6 and 0.86. We then present a new defense mechanism based on adversarial training that successfully inhibits our black-box attacks. This mechanism proves to be effective in reducing the adversary's R² from 0.63 to 0.07 and induces practically no utility loss, with the accuracy of target models dropping by no more than 0.2 percentage points.
1 INTRODUCTION
Machine Learning (ML) technologies are more present than ever before and their advancement is significantly enhancing capabilities across multiple domains. However, this progress also introduces new challenges, particularly in the realm of data privacy. In healthcare and other user-centric areas, training large ML models requires ever increasing amounts of personal data to provide the desired outcomes. Dangers to privacy mostly stem from ML models being prone to leaking information about their training data under adversarial attacks (Al-Rubaie and Chang, 2019). A famous example among these are membership inference attacks (Shokri et al., 2017), which decide the membership of individuals in the training data, given a trained model. More recently, Property Inference Attacks (PIAs) (Ateniese et al., 2015) have emerged as a critical threat, where adversaries aim to infer sensitive properties from the training data of target models without direct access to the underlying data.
Examples of such critical properties include the gender ratio (male to female) in a dataset or the status of security patches in log files used by an intrusion detection system (Ganju et al., 2018). These adversarial threats have led to an increasing body of research in Privacy-Preserving ML (PPML) that revolves around attacking and defending ML models in various scenarios (Xu et al., 2021).
The threat of PIAs is especially severe in white-box scenarios, where attackers have complete knowledge of the target model's architecture and parameters (Ateniese et al., 2015; Ganju et al., 2018; Stock et al., 2023). However, unlike their white-box counterparts, black-box attacks do not require detailed knowledge of the model, reflecting more realistic adversarial scenarios against services that only query models for their outputs (Zhang et al., 2021). We introduce a novel method of training a black-box adversary for property inference, where we extend conventional black-box PIAs to use an arbitrary attack dataset as input to the target model, which can be independent from the model's training data. By formulating PIAs as a regression problem, the adversary can freely extract the most likely distribution of a
property in the original training data (Suri and Evans,
2022). We show the feasibility of our black-box PIA
in attacking Random Forest models and Deep Neural
Networks (DNNs). To put our black-box results into
context, we compare them to an adapted white-box
attack by (Ganju et al., 2018) that we run for the same
scenarios. In all scenarios, we are able to obtain ad-
versaries with a coefficient of determination (R²) of
at least 0.6, ranging up to 0.72 with black-box access
and up to 0.86 with white-box access.
As a defense mechanism against our black-box PIA, we test the possibility of directly including an adversarial loss in model training, inspired by (Grari et al., 2020), who showed its effectiveness for enhancing a model's fairness. The idea is to check the model's proneness to PIAs after each training round and to then accordingly influence the model's next updates, steering the model into hiding the original property distribution from its outputs. The model is instead trained to blind the adversary with a predefined target property value. We evaluate the resulting ability to mitigate the risks associated with black-box PIAs but also monitor the negative impact on model utility from impairing model training. Our results demonstrate that our mechanism effectively reduces black-box PIA success by an order of magnitude to an R² of 0.07, while only incurring a negligible utility loss of less than 0.2 percentage points in accuracy.
We summarize our contributions as follows:
- We expand current PIAs for the black-box setting by enabling our meta-classifier to use an arbitrary attack dataset, which is independent from the target's original data distribution.
- For this meta-classifier, we frame property inference as a regression problem that is able to extract a more realistic range of sensitive property distributions compared to the predefined distributions of a classification task.
- As a new defense mechanism, we propose an adversarial training scheme that hides the actual property distribution from attacks by guiding the model during training to produce more balanced outputs, while preserving model utility. In contrast to defense schemes in related work, our method generalizes well, i.e., it defends against a whole class of PIAs instead of defending against a single adversary instance.
The remainder of this work is organized as follows: Section 2 provides an
introduction to important related work on PIAs and
possible defense mechanisms. Section 3 describes
our methodology for conducting black-box PIAs and
framing these attacks as a regression problem. Sec-
tion 4 then outlines the proposed countermeasures for
mitigating PIAs. In Section 5, we present our experi-
mental setup, including the datasets used, the models
evaluated, and the metrics for assessing the success of
attack and defense. Section 6 discusses the results of
our experiments, demonstrating the effectiveness and
limitations of our approaches, and suggests future re-
search directions. Finally, Section 7 concludes with a
summary of our findings.
1.1 Performance Metrics
Since we model the adversarial task as a regression
problem, we measure the adversary’s success with the
coefficient of determination (R²). A dummy regressor
always outputting the expected value of the trained la-
bels would yield an R² of 0, while an ideal regressor’s
predictions would amount to an R² of 1.
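In terms of a formula, for true property values $y_i$, adversary predictions $\hat{y}_i$ and label mean $\bar{y} = \frac{1}{n}\sum_i y_i$, this corresponds to the usual definition

$$
R^2 \;=\; 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
$$

so a regressor that always predicts $\bar{y}$ obtains $R^2 = 0$, while perfect predictions obtain $R^2 = 1$.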
In our evaluation, we use boxplot graphs, since they capture multiple characteristics of a distribution: the mean is plotted as a diamond, while the line within the box marks the median of the distribution. The lower and upper edges of the box capture the first and third quartile, while the whiskers below and above the box indicate the range of values outside the interquartile range. Outliers are plotted as points above or below the whiskers.
2 RELATED WORK
The task of property inference, also known as distribution inference, was first introduced by (Ateniese et al., 2015), who describe attack patterns on traditional machine learning models. The current state-of-the-art white-box PIA by (Ganju et al., 2018) is based on permutation-invariant networks and adapts earlier PIAs with a focus on fully connected neural networks. Their solution has since been successfully adapted to other ML models and is the main focus for evaluating white-box PIAs (Hartmann et al., 2023; Suri and Evans, 2022). In the same vein, the first black-box attack approach, introduced by (Zhang et al., 2021), forms the basis for the current implementations of black-box PIAs in these works.
Their attack method uses a set of shadow models each
trained on data exhibiting different distributions of the
attacked property. They then train a meta-classifier
that learns to predict the property of interest based on
the shadow models’ outputs on a query dataset. Dur-
ing execution, the attacker queries the target model
with the attack query set, feeds the outputs to the
meta-classifier, and obtains a prediction of the sen-
sitive property in the data. The standard attack setting for PIAs is a binary classification over two predefined distribution values to infer to which of them the
dataset property conforms. For a more realistic case
of continuous distribution values like for the gender
ratio, other works extend the inference task from bi-
nary to a larger set of possible distributions (Zhang
et al., 2021; Zhou et al., 2022; Suri and Evans, 2022).
For defending against PIAs, (Hartmann et al., 2023) evaluate the fundamental reasons for why models leak sensitive information about their distribution. They identify three factors that play a role in enabling such exploits. As solutions, they conversely suggest reducing the model's memorization of the expected label given the features of adversary interest, optimizing the model architecture against wrong inductive bias, and increasing the amount of training data. While these mitigations may help, they each come at a cost: reducing memorization impacts model performance, optimizing the model architecture requires prior assumptions, and collecting more training data is not always feasible.
Other methods for protecting against PIAs have also shown only limited effectiveness in reliably preventing their success. In contrast to membership inference attacks, differential privacy techniques are not effective at mitigating inference risks related to data distributions, as tested by (Ateniese et al., 2015). This is because differential privacy focuses on obfuscating the contributions of individual records, whereas in the PIA setting, the adversary uncovers statistical properties of the underlying data distribution on a population level. Another approach by (Zhang et al., 2021) tries to remove sensitive attributes from datasets, which has also proven ineffective, as the correlations between different attributes still prevail.
(Ganju et al., 2018) propose a defense using scaling
transformations applied in models with ReLU activa-
tions that hide their learned distributions in the inter-
nal weights. However, this technique does not offer
any protection against black-box attacks. (Zhou et al.,
2022) suggest alterations to the training data with re-
spect to the target property before training the model.
This can be done in two ways: either by removing
records or by adding new records. By this, the ratio of
a property should reach a predetermined fixed value,
such as 0.5, regardless of the original ratio. However, this strategy requires either acquiring new data or removing records from the dataset to balance the property ratio, leading to expensive data acquisition or a potential reduction in training data. (Stock et al., 2023) test property unlearning as a defense mechanism against white-box attacks. It uses an adversarial classifier to identify the model parameters that leak information about a property and then applies backpropagation to modify them into unlearning the property. Experiments show this can be effective against a specific white-box adversary, but the approach has limitations in generalizing and can therefore not protect against black-box or different white-box PIAs targeting a certain property. In summary, the currently available defenses against property inference still have significant limitations and do not provide reliable protection across various attack scenarios.
To address the limitations stemming from related work, we first take the black-box PIA approach by (Zhang et al., 2021) and shift the attack set to be independent from the original data distribution, further reducing the prerequisites of black-box PIAs. We further adopt the suggestion by (Suri and Evans, 2022) and treat property inference as a regression problem to directly infer specific property ratios through our meta-classifier, where others only consider a definite set of distributions and formulate the adversarial task as a classification problem. For
defending against black-box PIAs, we adapt an adversarial learning strategy introduced by (Grari et al., 2020), where an adversarial loss term is directly included during model training to influence its memorization regarding specific properties. In their work, they utilize this training framework to achieve fairness in decision tree models. Transferring this process to PIA defense, we use a PIA adversary during training to penalize a model when it memorizes a distribution too strongly, which leads to an increased adversary loss term. Using this process, we can steer a model to learn a specific distribution output and hide its true distribution regarding a sensitive property. Our goal in defending is comparable to the proposition by (Zhou et al., 2022) but is applicable without actively removing training samples; we instead just reduce their influence to steer the property exposure.
3 REGRESSION PROPERTY
INFERENCE ATTACKS
Given a trained target model, the overall aim of a PIA
is to recover sensitive information about the model’s
training data. In contrast to other attacks, PIAs are
not directed at properties of an individual data sample
but rather at global properties of the training dataset.
Examples of such critical properties include the gen-
der ratio (male to female) in a dataset or the status
of security patches in log files used by an intrusion
detection system (Ganju et al., 2018). While some
properties are binary, this work focuses on the more
difficult problem of extracting continuous properties.
Hence, we introduce PIAs as a regression problem, as
proposed in (Suri and Evans, 2022).






Figure 1: Black-box PIA in three steps: Train shadow models, generate output and train adversary.
3.1 Black-Box Attacker Model
In this work, we focus on a black-box attacker model,
in which an attacker does not have access to the tar-
get model itself, but can choose input values and ob-
serve the target model’s output. This is a realistic sce-
nario for many applications in which machine learn-
ing models are deployed as a service. In real-world
applications, the internal model weights are often
hidden from clients and only API-access is granted,
meaning that requests are forwarded to the model in-
ternally and clients only receive its output. This helps
the model owner to stay in control, and attacks such as
model stealing (Tramèr et al., 2016) can be prevented
more easily.
The attacker has some information about the target
model’s training data or can access parts of it. This is
an assumption also made for previous defense strate-
gies (Nasr et al., 2018; Song and Mittal, 2021; Tang
et al., 2021; Stock et al., 2023). Otherwise, informa-
tion about the training data can also be reconstructed
with separate attacks (Shokri et al., 2017), which is
just as effective (Liu et al., 2022).
3.2 Attack Description
For our black-box PIA execution, we adapt the tech-
niques from (Zhang et al., 2021) for our scenario. The
attack can be split into four steps, as described below; the first three are depicted in Figure 1.
Step 1: Training Shadow Models
Since the ultimate goal of an adversary A is to predict a property for a target model, A trains on the output of models with known property values, the shadow models. Hence, the first step in a PIA is to train multiple shadow models for different property values, i.e., shadow models are trained on training datasets with the respective property values. Since the adversary A is modeled as a regressor, we first create auxiliary training datasets with the property values x ∈ {0.1, 0.2, ..., 0.8, 0.9}, i.e., each auxiliary training dataset DS^x_aux features ratio x regarding a predefined property. As an example, the property might be defined over the ratio of men to women in a dataset. To create an auxiliary dataset DS^x_aux in our experiments, we use the original training dataset and delete records until property value x is reached. This process is repeated for all x ∈ {0.1, 0.2, ..., 0.8, 0.9}, such that the resulting 9 auxiliary training datasets DS^x_aux can be used to train shadow models. On each DS^x_aux, k shadow models are trained. For the datasets we have used in our experiments, k = 200 has proven useful. While a higher number for k might increase the utility of the adversary, it comes with higher computational costs.
By basing the auxiliary datasets on the original training datasets, we create worst-case adversaries, i.e., the strongest possible adversaries: Since the distribution of the DS^x_aux is as close as possible to the original training data distributions, the adversary learns the behaviour of models very similar to the attacked model. This is especially helpful when model owners defend their models during training. Since original training data must be available to the model owner during the training process, this prerequisite is easily fulfilled.
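As an illustration of this step (a minimal sketch under assumed names, not the exact code of our experiments), an auxiliary dataset for a male:female ratio x could be derived from a pandas DataFrame train_df of the original training data as follows; the column name gender and its values are assumptions about the data schema:

```python
import pandas as pd

def make_aux_dataset(df: pd.DataFrame, x: float, seed: int = 0) -> pd.DataFrame:
    """Subsample df so that the ratio of male records is approximately x.

    Assumes a binary column 'gender' with values 'Male'/'Female' (hypothetical
    schema); records are deleted from the over-represented group only.
    """
    male = df[df["gender"] == "Male"]
    female = df[df["gender"] == "Female"]
    if len(male) / len(df) > x:
        # Keep all female records and shrink the male group:
        # n_male / (n_male + len(female)) == x  =>  n_male = x * len(female) / (1 - x)
        male = male.sample(n=int(x / (1 - x) * len(female)), random_state=seed)
    else:
        # Keep all male records and shrink the female group analogously.
        female = female.sample(n=int((1 - x) / x * len(male)), random_state=seed)
    return pd.concat([male, female]).sample(frac=1, random_state=seed)

# One auxiliary dataset per property value; k = 200 shadow models are trained on each.
aux_datasets = {x: make_aux_dataset(train_df, x)
                for x in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}
```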
Step 2: Generating Output from Shadow Models
The focus of this work is on a black-box attacker model, meaning that the adversary A does not have access to a target model itself but can only observe its output on a chosen input. Therefore, after training the 9 × 200 shadow models, they are each queried with the same input. This input can be arbitrary, as long as it yields a meaningful output when used as input to the shadow models. Since this input is an essential part of the black-box attack, we call it attack dataset DS_att. The output of the shadow models, labeled with the according property values {0.1, 0.2, ..., 0.8, 0.9}, is then stored as the training dataset for the adversary A. The same process of training and generating output on shadow models is repeated on equally sized and non-overlapping auxiliary test datasets. For creating a test dataset for the adversary, we use 50 shadow models per property value, i.e., in total there are 9 × 200 shadow models to generate the meta training set and 9 × 50 shadow models to generate the meta test set.
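A minimal sketch of this step, assuming the shadow models are Keras classifiers grouped by property value and ds_att is the attack dataset DS_att as a NumPy array (all handles are illustrative):

```python
import numpy as np

PROPERTY_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def build_meta_set(shadow_models_by_value, ds_att):
    """Query every shadow model with DS_att and label its flattened
    output vector with the property value of its training dataset."""
    features, labels = [], []
    for x in PROPERTY_VALUES:
        for shadow in shadow_models_by_value[x]:
            out = shadow.predict(ds_att, verbose=0)   # shape: (n_att, n_classes)
            features.append(out.flatten())            # one meta-sample per shadow model
            labels.append(x)
    return np.array(features), np.array(labels)

# Meta training set from 9 x 200 shadow models, meta test set from 9 x 50.
X_meta_train, y_meta_train = build_meta_set(train_shadow_models, ds_att)
X_meta_test, y_meta_test = build_meta_set(test_shadow_models, ds_att)
```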
Step 3: Training Adversary on Shadow Model
Output
The adversary A , or meta-classifier, is modeled as
a simple deep neural network (DNN). The training
dataset of the adversary (meta training set) consists of
the outputs of the trained shadow models on the cho-
sen input data, paired with the property values of their
respective training datasets. The hyperparameters of
the adversary such as neurons per layer and batch size
are optimized via the framework keras-tuner (https://keras.io/keras_tuner/). The
optimizer Adam is used to fit the model and early
stopping is applied to avoid overfitting. After approx-
imately 65 epochs, the meta-classifier A reaches its
peak test R² value and the training stops.
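Continuing the sketch above, a simplified version of such a meta-classifier without the keras-tuner search could look as follows; the layer sizes, the sigmoid output and the concrete early-stopping setup are placeholders rather than the tuned values:

```python
import tensorflow as tf

def build_adversary(input_dim: int) -> tf.keras.Model:
    """Small DNN regressor mapping flattened shadow-model outputs
    to a property value in [0, 1]."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # predicted property ratio
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

adversary = build_adversary(X_meta_train.shape[1])
adversary.fit(
    X_meta_train, y_meta_train,
    validation_data=(X_meta_test, y_meta_test),
    epochs=200,
    callbacks=[tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)],
)
```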
Step 4: Attacking Target Model
After training the adversary A, a target model m can be attacked by querying it with the attack dataset DS_att. The output of m then serves as the input for the adversary to infer the sensitive property, i.e., A(m(DS_att)) = x, where x is the value of the sensitive property inferred by the adversary A.
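In code, and continuing the previous sketches, the attack itself then reduces to two queries:

```python
# Black-box attack: query the target model with DS_att and let the
# adversary infer the property value from the flattened output.
target_out = target_model.predict(ds_att, verbose=0).flatten().reshape(1, -1)
inferred_property = float(adversary.predict(target_out, verbose=0)[0, 0])
print(f"Inferred property ratio: {inferred_property:.2f}")
```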
3.3 White-Box Benchmark PIA
In order to benchmark our novel black-box regression adversaries described above, we train white-box adversaries on the same datasets for comparison. In principle, we follow the work of (Ganju et al., 2018), which is also based on shadow models. Hence, step 1 in Section 3.2 is identical in the white-box setting. Step 2 (generating output from shadow models) is not necessary, since the adversary is trained on the shadow model weights and biases themselves, constituting the final step of training a white-box adversary. The white-box attack is carried out by directly using the weights and biases of a target model as input to the trained adversary.
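Conceptually, the white-box adversary input can be thought of as a vector of all shadow-model parameters. The sketch below shows only this simplified flattening and omits the permutation-invariant representation that (Ganju et al., 2018) build on top of it:

```python
import numpy as np

def whitebox_features(model) -> np.ndarray:
    """Concatenate all weights and biases of a Keras model into one
    flat feature vector for the white-box adversary (simplified view)."""
    return np.concatenate([w.flatten() for w in model.get_weights()])
```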
4 DEFENDING BLACK-BOX PIA
To defend against black-box PIAs, we propose a form
of adversarial training. We borrow the technique
of (Grari et al., 2020), originally intended to make
models fair, in the sense of making decisions indepen-
dent from pre-specified sensitive characteristics in the
data. Grari et al. design an adversary with the task of
deducing the sensitive data property from the model
output for Gradient Boosted Trees (GBTs). Using this
adversary during training, they create additional gra-
dients for the trained model – to minimize the success
of the adversary, i.e., to minimize the influence of the
sensitive property on the model’s output. A parameter
λ is introduced which controls the tradeoff between the two competing training goals, model performance and property suppression.
We have extended the strategy of (Grari et al.,
2020) for GBTs to defend DNNs against PIAs. For
defending a target DNN during training, we can sim-
ply modify its loss function. To be precise, we define
two terms within the loss function: A first term pun-
ishing the model if its prediction deviates from the
labels of the training data, and a second term pun-
ishing the model if the PIA adversary infers another
value than the predefined target distribution from the
model’s output. As in the work of (Grari et al., 2020),
the two terms are weighted with (1 − λ) and λ, allowing us to find an optimal tradeoff between model optimization and PIA defense.
When computing the loss as a mean squared error (mse), this yields

L = (1 − λ) · mse(y_true, y_pred) + λ · mse(adv_tar, adv_pred)

for training data labels y_true and model predictions y_pred, the adversary's target distribution adv_tar and the adversary's current prediction adv_pred based on the model's output.
5 EXPERIMENTS
We perform our experiments on the three datasets
Adult, UTKFace and CIFAR-10. We have performed
our experiments on Linux machines using Python 3.10
and the latest versions of TensorFlow and Keras. The
defense experiments have been performed on 10 tar-
get models per property value and dataset. The tar-
get models are trained on a different portion of the
datasets than the shadow models (i.e., the auxiliary
test datasets as described in Section 3.2, Step 2) to
avoid side effects.
5.1 Datasets
Adult is a dataset consisting of 32,561 records from
the US census (Kohavi et al., 1996). The 14 attributes include information about an individual's gender, work hours per week and education level. The
machine learning task is to predict whether an individ-
ual earns more than 50k dollars per year. For this tab-
ular dataset, we use random forest classifiers, reach-
ing an accuracy of 85%.
As the sensitive attribute, we choose the attribute
gender in the adult dataset, i.e., the adversary’s goal
is to predict the ratio of male:female records in the
training dataset. For the attack dataset which the ad-
versary uses to generate output from target models,
we generate a synthetic dataset with 10,000 records
based on Adult using a conditional generative adver-
sarial network (CTGAN) (Xu et al., 2019).
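One way to obtain such a synthetic attack dataset is the open-source ctgan package accompanying (Xu et al., 2019); the following sketch illustrates this option (column names and training parameters are assumptions, not necessarily our exact pipeline):

```python
from ctgan import CTGAN

# Categorical columns of Adult (subset shown for brevity; names are illustrative).
discrete_columns = ["workclass", "education", "gender", "income"]

ctgan = CTGAN(epochs=300)
ctgan.fit(adult_df, discrete_columns)   # adult_df: original Adult DataFrame
ds_att = ctgan.sample(10_000)           # synthetic attack dataset DS_att
```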
UTKFace consists of 20,000 color images with
annotations of age, gender and ethnicity (Zhang et al.,
2017). We choose the ML task of inferring the correct
gender from an image. Deep neural networks with
convolutional layers are used for recognizing the im-
ages, reaching a test accuracy of 90%.
We define our sensitive property over the age at-
tribute, i.e., the property is defined by the distribution
of old to young instances in the data. For simplicity,
we consider all age labels above 59 as old. Our at-
tack dataset is based on the Labeled Faces in the Wild dataset (Huang et al., 2008), containing 13,233 im-
ages of 5,749 different people.
CIFAR-10 contains 60,000 color images for im-
age recognition tasks with 32x32 pixels each (Alex,
2009). The dataset is grouped into 10 classes, such as
airplane, horse and truck. For the target and shadow
models, we use deep neural networks with the same
architecture as for UTKFace above, reaching accura-
cies of 70% (random guessing would amount to 10%
accuracy).
The sensitive attribute we define in this dataset is the share of animal images in the training dataset, i.e., the adversary's task is to predict the ratio of animal:non-animal images in the training dataset of the target
model. The attack dataset which the adversary uses
to generate output from the target model is based on
the CIFAR-100 dataset (Alex, 2009). We randomly
select 5,040 samples of 21 CIFAR-100 classes, which
share similarities with CIFAR-10 classes, e.g., images
of the CIFAR-100 class lion have similar features as
the CIFAR-10 class cat.
5.2 Regression PIA Results
The results of our attack experiments are summarized
in the first two columns of Table 1. Following the
regression approach explained in Section 3, we have
trained PIA adversaries both in white-box and black-
box scenarios
2
. The black-box adversaries reach
2
The white-box approach was not applicable to the
GBTs of the adult dataset, since the proposal in (Ganju
Table 1: Adversary performance measured as R² on test data, comparison of black-box and white-box. Both the white-box attack and the black-box defense are not applicable to the tree-based models for the Adult dataset.

Access      white-box   black-box
Defense     none        none        λ = 0.15
Adult       –           0.72        –
UTKFace     0.86        0.63        0.070
CIFAR-10    0.60        0.64        0.069
Figure 2: Performance of black-box adversaries, mean adversary output on test data for different distributions (resp. property values).
The black-box adversaries reach R² values from 0.63 (UTKFace) to 0.72 (Adult). Their
performance is visualized in Figure 2: An ideal ad-
versary would infer all property values correctly, as
shown by the dotted red line. It is apparent that all
three adversaries perform best in the mid-ranges of
property values (0.2–0.7), where their predictions are
closest to the ideal. For CIFAR-10, the largest devi-
ation is at property value 0.1, where the mean pre-
diction amounts to 0.38. The smallest deviations are
observed for property value 0.6, where all three ad-
versaries’ mean predictions deviate less than 5% from
the correct value.
Focusing on the first column of Table 1, we can
see that the white-box adversary for UTKFace out-
performs its black-box counterpart (R² of 0.86 com-
pared to 0.63), while the performance of the white-
box adversary for CIFAR-10 is worse than the black-
box version (R² of 0.60 as opposed to 0.64). The strong performance of the white-box UTKFace adversary is also reflected in Figure 3, where the plotted deviations from the ideal predictions (dotted red line) are small or even non-existent in the case of property value 0.1.
At the same time, the CIFAR-10 white-box adversary
Figure 3: Performance of white-box adversaries, mean adversary output on test data for different distributions (resp. property values).
performs worst at property value 0.1, its mean pre-
diction deviating even more from the truth than its
black-box counterpart in Figure 2. Interestingly, the
CIFAR-10 white-box adversary obtains its best pre-
dictions for property value 0.9, contrary to the UTK-
Face white-box adversary.
5.3 Regression Black-Box PIA Defense Results
We have implemented the defense strategy of Sec-
tion 4 and conducted experiments for different values
of λ. To recapitulate, the higher the value of λ, the more the trained model is defended against a PIA adversary; hence, λ = 0 implies regular training without any defense mechanism. All experiments were
run with the target property value 0.5.
To show the effect of our adversarial training,
we plot the adversarial outputs after defending target
models with λ = 0.15 in Figure 4: The plot is very
different from the original adversary performance in
Figure 2, with mean adversary outputs close to the
target 0.5 for target models with all property values.
This underlines the results in the third column of Table 1 with an adversary R² of 0.07 on defended models for both datasets UTKFace and CIFAR-10.
More detailed results for different values of λ are
presented in Figure 5 for UTKFace and Figure 6 for
the CIFAR-10 dataset. The green boxplots at the bottom of both figures represent the distance of adversary outputs to the target value 0.5; as λ increases along the x-axis, this distance decreases. The black box-
plots at the top of the figures show the accuracies of
the defended models. For both datasets, we can ob-
serve practically no accuracy decrease:
Figure 4: Black-box adversary outputs for target models defended with λ = 0.15 and target property value 0.5.
Figure 5: UTKFace: Accuracy of defended models for different values of lambda (black) and absolute divergence from target 0.5 of the adversary for defended models (green).
The mean accuracy even increases for the UTKFace models from 86.19% (λ = 0) to 86.60% (λ = 0.15), before it decreases slightly to 84.91% for λ = 0.2. For CIFAR-10, the accuracy decreases slightly from 67.46% (λ = 0) to 67.33% (λ = 0.25).
6 DISCUSSION
Regarding the attack success rates, the black-box and
white-box adversaries are in the same range of 0.6 to
0.86 R². While the age property for the UTKFace dataset is easily extracted in the white-box scenario (highest R²: 0.86), the animal property of CIFAR-10
seems to be harder to extract for an adversary, especially for property values below 0.3 (both in the black-box and white-box scenarios). The reasons behind this remain speculative; one hypothesis could be that as long as the share of animal images during CIFAR-10
Figure 6: CIFAR-10: Accuracy of defended models for different values of lambda (black) and absolute divergence from target 0.5 of the adversary for defended models (green).
Table 2: Approximated accuracies for our attacks to establish comparability to related work. The value for related work is extracted from (Zhang et al., 2021), Table 7.

              black-box                  white-box
              related work   ours        ours
Adult         50.6%          39.8%       –
UTKFace       –              44.9%       67.6%
CIFAR-10      –              49.2%       62.4%
training is below a certain threshold, the neural network does not account for typical animal features to an extent which is inferable for an adversary.
In general, the attack success rates are hard to
compare to related work, since PIAs are only con-
sidered as a classification problem, not covering the
more natural continuous value range of property dis-
tributions. To the best of our knowledge, (Zhang
et al., 2021) is the only work with a similar PIA ex-
periment. In reference to their “fine-grained” attack
using 5 classes for the property gender in the Adult
dataset, we calculate approximate accuracy values for
our attackers by classifying our adversary outputs into
4 classes across the range 0–1. The value from (Zhang
et al., 2021) is the average value of the rightmost col-
umn in Table 7, i.e., the average accuracy across the
5 classes. The results (see Table 2) exhibit similar
accuracy rates across all adversaries (39.8%–67.6%),
although Zhang et al.'s work seems to outperform our
black-box attacks. The two white-box attacks show
significantly higher accuracy rates than the black-box
attacks. However, we need to stress that the accu-
racy values are extracted from our regression outputs
and that our adversaries have not been optimized to
achieve high accuracies in a classification task.
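A sketch of how such approximate accuracies can be derived from regression outputs, binning predictions and true ratios into 4 equally wide classes over the range 0–1 (the bin edges are our reading of the procedure, not code from the original experiments):

```python
import numpy as np

def binned_accuracy(y_true, y_pred, n_classes: int = 4) -> float:
    """Map continuous property ratios to class indices and report
    the share of matching classes."""
    edges = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]   # e.g. [0.25, 0.5, 0.75]
    true_cls = np.digitize(y_true, edges)
    pred_cls = np.digitize(y_pred, edges)
    return float(np.mean(true_cls == pred_cls))

# Example, reusing names from the earlier sketches:
acc = binned_accuracy(y_meta_test, adversary.predict(X_meta_test, verbose=0).ravel())
```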
In our defense experiments, we have shown that
the adversarial approach, actively suppressing prop-
erty information in target model outputs, works well
and does not negatively affect the performance of de-
fended target models. Across both datasets, λ = 0.15 has proven to create a reasonable tradeoff during training, minimizing the adversary's R² to 0.07 while harming the target models' performance by less than 0.2 percentage points on average. As Figure 6 exhibits, using a higher λ than 0.15 is not necessary, since the difference in adversary performance is negligible (R² of 0.067 for λ = 0.25 instead of 0.069 for λ = 0.15) and two outliers imply possibly unwanted behavior for higher values of λ. Another notable observation from Figure 5 and Figure 6 is that not only the accuracy values are stable, but also their deviations do not increase for bigger values of λ.
For demonstration purposes, we have used 0.5
as the target property value across all defense ex-
periments. In practice, this value could be either
randomly chosen from the range 0–1 for each tar-
get model individually, or set to some other constant
value. Depending on the use case, one option might
make more sense than the other.
In (Stock et al., 2023), the authors have demonstrated that it does not suffice to harden a target
model against a single white-box adversary to de-
fend a whole class of PIA adversaries. For black-
box adversaries, this is different, since the informa-
tion available to the adversary is a lot more sparse than
in white-box scenarios (model output vs. all trained
weights and biases). Therefore, adversaries cannot
circumvent the adversarial defense by focusing on an-
other part of available information, as has been shown
for the white-box case in Section 6 of (Stock et al.,
2023). We were able to confirm this experimentally
by validating that defending a target model against
one black-box adversary limits the capabilities of an-
other adversary with the same task at the same rate.
This shows that the defense mechanism in this work
generalizes well, in contrast to the white-box defense
presented in (Stock et al., 2023).
6.1 Future Work
Through our experiments, we have demonstrated the feasibility of regression PIAs. Although it will not prevent attackers from executing them, implementing the attack takes a lot of effort, entailing the identification of a target model's training data distribution, creating shadow models, fine-tuning the architecture of the adversary model, etc. Follow-up
work could investigate whether this effort could be
limited, while maintaining the success rates shown in
this work. Inspiration could be taken from (Li and
Zhang, 2021), where a “boundary attack” for mem-
bership inference is presented, which bypasses the
creation and usage of shadow models altogether.
Also, our defense mechanism could be transferred
to a hybrid scenario. Instead of using a static adver-
sary during the adversarial training of a target model,
the adversary is retrained on the modified output of
the target model in every epoch of the target model
training, as in (Grari et al., 2020). Such a hybrid
adversarial training could potentially further reduce
the leaked property information of a target model, al-
though further investigation is necessary.
Last but not least, the adversarial training for fair-
ness in (Grari et al., 2020) could have side effects
on the success rate of PIAs. Since Grari et al. train
their models to yield outputs independent from a sen-
sitive property p, it would be interesting to investi-
gate whether their approach could also defend a target
model against a PIA focusing on property p.
7 CONCLUSION
In this work, we have expanded upon existing black-
box PIAs by using an arbitrary attack dataset, which
can be based on other datasets than the training
dataset. As the natural fit for many ratio-based prop-
erties, we have modeled the PIAs in this work as
regression problems. We have explored a defense
mechanism based on adversarial training which hard-
ens a target model against black-box PIAs during its
training process. We have evaluated our approach on
three datasets, comparing the attack against white-box
benchmarks and related work. In our experiments, we
have shown our defense scheme to be both effective
(by decreasing the adversary’s performance from an
R² of 0.63–0.64 to 0.07) and practical, decreasing the
mean accuracy of target models by less than 0.2 per-
centage points.
ACKNOWLEDGEMENTS
The authors would like to thank Jona Zantz for his
helpful comments and insights.
REFERENCES
Al-Rubaie, M. and Chang, J. M. (2019). Privacy-preserving
machine learning: Threats and solutions. IEEE Secu-
rity & Privacy, 17(2):49–58.
Alex, K. (2009). Learning multiple layers of features from tiny images. https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.
Ateniese, G., Mancini, L. V., Spognardi, A., Villani, A.,
Vitali, D., and Felici, G. (2015). Hacking smart ma-
chines with smarter ones: How to extract meaningful
data from machine learning classifiers. International
Journal of Security and Networks, 10(3):137–150.
Ganju, K., Wang, Q., Yang, W., Gunter, C. A., and Borisov,
N. (2018). Property inference attacks on fully con-
nected neural networks using permutation invariant
representations. In Proceedings of the 2018 ACM
SIGSAC conference on computer and communications
security, pages 619–633.
Grari, V., Ruf, B., Lamprier, S., and Detyniecki, M. (2020).
Achieving fairness with decision trees: An adversarial
approach. Data Science and Engineering, 5(2):99–
110.
Hartmann, V., Meynent, L., Peyrard, M., Dimitriadis, D.,
Tople, S., and West, R. (2023). Distribution inference
risks: Identifying and mitigating sources of leakage.
In 2023 IEEE Conference on Secure and Trustworthy
Machine Learning (SaTML), pages 136–149. IEEE.
Huang, G. B., Mattar, M., Berg, T., and Learned-Miller, E. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.
Kohavi, R. et al. (1996). Scaling up the accuracy of naive-
bayes classifiers: A decision-tree hybrid. In Kdd, vol-
ume 96, pages 202–207.
Li, Z. and Zhang, Y. (2021). Membership leakage in label-
only exposures. In Proceedings of the 2021 ACM
SIGSAC Conference on Computer and Communica-
tions Security, pages 880–895.
Liu, Y., Wen, R., He, X., Salem, A., Zhang, Z., Backes,
M., Cristofaro, E. D., Fritz, M., and Zhang, Y. (2022).
ML-Doctor: Holistic risk assessment of inference at-
tacks against machine learning models. In USENIX
Security.
Nasr, M., Shokri, R., and Houmansadr, A. (2018). Machine
Learning with Membership Privacy using Adversarial
Regularization. In CCS.
Shokri, R., Stronati, M., Song, C., and Shmatikov, V.
(2017). Membership inference attacks against ma-
chine learning models. In 2017 IEEE symposium on
security and privacy (SP), pages 3–18. IEEE.
Song, L. and Mittal, P. (2021). Systematic evaluation of
privacy risks of machine learning models. In USENIX
Security.
Stock, J., Wettlaufer, J., Demmler, D., and Federrath, H.
(2023). Lessons learned: Defending against prop-
erty inference attacks. In di Vimercati, S. D. C. and
Samarati, P., editors, Proceedings of the 20th Interna-
tional Conference on Security and Cryptography (SE-
CRYPT), pages 312–323. SCITEPRESS.
Suri, A. and Evans, D. (2022). Formalizing and estimating
distribution inference risks. Proceedings on Privacy
Enhancing Technologies.
Tang, X., Mahloujifar, S., Song, L., Shejwalkar, V., Nasr, M., Houmansadr, A., and Mittal, P. (2021). Mitigating membership inference attacks by self-distillation through a novel ensemble architecture. In USENIX Security.
Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., and Ristenpart, T. (2016). Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16), pages 601–618.
Xu, L., Skoularidou, M., Cuesta-Infante, A., and Veera-
machaneni, K. (2019). Modeling tabular data using
conditional gan. In Advances in Neural Information
Processing Systems.
Xu, R., Baracaldo, N., and Joshi, J. (2021). Privacy-
preserving machine learning: Methods, challenges
and directions. arXiv preprint arXiv:2108.04417.
Zhang, W., Tople, S., and Ohrimenko, O. (2021). Leakage of dataset properties in multi-party machine learning. In 30th USENIX Security Symposium (USENIX Security 21), pages 2687–2704.
Zhang, Z., Song, Y., and Qi, H. (2017). Age pro-
gression/regression by conditional adversarial autoen-
coder. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 5810–
5818.
Zhou, J., Chen, Y., Shen, C., and Zhang, Y. (2022). Property
inference attacks against gans. In 30th Network and
Distributed System Security Symposium (NDSS 2022).