An Alternative Approach to Federated Learning for Model Security and Data Privacy

William Briguglio¹, Waleed A. Yousef¹,², Issa Traoré¹, Mohammad Mamun³ and Sherif Saad⁴

¹Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
²Department of CS, HCILab, Helwan University, Cairo, Egypt
³National Research Council of Canada, Fredericton, NB, Canada
⁴Department of Computer Science, University of Windsor, Windsor, ON, Canada

Keywords: Data Poisoning, Federated Learning, Model Poisoning, non-IID.
Abstract:
Federated learning (FL) enables machine learning on data held across multiple clients without exchanging
private data. However, exchanging information for model training can compromise data privacy. Further,
participants may be untrustworthy and can attempt to sabotage model performance. Also, data that are not independently and identically distributed (IID) impede the convergence of FL techniques. We present a general
framework for federated learning via aggregating multivariate estimated densities (FLAMED). FLAMED ag-
gregates density estimations of clients’ data, from which it simulates training datasets to perform centralized
learning, bypassing problems arising from non-IID data and contributing to addressing privacy and security
concerns. FLAMED does not require a copy of the global model to be distributed to each participant during
training, meaning the aggregating server can retain sole proprietorship of the global model without the use of
resource-intensive homomorphic encryption. We compared its performance to standard FL approaches using
synthetic and real datasets and evaluated its resilience to model poisoning attacks. Our results indicate that
FLAMED effectively handles non-IID data in many settings while also being more secure.
1 INTRODUCTION
Federated learning (FL) is used to train a machine
learning (ML) model from data held by multiple
owners, without compromising the privacy of each
owner’s data. In the standard FL approach, each data
owner or client trains a local ML model starting from
a shared initial global model. The result of local train-
ing is sent in the form of weight or gradient updates
to an aggregating server, which then combines the lo-
cal models to obtain the new global model. This pro-
cess repeats for several rounds, each starting from the
previous round’s global model, until convergence is
reached. However, (Zhu and Han, 2020) has shown
that it is possible to leak training samples from the
gradient updates alone. To protect against this, other
approaches in the literature rely on techniques such as
homomorphic encryption (HE) and secure multi-party
computation (SMPC). However, as discussed in Section 4, these solutions hinder efforts to secure FL against attacks that target the performance of the global model, posing a trade-off between model security and data privacy.
In FL, during local training, each batch is sampled
only from the data available at a given client. How-
ever, data in FL settings are typically not independent and identically distributed (non-IID), meaning clients have different dataset distributions. This causes local mod-
els to be biased away from the global optimum, ham-
pering or preventing convergence. Many approaches
have been proposed to overcome this hurdle but they
often ignore privacy and security considerations.
We propose an alternative general FL framework
using density estimation to simultaneously address
non-IID data (see Section 3.1), and privacy and se-
curity (see Section 4) concerns. Clients model their
local data distributions and share this with the server,
allowing the aggregating server to simulate central-
ized global training. This is a general framework and
the methods used for modeling distributions and cre-
ating the global model can be decided upon by the
practitioner.
The present work makes the following contribu-
tions. (1) FLAMED is a general framework for sim-
ulated centralized learning that serves as a concep-
tual basis for alternative FL methods and allows non-
IIDness, privacy, and model security to be addressed
simultaneously. (2) FLAMED enables the aggre-
gating server to obtain a global model not known
to any other participants. Restricting knowledge of
the global model to a single participant secures the
model’s intellectual property and guards against a ma-
licious participant using shared model weights to at-
tack training data privacy. (3) We evaluated our ap-
proach against baseline and state-of-the-art FL ap-
proaches on a variety of synthetic datasets and a
real-world healthcare dataset from a federated setting
with 132 participants. The latter, with 3,069 fea-
tures, demonstrates FLAMED’s potential with high-
dimensionality data. (4) We performed a security
analysis of the proposed framework and evaluated its
resilience against backdoor attacks as defined in (Bag-
dasaryan et al., 2020) using the real dataset. (5) We
present a technique specific to FLAMED for detect-
ing model backdoor attacks via data poisoning.
The next section provides a brief review of the FL
literature. Section 3 presents the general formulation
of our approach and a discussion of its strengths rela-
tive to standard FL. We also present the proof of con-
cept implementation used for FLAMED in the current paper, and Section 4 presents our threat model and security analysis. In Section 5, we present all the configurations of the FL and model backdoor experiments. (This research was enabled in part by the Digital Research Alliance of Canada.)
Section 6 presents an analysis of the results and high-
lights the strengths and weaknesses of the proposed
approach. Section 7 summarizes our findings and dis-
cusses future work.
2 RELATED WORK
FedAvg (McMahan et al., 2017) is the basic FL algo-
rithm in which an initial global model is distributed
from an aggregation server to clients participating in
the FL scheme. Each client trains the model on their
local data. The trained local models are sent back to
the server, which obtains the updated global model as
a weighted average of the local models. This process
repeats until convergence.
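To make the aggregation step concrete, here is a minimal sketch of FedAvg-style weighted averaging (an illustration only, not the reference implementation); model parameters are represented as flat NumPy vectors and weighted by local dataset size:

```python
import numpy as np

def fedavg_aggregate(local_weights, n_samples):
    """Combine local models into the new global model as a weighted average,
    with each client's weight proportional to its local dataset size."""
    total = float(sum(n_samples))
    return sum((n / total) * w for w, n in zip(local_weights, n_samples))

# Example: three clients return flattened parameter vectors of the same shape.
rng = np.random.default_rng(0)
local_weights = [rng.normal(size=10) for _ in range(3)]
global_weights = fedavg_aggregate(local_weights, n_samples=[100, 50, 250])
```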
The privacy of training data and the integrity of the
global model are principal concerns in FL. Although
data are never exchanged between clients, (Zhu and
Han, 2020) showed it is possible to reproduce training
samples using the gradient updates alone, a problem
known as gradient leakage. In (Bagdasaryan et al.,
2020), the authors demonstrate that one or multiple
clients can collaborate to cause misclassifications for
specific feature values, without significantly impact-
ing the global model’s overall performance.
Reference (Bonawitz et al., 2017) employed
SMPC to provide a private vector summation frame-
work for FL weight aggregation. Their framework is
also resilient to clients dropping out of the FL net-
work, ensuring results are still correct even if clients
leave part way through the secure summation proce-
dure.
Several approaches improve convergence with
non-IID data. FedProx was introduced in (Li et al.,
2020), and it improved on FedAvg by penalizing large
updates with the addition of a “proximal term” to
the clients’ local objective function. The proximal
term prevents local updates from pulling the global
model away from the global optimum. Researchers
in (Karimireddy et al., 2020) introduced SCAFFOLD
to account for “client drift,” which occurs when a client's local op-
timum is not aligned with the average local optimum
across all clients, by approximating the ideal unbiased
local update, which is the average gradient of the lo-
cal model across all clients’ data. FedDC, proposed
in (Gao et al., 2022), improved on SCAFFOLD by
adding a loss term that allows clients to learn their
client drift and correct it before submitting updates.
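As an illustration of the proximal term described above, the following sketch shows a FedProx-style local objective (a simplification under our own assumptions, not the authors' code); the term (µ/2)·||w − w_global||² penalizes local weights that drift far from the current global model:

```python
import numpy as np

def fedprox_objective(w, w_global, task_loss, mu=0.1):
    """Client's local objective: task loss plus a proximal penalty toward the global model."""
    return task_loss(w) + (mu / 2.0) * np.sum((w - w_global) ** 2)

# Toy example with a quadratic stand-in for the client's training loss.
w_global = np.zeros(5)
task_loss = lambda w: np.sum((w - 1.0) ** 2)
print(fedprox_objective(np.ones(5), w_global, task_loss))  # penalty discourages drift
```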
Existing FL approaches have difficulty simulta-
neously addressing non-IIDness while maintaining
clients’ data privacy and global model integrity. In
FLAMED, we provide an FL framework that addresses
all these challenges at once.
3 METHOD
In this section, we introduce the general FLAMED
framework and discuss its benefits before detailing
the specific FLAMED implementation used in the ex-
periments presented in this paper. Table 1 provides the
notation used throughout.
3.1 FLAMED: The General Framework
Non-IID data pose a significant challenge when ap-
plying FL in real-world scenarios by causing local
models to be biased towards the local solution con-
ditioned only on the locally held data, slowing convergence towards the global solution.

Table 1: Notation frequently used in this paper.

Symbol | Meaning
$K$ | Number of clients
$C_i$ | The $i$th client
$P$ | Global data distribution
$P_i$ | Client $i$'s data distribution
$\hat{P}$ | Estimation of $P$
$M$ | Global model
$X$ | Data from all clients
$X_i$ | Data belonging to $C_i$
$X'$ | Low-dimensional transformation of $X$
$\tilde{X}$ | Data simulated from the estimated distribution of $X$
$U_{:r}$ | $r$-dimensional transformation from SVD or FedSVD
$n$ | Total number of samples
$n_i$ | Number of samples at $C_i$
$m$ | Number of features
$c$ | Number of classes
$\rho$ | Experimental parameter specifying the ratio of $n_i$ to $m$
$\alpha = \beta$ | Experimental parameter specifying the level of non-IIDness
For any observation $x$ in the IID setting, we have $x \sim P$, where $P$ is the global data distribution, while in the non-IID setting, we have $x \sim P_i$ for each client $C_i$, where $P_i$ may not equal $P_j$ for $i \neq j$ and $i, j \in [1, K]$. To bypass non-IIDness, we estimate the maximum likelihood estimator (MLE) $\hat{P}^{\mathrm{MLE}}$ of $P$. In FLAMED, each $P_i$ is modeled using a density estimation technique to obtain $\hat{P}_i$; then one of two approaches is possible. (1) Each client simulates a dataset $\tilde{X}_i \sim \hat{P}_i$, which the server aggregates into a global dataset. (2) The server aggregates the estimated $\hat{P}_i$, or its summary statistics, from each client, which the server uses to simulate a global dataset. The first approach is the default in FLAMED and what we adopt in the present article's proof of concept implementation, which can be seen as just one of many possible implementations of the first general approach. In both approaches, a global model is constructed at the server from the global dataset, which then follows a mixture distribution

$$\hat{P} = \sum_i \alpha_i \hat{P}_i, \qquad \sum_i \alpha_i = 1, \tag{1}$$

where the weight of the convex combination, $\alpha_i$, controls the importance of each client, and the distributions $\hat{P}_i$, $i = 1, \cdots, K$, are the empirical distributions $\hat{P}_i^{\mathrm{MLE}}$ of the data simulated at the client side (approach 1) or the estimated distributions themselves (approach 2). Figure 1 illustrates the general FLAMED framework.
Figure 1: The general FLAMED framework, with red indicating approach 1 and blue indicating approach 2. Each FL client $C_i$, $i = 1, \ldots, K$, computes $\hat{P}_i = \mathrm{DensityEstimation}(X_i)$ and sends either $\tilde{X}_i \sim \hat{P}_i$ (approach 1) or $\hat{P}_i$ (approach 2) to the aggregating server, which performs simulated centralized training on $\tilde{X} = [\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_K]$ or on data simulated from $\hat{P}$ to obtain the global model.
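As a concrete, assumption-labeled illustration of how the server-side dataset can realize the mixture in Eq. (1) under approach 1, each client's simulated data can be subsampled in proportion to its weight $\alpha_i$ (the helper name below is ours, and uniform weights are used in the example):

```python
import numpy as np

def mix_simulated_datasets(client_sims, alphas, n_global, seed=0):
    """Build a global dataset by sampling from each client's simulated data
    with probability proportional to its mixture weight alpha_i (Eq. (1))."""
    rng = np.random.default_rng(seed)
    parts = []
    for X_tilde_i, a_i in zip(client_sims, alphas):
        k = int(round(a_i * n_global))
        idx = rng.choice(len(X_tilde_i), size=k, replace=True)
        parts.append(X_tilde_i[idx])
    return np.vstack(parts)

# Three clients' simulated datasets mixed with equal weights.
rng = np.random.default_rng(0)
sims = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]
X_tilde = mix_simulated_datasets(sims, alphas=[1/3, 1/3, 1/3], n_global=300)
```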
Theorem 3.1. FLAMED bypasses non-IIDness by approximating the global MLE $\hat{P}^{\mathrm{MLE}}$ of the global data distribution $P$.

Proof: It is obvious that $\hat{P}_i^{\mathrm{MLE}} \neq \hat{P}_j^{\mathrm{MLE}}$, where $\hat{P}_i^{\mathrm{MLE}}$ is the MLE, called the empirical nonparametric distribution, that simply puts a mass of $\frac{1}{n_i}$ on each observation, and where $n_i$ is the number of observations at $C_i$. In both approaches, a global model is constructed at the server from the mixture distribution $\hat{P}$ defined in Eq. (1). Therefore, the aggregating server obtains $\hat{P}$, an estimation of $\hat{P}^{\mathrm{MLE}}$.
In this way, FLAMED simulates a centralized
learning task, thus bypassing non-IIDness resulting
from varying data distributions $P_i \neq P_j$. Further, in
contrast to standard FL, there is only a single round
of communication. This means each client can con-
tribute once they are available, and the aggregating
server can wait to train the global model only when
all clients have contributed, without holding up other
participants. Therefore, in addition to bypassing non-
IIDness resulting from differing $P_i$, FLAMED also
addresses non-IIDness resulting from client selection
bias due to nonuniform client availability.
The general framework of FLAMED proceeds as
follows (A specific implementation is given in Sec-
tion 3.3):
1. (Optional for low-dimensionality datasets) The
clients use a privacy-preserving distributed
dimensionality reduction technique to enable
tractable density estimation, distribution model-
ing, etc., depending on the method used to derive
the global model.
2. Clients perform the statistical analysis sufficient
for the learning task on their (optionally) trans-
formed data, and the resulting information ($\tilde{X}_i$ or $\hat{P}_i$ for approach 1 or 2, respectively) is sent to the
aggregating server.
3. The server simulates centralized training to con-
struct the global model $M$.
The default for FLAMED is to follow approach
1 because clients can always simulate the data them-
selves and send only the simulated data to the server.
Here, we are outlining a general approach, so practi-
tioners themselves must ensure clients do not expose
sensitive information in step 2. For example, sum-
mary statistics or a simple scaled histogram may be
shared with the server, but kernel density estimation
(KDE), which uses observations within its estimated
probability density function (PDF), would leak obser-
vations if its PDF was sent to the server.
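A minimal sketch of the client-side step under approach 1, using scikit-learn's KernelDensity as one possible density estimator (rows are observations, following scikit-learn's convention; labels are omitted for brevity, and a separate density could be fit per class):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def simulate_client_data(X_i, n_samples, bandwidth=1.0, seed=0):
    """Fit a Gaussian KDE to the client's (transformed) local data and draw
    a simulated dataset to send to the aggregating server (approach 1)."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_i)
    return kde.sample(n_samples=n_samples, random_state=seed)

# Example: a client with 200 local observations in a 4-dimensional space.
rng = np.random.default_rng(0)
X_i = rng.normal(size=(200, 4))
X_tilde_i = simulate_client_data(X_i, n_samples=200)  # only this leaves the client
```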
3.2 FLAMED: Practical Benefits
FLAMED has several practical benefits that distin-
guish it from traditional FL. Many FL use cases
are resource-constrained but standard FL methods
(e.g. all methods cited in Section 2) require multiple
rounds of communication and computation on partic-
ipants’ devices. FLAMED offers an alternative that
requires only a single round of communication and
computation, with all model training taking place at
the aggregating server. The complexities shown in
Table 2 emphasize this point. FLAMED trades mul-
tiple rounds of local model training at the clients for
a single round of density estimation at the clients and
global model training at the server. FLAMED’s space
requirements are comparable at the client but larger at
the server. Communication is also reduced to a sin-
gle round. Altogether, FLAMED asks for less space,
computation, and availability from the clients in ex-
change for a heavier burden on the server. In gen-
eral, FLAMED’s redistribution of computational bur-
dens may make it more appropriate for settings with
low-resource clients, e.g. internet of things applica-
tions, provided the central server is able to handle the
extra workload. Further, any contribution can easily
be individually excluded from global model training
by imposing a zero weight on that client (Eq. (1)),
making detecting malicious contributions (see Sec-
tion 6.3) and performing contribution evaluation eas-
ier. Also, if a new client joins the FL network, the
global model can be updated without repeating the en-
tire FL process with all participants. FLAMED also
allows for the streamlined design of the global model
because grid search can be performed with little co-
ordination or communication overhead. In contrast,
standard FL methods require the entire procedure to
be repeated for each choice of hyperparameters or
model architecture.
3.3 FLAMED: Specific Implementation
Input: Data $X = [X_1, \ldots, X_K]$
Output: Global model $M$
begin
    FedSVD($X_1, \ldots, X_K$)    \\ Clients get $U_{:r}$
    for $i \in [1, K]$ do    \\ At Client $i$
        $X_i \leftarrow U_{:r}^{\top} X_i$
        $\hat{P}_i \leftarrow \mathrm{KDE}(X_i)$
        $\tilde{X}_i \sim \hat{P}_i$
        SendToServer($\tilde{X}_i$)
    end
    \\ At Server
    $\tilde{X} \leftarrow [\tilde{X}_1, \ldots, \tilde{X}_K]$
    $M \leftarrow$ GlobalModel.Train($\tilde{X}$)
    SendToClients($M$)
end
Algorithm 1: FLAMED Using Simulation.
In our experiments, to reduce data dimensionality
and make simulation tractable, we consider singular
value decomposition (SVD) (Halko et al., 2011) for
dimensionality reduction. SVD decomposes matrix
$X \in \mathbb{R}^{m \times n}$ as $X = U \Sigma V^{\top}$, where $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal, $U \in \mathbb{R}^{m \times m}$, and $V \in \mathbb{R}^{n \times n}$. Using the $r$ columns of $U$ corresponding to the $r$ largest singular values in $\Sigma$, denoted $U_{:r}$, a low-dimensional transformation $X' \in \mathbb{R}^{r \times n}$ is obtained as $X' = U_{:r}^{\top} X$.
However, this approach cannot be directly applied
to our problem because each client would obtain a dif-
ferent $U_i$, each biased toward their local dataset. Thus, we use FedSVD (Chai et al., 2021), which, with the help of a trusted masking server, can compute $U_{:r}$ of the combined data matrix $X = [X_1, \ldots, X_K]$, which is composed of all $K$ clients' data matrices $X_i$, without compromising the privacy of any of the clients' data. Once each client receives $U_{:r}$, they compute the common $r$-dimensional transformation of their data.
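For illustration, the sketch below computes the projection centrally with ordinary SVD as a stand-in for what FedSVD computes without pooling the clients' data; the columns-are-samples ($m \times n$) layout follows the paper's notation, and the function name is ours:

```python
import numpy as np

def top_r_left_singular_vectors(X_parts, r):
    """Stand-in for FedSVD: left singular vectors of the combined data matrix
    X = [X_1, ..., X_K] (features x samples), truncated to the top r."""
    X = np.hstack(X_parts)                           # m x n
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r]                                  # U_{:r}, shape m x r

# Each client applies the common projection to its own data: X'_i = U_{:r}^T X_i.
rng = np.random.default_rng(0)
X_parts = [rng.normal(size=(16, 50)) for _ in range(3)]  # 3 clients, m = 16 features
U_r = top_r_left_singular_vectors(X_parts, r=4)
X_prime_parts = [U_r.T @ X_i for X_i in X_parts]         # each is r x n_i
```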
The specific algorithm and implementation of
FLAMED used in this paper is depicted in Algo-
rithm 1. For simulation, we use KDE with the Gaus-
sian kernel. KDE maps points in the feature space to
estimates of the probability density using a weighted
sum of kernel distances from said point to each ob-
servation in the training set. Logistic regression and a
feed-forward neural network (NN) were used for the
global model. The time, space, and communication
complexities for the general FLAMED framework
and the specific implementation in our experiments
along with the comparison approaches are shown in
Table 2.
Table 2: Time and space complexities for all methods. $D$, $A$, and $G$ are placeholders for density estimation, aggregation, and building the global model, respectively. $n$, $n_i$, and $n_I$ are the total number of observations across all clients, the number of observations at client $i$, and the max number of observations at any given client, respectively. We assume a feed-forward NN for the global model with $l$ layers no larger than $m$ trained for $e$ epochs and $R$ rounds of FL.

Method | Time | Space (Client) | Space (Server) | Communication Size | Communication Rounds
FLAMED (General) | $O(D) + O(A) + O(G)$ | $O(D)$ | $\max(O(A), O(G))$ | $O(K(\mathrm{size}(I_i) + \mathrm{size}(M)))$ | 1
FLAMED (KDE) | $O(n_I^2) + O(lm^2ne)$ | $O(n_i)$ | $O(lm^2)$ | $O(nm + Klm^2)$ | 1
FedAvg/Prox/DC | $O(lm^2 n_I e R)$ | $O(lm^2)$ | $O(lm^2)$ | $O(RKlm^2)$ | $R$
4 SECURITY
4.1 Threat Model
In our threat model, we assume any subset of the K
participants, including the aggregating server, could
be malicious and use any means necessary to at-
tempt to learn something about a particular data sam-
ple x belonging to a benign participant. Although
in some settings, it may be inadmissible to allow
global properties of data distributions to be leaked, it
is not obvious that standard FL approaches can pre-
vent this (Wang et al., 2019; Zhu and Han, 2020).
Most data privacy approaches, ours included, focus
on the privacy of any particular sample. The Eu-
ropean Union’s General Data Protection Regulation
(GDPR), a major incentive for the development of FL
algorithms in the first place, only applies to “personal data,” which is data relating to an identified or iden-
tifiable individual (see (Voigt and Von dem Bussche,
2017, Sec. 2.1.2)). Therefore, we allow knowledge
of empirical probability densities of private data to be
learned by adversaries.
Given this threat model, because FLAMED is a
general method, its security would have to be proved
for each individual implementation. Specifically, if
approach 1 is followed, it must be shown that sharing
$\tilde{X}$ conforms to a particular privacy requirement, which is application-dependent. If approach 2 is followed, sharing $\hat{P}_i$ must be shown to conform to a particu-
lar privacy requirement (e.g. sharing an approxima-
tion of KDE’s PDF is proven secure in (Wagner et al.,
2023)). There are many varieties of privacy require-
ments. Differential privacy requires that any synthetic
datasets generated from neighbouring private datasets
(i.e. datasets that differ by one element) have a near
equal probability of occurring (Ding et al., 2011).
This can be accomplished by using methods like
PrivBayes for generating synthetic datasets (Zhang
et al., 2017). We leave investigation of this approach
to future work. Below, we prove the security of the
specific FLAMED implementation defined in Section 3.3 us-
ing a weaker privacy requirement. Specifically, we
require that FLAMED releases no certain information
about a particular sample.
Theorem 4.1. FLAMED is secure with respect to our
threat model. That is, FLAMED leaks no certain
information about a particular $x_i \in X = [X_1, \ldots, X_K]$ other than $\hat{P}$.
Proof: In approach 1, where each client simulates a dataset $\tilde{X}_i \sim \hat{P}_i$, the server only receives $\tilde{X}_i$ from each client, which cannot be used to accurately reconstruct any particular $x_i$ with certainty. The server could only use $\tilde{X} = [\tilde{X}_1, \ldots, \tilde{X}_K]$ to construct an empirical PDF that approximates the estimated density $\hat{P}$ from Eq. (1). Even if the server colludes with all but one participant $j$, obtaining the private data belonging to clients $i \neq j$, the server will obtain at best a closer approximation to $\hat{P}$, which does not leak any certain information about a particular $x_i$ at the non-colluding participant. Similarly, in approach 2, the server only receives density estimation information $\hat{P}_i$ from each client. As discussed in Section 3.1, the practitioner should ensure this information does not leak private data (e.g. summary statistics or histograms can be safely shared). Here, we prove the security of our general approach and assume the practitioner will ensure sharing $\hat{P}_i$ is secure in their specific implementation. By this assumption, in approach 2, the server also cannot accurately reconstruct any particular $x_i$ after receiving $\hat{P}_i$. Therefore, in both approaches, so long as care is taken in specifying $\hat{P}_i$ when using approach 2, the server does not learn any certain information about a particular $x_i$. Conversely, the clients may optionally obtain, at most, the global model trained on $\tilde{X}$. Even if they were able to approximately reconstruct much of its training data with model inversion attacks, they would have less certain information than the server, and, at best, would only be able to reconstruct $\hat{P}$. Thus, FLAMED leaks no certain information about a particular $x_i$ other than its estimated probability density.
4.2 Security Advantages
As discussed in Section 2, training samples can be
reconstructed from weight updates exchanged in the
standard FL approach (Zhu and Han, 2020). Thus,
privacy can be compromised if no measures are
taken to hide weight updates from the server. In
FLAMED, no gradients are exchanged, so this type
of attack is not feasible. Further, (Bagdasaryan
et al., 2020) showed that directly manipulating gra-
dient updates (model poisoning) is more effective
than manipulating training data (data poisoning).
This makes FLAMED inherently more robust against
model backdoor attacks, as demonstrated in Sec-
tion 6.3. FLAMED addresses non-IIDness, gradi-
ent leakage, and gradient poisoning simultaneously.
Other approaches require observing raw gradients or
client state information to correct biases or detect poi-
soned gradients. This makes securing such meth-
ods against gradient leakage attacks more difficult.
Conversely, addressing gradient leakage by obscuring
raw gradient information makes correcting bias and
detecting data poisoning more difficult. FLAMED
presents no such trade-off by allowing careful anal-
ysis of all client contributions while not exchanging
gradient information and maintaining privacy.
FLAMED allows the aggregating server to obtain
a global model that is not known to other participants.
To share this property with FLAMED, most standard
FL methods would require intensive redesign with
costly HE or other privacy-preserving methods. It is
easy to see the use cases for such an FL method. For
example, consider an FL scenario with untrusted par-
ticipants, such as cell phone users. The participants
may want to isolate the trained model at the aggregat-
ing server to maintain the aggregating server’s sole
proprietorship of the global model or to strengthen
privacy guarantees because sharing the global model
with participants may allow them to perform model
inversion attacks, exposing participants’ private data
to one another.
5 EXPERIMENTS
5.1 Comparison Approaches
We compare FLAMED with FedAvg, FedProx, and
FedDC. All approaches come with a strong theoreti-
cal foundation. FedAvg has over 20,000 citations and
is included as a baseline FL approach. A survey of
the FL literature showed FedProx reported the largest
accuracy increase over FedAvg (Liu et al., 2020, Ta-
ble 11). FedDC is more recent and outperformed Fe-
dAvg, FedProx, and other approaches from the liter-
ature. We thus include FedProx and FedDC to repre-
sent the state-of-the-art.
5.2 Performance Comparison: Setup
and Configurations
Synthetic Datasets. To evaluate FLAMED against
the comparison FL techniques under a variety of sce-
narios, we used synthetic datasets generated follow-
ing the approach used by the authors of FedProx. In
their approach, parameters α = β control how non-
IID different client datasets are. Where we depart
from FedProx is in the configurations of the syn-
thetic datasets used. The mean number of observa-
tions held across all clients is determined in propor-
tion to the number of features m as ρm, where ρ is
an experimental hyperparameter. In the following, $n_i$ is the number of observations held at client $C_i$. The distribution of the number of observations across all clients is either uniform (i.e. $n_i = \rho m$ for all $i$), denoted $\mathcal{U}$, or a modified log-normal distribution $\mathcal{L} = n_i + \frac{\rho m}{2}$, where $n_i \sim \mathrm{lognormal}(\mu, 2)$ and $\mu$ and $2$ are the mean and standard deviation, respectively, of the underlying normal distribution; $\mu$ is chosen such that $E[n_i] = \frac{\rho m}{2}$, and thus, the mean number of observations at each client $C_i$ is $E[\mathcal{L}] = E[n_i] + \frac{\rho m}{2} = \rho m$.
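To make the sampling of client dataset sizes concrete, the following sketch (our own illustration, not the original generator) draws $n_i$ for the modified log-normal case, choosing $\mu$ so that the underlying log-normal has mean $\frac{\rho m}{2}$:

```python
import numpy as np

def client_sample_counts(K, m, rho, sigma=2.0, seed=0):
    """Draw per-client counts under the modified log-normal scheme L = n_i + rho*m/2.

    For a log-normal with underlying normal parameters (mu, sigma),
    E[n_i] = exp(mu + sigma**2 / 2), so mu = ln(rho*m/2) - sigma**2 / 2
    gives E[n_i] = rho*m/2 and hence E[L] = rho*m.
    """
    rng = np.random.default_rng(seed)
    mu = np.log(rho * m / 2) - sigma**2 / 2
    n_i = rng.lognormal(mean=mu, sigma=sigma, size=K)
    return np.round(n_i + rho * m / 2).astype(int)

counts = client_sample_counts(K=8, m=32, rho=10)
print(counts, counts.mean())  # mean is roughly rho*m = 320
```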
In our initial experiments we applied FLAMED
and the comparison methods to 1,536 synthetic
distributed dataset configurations defined by the
cross product: $K \in \{2, 4, 8, 16\} \times c \in \{2, 4, 8, 16\} \times m \in \{8, 32, 128, 512\} \times \rho \in \{5, 10, 20\} \times \alpha = \beta \in \{\mathrm{IID}, 0, 0.5, 1\} \times D \in \{\mathcal{U}, \mathcal{L}\}$, where $c$ is the number of classes and $\alpha = \beta = \mathrm{IID}$ denotes IID data across all clients. After our results from these initial experiments (discussed in Section 6.1), we explored how FLAMED handled higher levels of non-IIDness and repeated our initial experiments, but with $\alpha = \beta \in \{1.5, 2\}$, adding 768 configurations. We also repeated our initial experiments but with a higher number of clients $K \in \{32, 64, 128\}$ and $\rho \in \{5, 10\}$. Configurations with $K = 128$, $m = 512$, and $\rho = 10$ were excluded due to time constraints. This added another 736 configurations, for 3,040 configurations in total.
Real Dataset. To evaluate FLAMED in a real-world
federated setting, we used the eICU Collaborative Re-
search Database (Pollard et al., 2018). This dataset
contains real-world medical data from over 200,000
ICU admissions to more than 200 medical centres
across the United States. Unlike other commonly
used datasets, the eICU dataset represents a real-
world federated setting instead of a contrived one ob-
tained by separating a centralized dataset. We use the
data contained in the drug infusions table. Each row
in our feature matrix $X$ corresponds to a patient, while each column corresponds to a drug. If a patient $i$ receives any dose of a certain drug $j$ across any of their ICU admissions, then $X_{ij}$ is set to 1. Otherwise, it is
set to 0. A patient is assigned label 0 if their discharge
status in the patient table is “alive”, and 1 otherwise.
After removing any hospital with fewer than 10 obser-
vations, we are left with 3,069 features and 72,959
patients held across 132 hospitals. In addition to us-
ing all 132 clients, we test 22 different configurations
defined in the set $K \in \{2, 4, 8, 16, 32, 64, 128\} \times S \in \{\text{smallest}, \text{middle}, \text{largest}\}$. Here, $K$ is the number of
hospitals used and S denotes the strata of hospitals we
select from. That is, if S = middle, then we select the
K hospitals with the nearest to the median number of
patients, while if S = smallest or S = largest, then we
select the K hospitals with the least or most number
of patients, respectively.
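A minimal sketch of this preprocessing step, assuming the infusion records have been loaded into a pandas DataFrame; the column names patient_id and drug_name are hypothetical placeholders for the actual eICU schema, and the mortality label would be joined from the patient table analogously:

```python
import pandas as pd

# One row per infusion event; column names here are illustrative placeholders.
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 3],
    "drug_name": ["heparin", "insulin", "insulin", "propofol", "heparin", "propofol"],
})

# Binary patient-by-drug matrix: X_ij = 1 if patient i ever received drug j.
X = (pd.crosstab(df["patient_id"], df["drug_name"]) > 0).astype(int)
print(X)
```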
Model Parameters. For FedAvg, FedProx, and
FedDC, we performed 200 rounds of standard FL to
train a feed-forward NN. In all experiments, this was
more than enough rounds for the global model to con-
verge. Intermediate layers have $\frac{m}{2}$ neurons, unless $m = 2$, in which case they have 2 neurons. Other
parameters are determined through grid search using
balanced accuracy for evaluation to account for class
imbalances. For FLAMED, FedSVD is used to trans-
form the dataset into $r \in \{2, 4, 8\}$ dimensions. Grid search is used to determine the optimal KDE simulation parameters, which maximize the log-likelihood score of a held-out local test set. After simulation and
the training of a global model, the optimal values for
r and the hyperparameters of the global model are de-
termined using balanced accuracy on a validation set
that consists of 10% of each local dataset. For the
global model, logistic regression (LR) and NNs were
compared.
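One way to run this KDE grid search with scikit-learn (a hedged sketch; the bandwidth grid and data are illustrative) is to rely on the fact that KernelDensity's score is the log-likelihood, so cross-validated grid search selects the bandwidth with the highest held-out likelihood:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_prime_i = rng.normal(size=(300, 4))          # a client's FedSVD-transformed data

grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.logspace(-1, 1, 20)}, cv=5)
grid.fit(X_prime_i)                            # folds scored by held-out log-likelihood
best_kde = grid.best_estimator_
X_tilde_i = best_kde.sample(300, random_state=0)   # simulated data sent to the server
```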
5.3 Security Analysis: Setup and
Configurations
We recreated the attacks in (Bagdasaryan et al., 2020)
which used poisoning to cause the global model to
only misclassify observations with certain feature val-
ues, called the backdoor, while not affecting the over-
all accuracy of the global model. We used the full
eICU dataset after preprocessing as described in Sec-
tion 5.2. For all methods, we varied the number of
attacking clients, using the clients with the nearest to
median number of observations. The backdoor was
set when column 377 is 1 with target label 1. Only
one observation in the benign data had this column
set to 1, and it had the label 0. Thus, the poison-
ing objective was contradictory to the benign data but
should not have caused overall degradation in model
performance. The poisoned training data consisted of
the backdoor and some noise in the form of random
columns set to 1 to help the model to generalize the
learned backdoor.
For attacking FedAvg, FedProx, and FedDC, we
followed the model poisoning approach presented
in (Bagdasaryan et al., 2020). We also performed
the backdoor attack via data poisoning as a baseline
comparison with the data poisoning attack against
FLAMED. At the server, we tested two different de-
fences that were also presented in (Bagdasaryan et al.,
2020), computing the cosine distance and the L
2
dis-
tance between each client’s weight update and the
global model. It is assumed that updates with higher
L
2
or cosine distances are anomalous and represent
poisoning attempts. In practice, these defences cannot
be deployed in conjunction with secure aggregation;
however, we report their effectiveness here as a best
case scenario. Further, also following (Bagdasaryan
et al., 2020), in order to evade detection, the attacker
modifies their loss function to include an “anomalous
loss” term, weighted with $1 - \alpha$. This term penalizes weight updates with large $L_2$ or cosine distances, de-
pending on the defence deployed (see (Bagdasaryan
et al., 2020, Eq. (4))). The strength of the attack-
ers’ poisoned update and the weight of the anoma-
lous loss term are controlled using the hyperparam-
eters denoted γ and α in the original paper. In our
experiment, γ and α were varied across {50, 75, 90}
and {0.4, 0.5, 0.7, 1}, respectively. Attackers were se-
lected in every round of federated training.
For FLAMED, as discussed in Section 4.2,
we are confined to data poisoning because no
gradients are exchanged. We injected poi-
soned training data before FedSVD dimension-
ality reduction and varied the number of poi-
soned training observations as a multiple $p \in \{0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.8, 1, 2, 4, 8, 16, 32, 64\}$
of the amount of training data at the attacking
client(s). We assume that the backdoor, being
inserted by only a few clients and by definition not
present in the original data, will be rare. Therefore,
any observations containing the backdoor will be outliers, so we used anomaly detection to find poi-
soned observations. We tested six defense methods
using two anomaly detection algorithms, local outlier
factor (LOF) (Breunig et al., 2000) and isolation
forest (IF) (Liu et al., 2008), on the simulated data,
the centroid of each client’s simulated data, and the
centroid of each class’s simulated observations at
each client.
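A sketch of the server-side defence on the simulated data, using scikit-learn's LocalOutlierFactor and IsolationForest; the data and labels below are synthetic placeholders for illustration only:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
benign = rng.normal(size=(500, 4))              # simulated data from benign clients
poisoned = rng.normal(loc=4.0, size=(25, 4))    # rare backdoor-carrying observations
X_tilde = np.vstack([benign, poisoned])
is_poisoned = np.r_[np.zeros(len(benign)), np.ones(len(poisoned))]

# Isolation forest: negate score_samples so higher means more anomalous.
if_scores = -IsolationForest(random_state=0).fit(X_tilde).score_samples(X_tilde)

# Local outlier factor: negate negative_outlier_factor_ so higher means more anomalous.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(X_tilde)
lof_scores = -lof.negative_outlier_factor_

print("IF AUC:", roc_auc_score(is_poisoned, if_scores))
print("LOF AUC:", roc_auc_score(is_poisoned, lof_scores))
```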
To evaluate the poisoning attacks, we used the
backdoor success rate (BSR) which is the percent-
age of observations in the poisoned test data that fool
the global model into predicting the target label. The
poisoned test data contain the backdoor and some
noise, which tests how well the backdoor general-
ized. We also record the area under curve (AUC) of
the defences by assigning positive labels to poisoned
updates or data and using the $L_2$/cosine distance or
LOF/IF outlier scores as the predictions. The attacker
aims to insert an effective model backdoor with a high
BSR while also going undetected, meaning the de-
fence method scores a low AUC. Any other result is
good for the defender because then the attack is either
detected, ineffective, or both.
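For concreteness, a small sketch of how the BSR can be computed from the global model's predictions on backdoored test observations (the names are illustrative; any classifier exposing a predict method works):

```python
import numpy as np

def backdoor_success_rate(model, X_backdoored, target_label):
    """Fraction of backdoored test observations classified as the attacker's target."""
    preds = model.predict(X_backdoored)
    return float(np.mean(preds == target_label))
```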
6 RESULTS
6.1 Synthetic Data
Table 3: Balanced accuracy for each method averaged over the initial synthetic dataset configurations, including and excluding the entirely IID configurations.

Method | Balanced Accuracy (Configs. Excluding $\alpha = \beta = \mathrm{IID}$) | Balanced Accuracy (All Configs.)
FedAvg | 0.7962 | 0.8092
FedProx | 0.8009 | 0.8125
FedDC | 0.8023 | 0.7952
FLAMED | 0.7741 | 0.7226
In real-world FL settings, entirely IID datasets are
exceedingly rare. We also gain more from non-
IID datasets because each client’s contribution holds
different information about the global learning task.
Therefore, we are more interested in the non-IID con-
figurations and provide the average balanced accu-
racy scores both including and excluding the config-
urations with entirely IID data. The averaged bal-
anced accuracy scores for each method across all
dataset configurations in our initial experiments are
shown in Table 3. FedAvg, FedProx, and FedDC
were the overall winners, performing mostly at par,
but FLAMED remained competitive in the non-IID
experiments. This can already be considered a suc-
cess because, in addition to FLAMED’s performance
(which was within 3% balanced accuracy of the best-
performing method across all non-IID settings), it of-
fers the security advantages discussed in Section 4
and the practical benefits mentioned in Section 3.2.
It was not expected that any one approach would
perform best across all scenarios, and as we will
see, the results in Table 3 represent only a superfi-
cial glance at the true utility of each method. We
break down our results with respect to configuration
parameters found to heavily affect the relative perfor-
mance of the compared methods. The average bal-
anced accuracy across all configurations with respect
to the level of non-IIDness for our initial experiments
and extended experiments with greater levels of non-
IIDness are shown in Fig. 2. We can see that the intro-
duction of even slight non-IIDness resulted in a very
large improvement in the performance of FLAMED.
This trend continued as non-IIDness increased, with
diminishing returns, until FLAMED scored just 0.007
average balanced accuracy below the best-performing
comparison approach, FedDC. The fact that the dif-
ference between the entirely IID configurations and
configurations where α = β = 0 alone is so great illus-
trates the importance of considering the non-IID con-
figurations separately. FLAMED performed worse
on IID data, and because the comparison methods
do not behave in the same way, we cannot con-
sider this an artifact of the synthetic dataset genera-
tion. Rather, this may result from FedSVD failing to
preserve classification-relevant information or inter-
ference from overlapping estimations across similar
client distributions.
Increasing the number of clients also increases
non-IIDness. Fig. 3 shows, for each method, the bal-
anced accuracy with respect to different numbers of
clients averaged across all configurations in our initial
experiments and in our extended experiments with a
greater number of clients. The plot shows that as the
number of clients increased, the relative performance
of FLAMED improved. When not considering en-
tirely IID configurations, FLAMED performed better
than FedAvg and FedProx, and was competitive with
FedDC, if the number of clients $K \geq 16$.
A principal characteristic of any ML problem is
the number of features being used for prediction. In
our case, it is especially important because transfor-
mation to a low-dimensionality feature space is re-
quired for tractable simulation. Fig. 4 shows the av-
erage balanced accuracy for each method with re-
spect to different dataset dimensionalities for the ini-
tial experiments, the experiments with a greater num-
ber of clients, and the experiments with high non-
IIDness. From the figure, we can see that in the ini-
tial experiments, when the dimensionality was lowest,
FLAMED performed best. Further, the results show
that when non-IIDness is increased by increasing the
number of clients or by increasing α = β, FLAMED
is performant, even at higher dimensionalities, where
simulation is not tractable without first using dimen-
sionality reduction. This outcome is corroborated by
our results on the real dataset.
Figure 2: Avg. balanced accuracy for each method across initial (unshaded) and extended configurations with high non-IIDness (shaded) for different levels of non-IIDness (value of $\alpha = \beta$).

Figure 3: Avg. balanced accuracy versus the number of clients across the initial (unshaded) and extended configurations with a greater number of clients (shaded).

Figure 4: Avg. balanced accuracy versus the number of features for each method across all configurations in the initial experiments (top-left), the extra configurations with more clients (top-right), and higher non-IIDness (bottom).

Overall, our results show there are several scenarios where FLAMED performs well. Specifically,
when the number of features is low, so as to keep
simulation tractable (of course this is dependent on
the number of informative features, which determines
SVD’s ability to successfully preserve all meaning-
ful information) and where non-IIDness is high be-
cause of highly heterogeneous client distributions or a large number of slightly heterogeneous client distribu-
tions. It is important to note that FLAMED’s rela-
tive performance improves as the number of clients
increases because this amplifies the non-IIDness of
the data, which impacts FLAMED’s performance less
adversely compared to other methods. However, in
scenarios where the data is IID, this advantage may
diminish. Regardless of the data distribution, each
client must possess sufficient data to achieve reliable
local data density estimation.
6.2 Real Data
The results of our experiments on the eICU dataset
are presented in Fig. 5. The plot shows the best bal-
anced accuracy achieved by each method with vary-
ing numbers of clients. The test set is taken from
each client that was used in training. Unsurprisingly,
when clients with a smaller number of observations
were used, the relative performance of FLAMED was
lowest because many observations are needed to re-
liably train a simulator versus a classifier. When
larger clients were used, the relative performance of
all methods became much closer. When we had
a high number of clients, FLAMED had good per-
formance, beating FedAvg and FedProx when using
more than 32 of the clients with the largest number
of samples, and remaining competitive with FedDC.
Notably, FLAMED performed best on the full eICU
dataset with all but the three smallest clients. When
using the full eICU dataset, these three excluded
clients likely contributed poorly simulated data, so
FLAMED performed second best.
Figure 5: The balanced accuracy for each method evaluated on the eICU dataset with varying number of clients, using the clients with the smallest (top-left), nearest to median (top-right), and largest (bottom) number of samples.
6.3 Security Analysis: Poisoning
Comparison
Here, we present the results of our experiments com-
paring the practicality of model backdoor attacks
against FedAvg, FedProx, FedDC, and FLAMED.
In Fig. 6, we show the resulting AUC and BSR of
the backdoor attacks via data and model poisoning
against FedAvg, FedProx, and FedDC, as described
in Section 5.3. Each point corresponds to different
attack techniques and parameters. For the data poi-
soning attacks, in which case no evasion can be per-
formed by the attacker, the maximum AUC from ei-
ther the $L_2$ or cosine defence is presented. For the
model poisoning attacks, the AUC resulting from the
specified evasion technique’s corresponding defence
method is used. If no evasion technique is used, the
maximum AUC from either the $L_2$ or cosine defence
is presented. The results indicate there are many in-
stances where the model poisoning attack is success-
ful, with over 90% BSR, while going undetected by
the defence methods used. However, the data poison-
ing attack was always detected with a high AUC, cor-
roborating the results from (Bagdasaryan et al., 2020).
Regardless of the FL method and the defence method
used, there were attack configurations where the BSR
was greater than 90% and the AUC lower than 60%,
meaning the attacks were undetected but potent. The
reader should also note that the AUCs presented here
represent a best-case scenario because, in practice,
client updates are not visible to the aggregating server
due to privacy concerns.
It is important to note that in cases where the AUC is below 50%, the defender cannot simply flip the prediction to achieve a better-than-random AUC. The AUC is so low primarily because the attacker's evasion techniques minimize the $L_2$ norm or cosine dis-
tances, and therefore the outlier score, of their poi-
soned updates. Intuitively, any outlier score exceed-
ing a given threshold should be considered anoma-
lous, and the corresponding update ignored. How-
ever, this approach would also eliminate many benign
updates because the attackers minimize their outlier
score, significantly compromising accuracy.
Figure 6: BSR obtained with the different poisoning attack configurations and the AUC of the corresponding defence method. Markers distinguish data poisoning, no evasion, $L_2$ evasion, and cosine evasion against FedAvg, FedProx, and FedDC.
Fig. 7 shows the results of the backdoor attacks
via data poisoning against FLAMED. IF used on the
simulated data was very effective, scoring a high
average AUC regardless of the attack configuration.
This demonstrates that FLAMED, unlike standard FL
methods, enables effective attack detection without
exposing gradients to the aggregating server. Also,
many data poisoning attacks achieved poor results,
but there is no reliable method to determine good at-
tack parameters. Attackers must guess the best at-
tack parameter settings and risk an ineffective attack
or being detected. Therefore, a data poisoning attack
on FLAMED is impractical. In contrast, the authors
in (Bagdasaryan et al., 2020) present methods for ob-
taining good model poisoning attack parameters with
little prior knowledge of the FL network. However,
as stated, model poisoning attacks cannot be performed against FLAMED because it does not exchange gradients.
Figure 7: The BSR of data poisoning attacks against FLAMED versus the average AUC across all parameter settings for all defence methods (LOF and Isolation Forest applied to the raw simulated samples, client centroids, and client class centroids). Repeated marker shapes correspond to different attack configurations.
7 CONCLUSION
In this work, we introduced the FLAMED frame-
work and compared it to FedAvg, FedProx, and
FedDC. FLAMED demonstrated strong performance
in handling non-IID data and detecting attacks against
model performance while resisting gradient-based
privacy attacks. FedSVD effectively reduced the di-
mensionality of large datasets (3,069 features) for ac-
curate simulation. While FLAMED’s performance
was competitive, it represents an early step in FL with
estimated densities, whereas comparison approaches
like FedDC represent the culmination of eight years of
research interest. Future research directions include
developing FedSVD approaches that eliminate the
need for a masking server and extending FLAMED
to settings such as categorical features, online learn-
ing, and vertical FL.
REFERENCES
Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., and
Shmatikov, V. (2020). How to backdoor federated
learning. In Chiappa, S. and Calandra, R., editors,
Proceedings of the Twenty Third International Confer-
ence on Artificial Intelligence and Statistics, volume
108 of Proceedings of Machine Learning Research,
pages 2938–2948. PMLR.
Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A.,
McMahan, H. B., et al. (2017). Practical secure ag-
gregation for privacy-preserving machine learning. In
Proceedings of the 2017 ACM SIGSAC Conference on
Computer and Communications Security, CCS ’17,
page 1175–1191, New York, NY, USA. Association
for Computing Machinery.
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J.
(2000). LOF: Identifying density-based local outliers.
In Proceedings of the 2000 ACM SIGMOD interna-
tional conference on Management of data, pages 93–
104.
Chai, D., Wang, L., Fu, L., Zhang, J., Chen, K., and Yang,
Q. (2021). Federated singular vector decomposition.
arXiv preprint arXiv:2105.08925.
Ding, B., Winslett, M., Han, J., and Li, Z. (2011). Differ-
entially private data cubes: optimizing noise sources
and consistency. In Proceedings of the 2011 ACM
SIGMOD International Conference on Management
of Data, SIGMOD ’11, page 217–228, New York, NY,
USA. Association for Computing Machinery.
Gao, L., Fu, H., Li, L., Chen, Y., Xu, M., and Xu, C.-
Z. (2022). FedDC: Federated learning with non-iid
data via local drift decoupling and correction. In 2022
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 10102–10111.
Halko, N., Martinsson, P. G., and Tropp, J. A. (2011).
Finding structure with randomness: Probabilistic al-
gorithms for constructing approximate matrix decom-
positions. SIAM Review, 53(2):217–288.
Karimireddy, S. P., Kale, S., Mohri, M., Reddi, S., Stich, S.,
and Suresh, A. T. (2020). Scaffold: Stochastic con-
trolled averaging for federated learning. In Interna-
tional Conference on Machine Learning, pages 5132–
5143. PMLR.
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A.,
and Smith, V. (2020). Federated optimization in het-
erogeneous networks. Proceedings of Machine Learn-
ing and Systems, 2:429–450.
Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2008). Isolation
forest. In Proceedings of the 2008 Eighth IEEE In-
ternational Conference on Data Mining, ICDM ’08,
page 413–422, USA. IEEE Computer Society.
Liu, Y., Zhang, L., Ge, N., and Li, G. (2020). A systematic
literature review on federated learning: From a model
quality perspective. arXiv preprint arXiv:2012.01973.
McMahan, B., Moore, E., Ramage, D., Hampson, S., and
y Arcas, B. A. (2017). Communication-efficient learn-
ing of deep networks from decentralized data. In Ar-
tificial intelligence and statistics, pages 1273–1282.
PMLR.
Pollard, T. J., Johnson, A. E. W., Raffa, J. D., Celi, L. A.,
Mark, R. G., and Badawi, O. (2018). The eICU Collaborative Research Database, a freely available multi-
center database for critical care research. Scientific
Data, 5(1):180178.
Voigt, P. and Von dem Bussche, A. (2017). The EU General Data Protection Regulation (GDPR). A Practical
Guide, 1st Ed., Cham: Springer International Pub-
lishing, 10(3152676):10–5555.
Wagner, T., Naamad, Y., and Mishra, N. (2023). Fast pri-
vate kernel density estimation via locality sensitive
quantization. In International Conference on Machine
Learning, pages 35339–35367. PMLR.
Wang, Z., Song, M., Zhang, Z., Song, Y., Wang, Q., and
Qi, H. (2019). Beyond inferring class representatives:
User-level privacy leakage from federated learning. In
IEEE INFOCOM 2019 - IEEE Conference on Com-
puter Communications, page 2512–2520. IEEE Press.
Zhang, J., Cormode, G., Procopiuc, C. M., Srivastava, D.,
and Xiao, X. (2017). PrivBayes: Private data release via Bayesian networks. ACM Trans. Database Syst.,
42(4).
Zhu, L. and Han, S. (2020). Deep Leakage from Gradi-
ents, pages 17–31. Springer International Publishing,
Cham.