De-Anonymization of Health Data: A Survey of Practical Attacks, Vulnerabilities and Challenges

Hamza Aguelal (https://orcid.org/0009-0002-9409-4206) and Paolo Palmieri (https://orcid.org/0000-0002-9819-4880)
Department of Computing & IT, University College Cork, Cork, Ireland
{h.aguelal, p.palmieri}@cs.ucc.ie
Keywords: De-Anonymization Attacks, Anonymity, Anonymization Assessment, Health Data Protection, Data Privacy.
Abstract: Health data ranks among the most sensitive categories of personal information, disclosing intimate details about individuals. Although anonymization is widely used, vulnerabilities persist, leading to de-anonymization and the privacy risks highlighted by regulations like the General Data Protection Regulation (GDPR). This survey examines de-anonymization attacks on health datasets, focusing on the methodologies employed, the data targeted, and the effectiveness of current anonymization practices. Unlike previous surveys, which lack consensus on essential empirical questions, we provide a comprehensive summary of practical attacks, offering a more grounded perspective on real-world risk. Our investigation systematically categorizes these practical attacks, revealing insights into success rates, generality and reproducibility, the new analytics used, and the specific vulnerabilities they exploit. The study covers health-related datasets, including medical records, genomic data, electrocardiograms (ECGs), and neuroimaging, highlighting the need for more robust anonymization. Significant challenges remain in the literature despite existing reviews. We advocate for stronger data protection by improving anonymization methods and advancing research on de-anonymization and its assessment within healthcare.
1 INTRODUCTION
The digitalization of healthcare has raised challenges, threats, and complexity in safeguarding patient privacy, especially with advancements in medical research, public health, and personalized medicine. Because health data is highly sensitive, its misuse can lead to severe privacy breaches and risks, particularly de-anonymization. Regulations like the GDPR (European Union, 2016) and the Health Insurance Portability and Accountability Act (HIPAA) (U.S. Congress, 1996) emphasize the need for stringent data protection in healthcare.
Despite advancements in anonymization practices, healthcare remains a prime target for breaches, per IBM's report (IBM Security, 2023). We highlight the insufficiency of these practices against vulnerabilities exploited via advanced analytics and auxiliary information.

Funded in part by the European Union (EU), Grant Agreement no. 101095717 (SECURED) and by Taighde Éireann Research Ireland under Grant number 18/CRT/6222. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Views and opinions expressed are those of the authors and do not necessarily reflect those of the EU or the Health and Digital Executive Agency. Neither the EU nor the granting authority are responsible for them.
Previous studies have explored de-anonymization techniques, but there is a noticeable gap in coverage of practical attacks in healthcare; many address theoretical risks in other fields. This hinders the development of effective countermeasures. Our review addresses this by providing a deep analysis targeting health data, including but not limited to medical records, ECG and genomic data. We examine re-identification methods and assess standard practices, techniques and factors leading to successful breaches to inform the debate on balancing privacy and utility. Our investigation introduces categorizations of de-anonymization attacks specific to health data and suggests a novel framework for evaluating anonymization methods. This framework considers the unique characteristics of health datasets and advanced analytics, including machine learning (ML) and deep learning (DL); it also assesses the availability of open-access de-anonymization code and models (reaching out to authors for access where needed), highlighting the importance of reproducibility. Through this critical analysis, we identify significant gaps and pinpoint suggestions for innovative directions.
1.1 Contributions and Novelty
Through this survey, we introduce a new framework
and identify gaps. The key contributions are:
Novel Attack Categorization Framework: Introducing a new taxonomy and framework to assess de-anonymization attacks on health data:
1. Dataset types and sources.
2. Methodologies employed, with a review of available or shared models.
3. Success rate evaluation and comparison to better understand effectiveness.
4. Incorporation of advanced techniques through analysis of recent ML and DL methods applied to de-anonymization, missing in previous surveys.
Practical Emphasis: Unlike existing reviews focusing on hypothetical risks, we offer insights into actual risks from practical, real-world attacks, bridging the gap between research and practice.
Assessment of Reproducibility, Generality and Code Availability: Evaluating the availability of code to underscore the challenges in reproducing studies and the importance of open-source practices, and identifying gaps in attack generality.
Ethical and Legal Considerations: We look at the ethical and legal guidelines, focusing on GDPR and HIPAA as standards.
Key Recommendations: Based on the results and challenges, we suggest recommendations such as establishing standardized benchmarks for evaluating de-anonymization in healthcare.
This paper's remaining sections are arranged as follows: Sec. 2 covers the background on vulnerabilities in health datasets and related works. Sec. 3 outlines the methodology, including selection criteria and data extraction. Sec. 4 reviews de-anonymization techniques, categorization, and evaluation. Sec. 5 presents key insights and findings, while Sec. 6 discusses strengths, gaps, and recommendations. Finally, Sec. 7 summarizes contributions and future research directions.
2 BACKGROUND AND RELATED WORK
De-anonymization (re-identification) links anonymized data back to individuals or extracts identifying information from it, posing a unique privacy threat given the uniqueness of medical data. Despite the use of k-anonymity (Sweeney, 2002), l-diversity (Machanavajjhala et al., 2007), and t-closeness (Li et al., 2006; Bayardo and Agrawal, 2005), vulnerabilities persist and require practical assessment. Adversaries exploit patterns, unique identifiers (Table 1), and correlations to undermine traditional anonymization. The rise in public datasets post-2009 (Henriksen-Bulmer and Jeary, 2016) has eased access to auxiliary information, driving attacks. While regulations emphasize anonymization, we note challenges, especially in healthcare, requiring new approaches. This motivates our systematic evaluation of de-anonymization effectiveness and field insights.
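To make k-anonymity concrete, the minimal sketch below (our own toy example, not drawn from any surveyed attack) computes the k-anonymity level of a small record set over chosen quasi-identifiers; a release is k-anonymous when every quasi-identifier combination is shared by at least k records.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns:
    the k for which the release is k-anonymous."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Toy release: generalized ZIP prefix, age bracket and sex are the QIs.
records = [
    {"zip": "021*", "age": "30-40", "sex": "F", "diagnosis": "flu"},
    {"zip": "021*", "age": "30-40", "sex": "F", "diagnosis": "asthma"},
    {"zip": "022*", "age": "40-50", "sex": "M", "diagnosis": "diabetes"},
]
print(k_anonymity(records, ["zip", "age", "sex"]))  # -> 1: the lone male
# record forms a singleton class, so this release is only 1-anonymous.
```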
2.1 Related Work
Several studies have explored data de-anonymization
techniques across various domains. (Ding et al., 2010;
Bhattacharya et al., 2023; Ji et al., 2019) focus on
social networks, utilizing graph-based structures and
interactions. Similarly, (Farzanehfar et al., 2021)
examined geolocation by tracing individuals’ move-
ments. Surveys such as (Al-Azizy et al., 2016) and
(Henriksen-Bulmer and Jeary, 2016) reviewed gen-
eral de-anonymization methods, including link pre-
diction and data aggregation. However, they lacked
a focus on health data or practical feasibility.
In healthcare, early work (Sweeney, 1997) demonstrated the re-identification of medical records through linkage to publicly available voter registration data, and (Malin and Sweeney, 2004) did the same for genomics using trail re-identification techniques. The authors in (Prada et al., 2011) summarize the regulatory efforts and give an overview of risks in healthcare, and in (Emam et al., 2011), researchers reviewed anonymization techniques and their limitations, particularly with smaller datasets. However, this work is dated and does not reflect emerging empirical issues and the practical de-anonymization of health datasets. Earlier reviews did not fully account for recent advancements in ML/DL that have changed de-anonymization capabilities; hence the need for our survey. For instance, (Shokri et al., 2017) introduced the membership inference attack (MIA) against ML models, showing how adversaries can determine whether a specific record was part of the model's training data (Nasr et al., 2019). (Lee et al., 2017) presented a blind attack using generative adversarial networks (GANs) and DL to bypass anonymization, while (Yin et al., 2023) and (Lu et al., 2024) emphasize the increasing threat posed by sophisticated DL algorithms.
2.2 Legal and Ethical Considerations
De-anonymization of health data raises legal and ethical concerns. The need to protect Personally Identifiable Information (PII) and Quasi-Identifiers (QIs) is pressing; regulations emphasize removing or generalizing identifiers (see Table 1). According to the GDPR (European Union, 2016): "Personal data means any information relating to an identified or identifiable ... by reference to an identifier such as a name, an identification number, ... or factors specific to the physical, physiological, genetic..."
Table 1: The 18 elements in HIPAA that must be removed or generalized for a data set to be de-identified.

(A) Names
(B) Geographic subdivisions smaller than a state
(C) All elements of dates
(D) Telephone numbers
(E) Fax numbers
(F) Electronic mail addresses
(G) Social security numbers
(H) Medical record numbers
(I) Health plan beneficiary numbers
(J) Account numbers
(K) Certificate/license numbers
(L) Vehicle identifiers and serial numbers
(M) Device identifiers and serial numbers
(N) Web URLs
(O) IP address numbers
(P) Biometric identifiers
(Q) Full face photographic images
(R) Any other unique identifying number, characteristic, or code
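As an illustration of Safe Harbor-style filtering, the short sketch below (our own illustrative example, covering only a handful of the 18 identifier classes with hypothetical, partial patterns) flags direct identifiers such as phone numbers, e-mail addresses and SSNs in free text; real de-identification pipelines are far more involved.

```python
import re

# Hypothetical, partial patterns for a few HIPAA Safe Harbor identifier
# classes (D, F, G of Table 1); production systems need far broader coverage.
PATTERNS = {
    "telephone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_identifiers(text):
    """Return the identifier classes detected in a clinical note."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

note = "Patient reachable at 555-123-4567 or jane.doe@example.com; SSN 123-45-6789."
print(flag_identifiers(note))  # -> ['telephone', 'email', 'ssn']
```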
However, de-anonymization can lead to potential
breaches and ethical implications, among others:
Patient Trust: Re-identification weakens trust between patients and providers (BEUC, 2023).
Informed Consent: Patients must be fully aware
of the risks associated with data sharing.
Data Utility vs. Privacy: Balancing the utility of
health data against the need to protect individual
privacy is a persistent dilemma.
Our review considers legal and ethical dimensions
and accurately assesses de-anonymization to improve
data privacy in the health sector.
3 METHODOLOGY
This section outlines our approach, including a de-
fined scope, the selection process, inclusion/exclusion
criteria, and the data extraction process. We followed
rigorous guidelines adapted from established procedures (Kitchenham, 2004) to ensure the empirical relevance of the reviewed studies.
3.1 Scope of this Review
This review focuses exclusively on de-anonymization
attacks targeting health-related information. It en-
compasses a range of health data types, including
medical records, neuroimaging data, ECGs, wearable
device data, and genomic data. We emphasize articles that present practical attacks or simulations, providing empirical evidence on health data rather than theoretical analysis, to address practical risks.
The initial search returned 1170 papers, narrowed to 146 after initial screening, with 69 deemed relevant following abstract analysis. We ended up with 17 papers in the summary, each with empirical evidence on health data types. We explored the availability of code: only five works explicitly provided access, one of them via an inaccessible link. For the rest, we contacted the authors directly, and two of them provided access. This highlights the varying levels of transparency and accessibility regarding reproducibility.
3.2 Search Strategy
We constructed a detailed search strategy utilizing Boolean operators and combinations of related keywords (Table 2). We supplemented these with terms like "Data Privacy Risk" and "Identification" to ensure comprehensive coverage.
Table 2: Search terms.

Primary search terms | Some excluded terms
De-anonymization | Social Networks
Re-identification | Vehicle data
De-anonymization attacks | Smart city data
Anonymization Assessment | Marketing and Finance Analytics
Identification | Location and Geolocation
Health data | -
We performed our initial search on the listed databases to capture literature from the fields of health informatics, computer science, and privacy studies:
IEEE Xplore
ACM Digital Library
Google Scholar
PubMed
SpringerLink
Additionally, we manually reviewed references to
avoid missing relevant work.
3.3 Inclusion and Exclusion Criteria
We developed our inclusion criteria, Table 3, to meet
our objectives and research questions.
De-Anonymization of Health Data: A Survey of Practical Attacks, Vulnerabilities and Challenges
597
What data types and corresponding de-anonymization studies have been most targeted?
Did the authors perform or develop a de-anonymization attack attempt? What types of attributes were used?
What datasets were used in these de-anonymization studies, and how are they characterized in size and diversity?
How should the results and outputs presented in the works be interpreted?
To what extent do the studies address reproducibility and transparency, such as by providing access to code?
Table 3: Inclusion Criteria for Study Selection.

Criterion | Description
Data Types | Involve health data types, including but not limited to genomic data, ECG and medical records.
Study Type & Empirical Validation | Studies detail empirical attempts, practical or simulations.
Results and Quality of Methodology | Studies must present quantifiable results, including success rates and detailed methodologies.
Peer-Reviewed Publications | Studies in peer-reviewed journals or conference proceedings.
Language | English-language studies for consistent analysis and interpretation.
Publication Date | Studies published from 2010 onwards.
Exclusion Criteria:
1. Duplicate or Redundant Publications.
2. Non-Health Data Focus: We excluded studies
outside the healthcare context.
3. Theoretical Studies: We excluded works focus-
ing solely on theoretical analyses.
4. Lack of Methodological and Result Detail: Studies with insufficient experimental validation or measurable results.
3.4 Data Extraction Process
We followed a structured data extraction process
(Fig. 1) with four phases:
1. Identification: We initially reviewed the studies’
titles, abstracts, and keywords to assess relevance.
2. Abstract Screening: Detailed abstract analysis.
3. Core Eligibility: Full-text review to match re-
search questions.
4. Final Inclusion: Studies offering insights into at-
tacks’ applicability and methodologies.
Figure 1: Data extraction and selection process.
Through this multi-phase process, we narrowed the selection to 17 papers that provide empirical evidence of attacks and measurable outcomes. We then extracted insights, methodologies and success rates aligned with our scope and the inclusion (Table 3) and exclusion criteria.
3.5 Structured Analysis
We organized the extracted information into a struc-
tured format to classify and evaluate the findings’ crit-
ical components, as shown in Table 4.
Table 4: Structured Analysis of De-Anonymization Research in Health Data.

Section | Information
Title | Analysis of the title and implications for de-anonymization in health data.
Abstract | Summary of aims, methodology, results, and significance.
Methodology & Objectives | Detailed approach and research goals.
Data Source & Data Type | Inventory of data used and sources.
Background & Algorithm | Theoretical foundation and algorithms used.
Attack Type | Classification of the attack(s) studied.
Results & Observations | Key findings, success rates and insights.
Limitations | Constraints and future research areas.
ICISSP 2025 - 11th International Conference on Information Systems Security and Privacy
598
4 SURVEY OF DE-ANONYMIZATION TECHNIQUES

This section surveys the techniques applied to health information. We split the processes and assessed the approaches used. Our deconstruction forms a framework based on categories, data types, and methods, and describes strategies and findings.
4.1 Data Types and Analysis
We categorize attacks by the data types vulnerable to de-anonymization, with detailed vulnerabilities in Table 5.
1. Genomic Data: Genomic data is a key target
due to its unique nature. Studies such as (Gym-
rek et al., 2013; Humbert et al., 2015; Lippert
et al., 2017; Wan et al., 2021) exploit single-
nucleotide polymorphisms (SNPs) and open ge-
netic databases for re-identification. Additionally,
(Ayoz et al., 2021; Thenen et al., 2019) use partial
genetic data based on data beacons.
2. Medical Records (Including EHRs): Medical
records often contain sensitive information, par-
ticularly quasi-identifiers (QIs). (Ji et al., 2020;
Branson et al., 2020) explored linkage and infer-
ence attacks on EHRs using probabilistic models.
Furthermore, (Antoniou et al., 2022) emphasized
the risk associated with k-anonymity, highlighting
the need for stronger anonymization.
3. Neuroimaging Data: Neuroimaging data, such as MRI scans, contain unique patterns in brain structures; (Ravindra and Grama, 2021; Venkatesaramani et al., 2021) exploit deep learning (DL), notably convolutional neural networks (CNNs), to identify individuals.
4. Electrocardiograms (ECGs): Due to their
uniqueness, ECG signals have been increasingly
used for re-identification (Ghazarian et al., 2022).
Studies like (Min-Gu Kim, 2020), (Hong et al.,
2020), and (Mitchell et al., 2023) demonstrate
various ECG-based re-identification techniques
and implement models such as Support Vector
Machines (SVM), highlighting vulnerabilities ex-
ploited for attacks.
5. Wearable Device Data: Wearable devices gen-
erate continuous streams of health-related data,
such as vital signs. (Lange et al., 2023) and (Min-
Gu Kim, 2020) demonstrated that similarity-
based attacks, founded on techniques like Dy-
namic Time Warping (DTW), can align time-
series data and re-identify users based on vital pat-
terns.
In Sec. 5 we further explore this deconstruction and the key insights (Table 6, Table 7).
4.2 Classification of Attacks
To address concerns about attack classification, and based on adversaries' objectives and techniques, we organize attacks into consistent categories that resolve gaps and overlaps in the literature and ensure each class is distinct and self-contained. While these
are known attacks applied in various contexts, cur-
rent classifications are often inconsistent, with some
classes overlooked or ambiguously grouped. Our
framework aims to clarify these distinctions and pro-
vide a reference for future research.
While a unified threat model is not explicitly de-
fined, we categorize de-anonymization under estab-
lished criteria. Each study inherently defines its own
threat model tailored to the dataset, methodology,
and objectives. By evaluating each attack, we indi-
rectly address the adversarial settings relevant to these
works.
4.2.1 Inference Attacks
Inference attacks, one of the most common classes (see Fig. 2), involve extracting sensitive information using ML/DL (Wu et al., 2020) and aggregating hidden patterns from the same or multiple sources. They are discussed in (Shokri et al., 2017; Thenen et al., 2019) through attribute deduction and the use of auxiliary information.
Figure 2: Evolution of De-anonymization Techniques Over Time (Health Data).
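As a minimal illustration of attribute inference (our own toy sketch, not a method from the cited works), an adversary can train a classifier on auxiliary records to predict a sensitive attribute from the quasi-identifiers left in an "anonymized" release:

```python
from sklearn.tree import DecisionTreeClassifier

# Auxiliary data the adversary already holds: quasi-identifiers
# (age-bracket index, region code) paired with a known sensitive attribute.
aux_X = [[3, 1], [3, 1], [5, 2], [5, 2], [2, 0]]
aux_y = ["diabetic", "diabetic", "healthy", "healthy", "healthy"]

model = DecisionTreeClassifier().fit(aux_X, aux_y)

# Quasi-identifiers of a record in the anonymized release: the classifier
# infers the sensitive attribute that anonymization was meant to hide.
print(model.predict([[3, 1]]))  # -> ['diabetic']
```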
4.2.2 Linkage or Matching Attacks
Linkage or matching attacks connect records from different datasets to identify individuals using auxiliary information, as demonstrated in (Venkatesaramani et al., 2021). These frequent attacks form a foundational class of de-anonymization methods.
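A minimal linkage sketch (ours; assuming the adversary holds an identified auxiliary table such as a voter roll, in the spirit of Sweeney's classic attack) joins the two tables on shared quasi-identifiers:

```python
import pandas as pd

# "Anonymized" hospital release: direct identifiers removed, QIs kept.
released = pd.DataFrame({
    "zip": ["02138", "02139"], "birth_year": [1961, 1974],
    "sex": ["F", "M"], "diagnosis": ["hypertension", "asthma"],
})

# Identified auxiliary data, e.g. a public voter registry.
voters = pd.DataFrame({
    "name": ["J. Smith", "A. Jones"], "zip": ["02138", "02139"],
    "birth_year": [1961, 1974], "sex": ["F", "M"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses.
linked = released.merge(voters, on=["zip", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])
```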
Table 5: De-Anonymization Vulnerabilities and Techniques by Health Data Type.

Data Type | Key Vulnerabilities | Techniques Used in Attacks | ML/DL Methods and Modifications
Genomic Data | Unique genetic markers (e.g., SNPs), open genealogy databases | SNP linkage; allele frequency comparisons; mutation patterns and beacon exploitation | Feature selection for SNP density; RNNs for sequential patterns; probabilistic models and regularization for diversity
Medical Records and EHR | Sensitive QIs; inference of health conditions and attributes | Linkage with auxiliary data; probabilistic models, stylometric analysis and Bayesian models | CNNs with attribute encoding layers; QI feature correlation analysis; RNNs for temporal sequence data
Neuroimaging (e.g., MRI) | Unique patterns in brain structures, high-dimensional features | Deep feature extraction using CNNs, structural matching and spatial feature capture | CNNs and transfer learning to leverage pre-trained models; optimized layers to isolate unique brain patterns
ECG Data | Heartbeat variability, signal characteristics, periodic waveforms | Signal decomposition; DTW and fiducial/non-fiducial methods for feature and temporal variations | LSTMs for time dependencies; hybrid CNN-RNN, regularization and data augmentation to increase accuracy
Wearable Device Data | Continuous metrics, lifestyle identifiers, vital signs | DTW; cross-user sensor patterns and behaviour-based tracing | Multi-modal CNNs for sensor fusion data; clustering for temporal similarity; attention layers for pattern recognition
4.2.3 Membership Attacks
Membership attacks are designed to determine whether an individual's data is included in a dataset, challenging anonymization by demonstrating that individuals can be pinpointed. The concept of the MIA has been tested and validated (Ayoz et al., 2021) and discussed in the literature as a threat to the privacy of health datasets, even under current privacy-preserving solutions such as synthetic data generation (Zhang et al., 2022).
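For genomic beacons, the sketch below gives a deliberately simplified flavour of a membership test in the spirit of the likelihood-ratio approach of (Shringarpure and Bustamante, 2015); the beacon size, allele frequencies and smoothing constant are illustrative assumptions, not the published model.

```python
import math

def beacon_membership_score(responses, allele_freqs):
    """Log-likelihood ratio of 'target is in the beacon' vs 'is not',
    given yes/no beacon answers for the target's variants (simplified:
    a member's own alleles always answer 'yes')."""
    score = 0.0
    for answered_yes, f in zip(responses, allele_freqs):
        # Probability at least one of an assumed 1000 other genomes
        # carries the allele.
        p_yes_not_member = 1.0 - (1.0 - f) ** 1000
        if answered_yes:
            score += math.log(1.0 / p_yes_not_member)
        else:
            # 'no' is near-impossible for a member; 1e-9 smooths log(0).
            score += math.log(1e-9 / (1.0 - p_yes_not_member + 1e-9))
    return score

# Rare variants (low f) that all answer 'yes' push the score up sharply.
print(beacon_membership_score([True, True, True], [1e-4, 5e-5, 2e-4]))
```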
4.2.4 Prediction Attacks
Prediction attacks, though related to inference attacks, focus specifically on predicting sensitive attributes or traits. Within our scope, these attacks aim to forecast characteristics such as phenotypic traits; (Lippert et al., 2017) demonstrated their effectiveness in genomics. They are distinct from inference attacks in their focus on forecasting rather than aggregation.
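As a toy flavour of trait prediction (ours; random data, loosely in the spirit of the ridge-regression models in (Lippert et al., 2017)), an adversary fits a regression from SNP genotypes to an observable trait and then scores an unidentified genome:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy training data: 200 individuals x 50 SNPs (0/1/2 minor-allele counts)
# with a trait (e.g., height) that depends linearly on a few SNPs.
X = rng.integers(0, 3, size=(200, 50)).astype(float)
w = np.zeros(50)
w[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ w + rng.normal(0, 1, 200)

model = Ridge(alpha=1.0).fit(X, y)

# Predict the trait for an unidentified genome; matching the prediction
# against observed traits of candidates is the subsequent linkage step.
target = rng.integers(0, 3, size=(1, 50)).astype(float)
print(float(model.predict(target)[0]))
```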
4.2.5 Correlation Attacks
Correlation attacks exploit statistical relationships between anonymized and external data, as (Ji et al., 2020) defined and simulated. They resemble inference attacks, but we suggest distinguishing this class for its reliance on cross-referencing statistical relationships. (Narayanan and Shmatikov, 2008) highlight the power of correlation and validate this as a strategy.
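A minimal stylometric correlation sketch follows (ours; character-frequency profiles and cosine similarity stand in for the much richer feature sets and classifiers of (Ji et al., 2020)):

```python
from collections import Counter
import math

def profile(text):
    """Character-frequency profile as a crude stylometric fingerprint."""
    counts = Counter(text.lower())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def cosine(p, q):
    dot = sum(p[c] * q.get(c, 0.0) for c in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm

anon_post = "my sugar levels spike every evening, any advice?"
candidates = {
    "profile_1": "sugar spikes in the evening are common, i track mine daily",
    "profile_2": "BREAKING: top 10 diets doctors do not want you to know!!!",
}
best = max(candidates,
           key=lambda u: cosine(profile(anon_post), profile(candidates[u])))
print(best)  # likely 'profile_1', the stylistically closer author
```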
4.2.6 Justification of Classification
Our classification is founded on the literature and applications in different domains. The distinctions are based on the adversary's objectives and the specific techniques. Some papers merge certain types because of overlapping methodologies, but we chose to separate them to reflect the acknowledged differences.
4.3 Methodologies Summary
This section reviews the diverse methodologies extracted from the studies, ranging from ML/DL techniques to probabilistic models. Below, we summarise the methods and the principles guiding each approach.
1. Machine Learning and Deep Learning
ML and DL are fundamental in health, enabling de-anonymization through feature extraction from complex datasets. DL was used to identify unique patterns (Wu et al., 2020), CNNs achieved over 94% accuracy in neuroimaging (Ravindra and Grama, 2021), and unsupervised learning linked facial and genomic data (Venkatesaramani et al., 2021). ML/DL deployments leverage correlations to bypass traditional anonymization: RNNs (e.g., LSTMs) identify ECG rhythm variations and waveform morphologies, CNNs capture spatial hierarchies in neuroimaging pixel patterns, and transformers model distant relationships between data points across sequences such as SNPs or EHRs.
2. Probabilistic and Statistical Learning Models
Probabilistic models are used for genomic data and for estimating phenotypic traits: (Humbert et al., 2015) combined unsupervised and supervised learning to predict phenotypic traits and map them to identities (Gymrek et al., 2013), and (Shringarpure and Bustamante, 2015) employed Bayesian inference and hypothesis testing to detect genome presence, though we highlight the decrease in effectiveness for larger populations.
3. Clustering and Similarity-Based Techniques
These methods exploit inherent patterns and outliers in time series and wearable data. (Lange et al., 2023) used DTW to achieve a 70.6% re-identification rate on the WESAD dataset (Schmidt et al., 2018); a minimal sketch follows this list. K-NN and similar algorithms analyze distinctive behavioral patterns, such as ECG peaks and fiducial biometrics (Min-Gu Kim, 2020; Hong et al., 2020).
4. Stylometric and Correlation Techniques
Used mainly to analyze text-based EHRs or similar data by exploiting unique writing styles. (Ji et al., 2020) presented stylometric techniques to match data using online profiles, which also exploit external metadata (Saxena et al., 2024).
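The following sketch (ours; a toy stand-in for the DTW-based similarity attack of (Lange et al., 2023), with made-up signals) matches an unlabeled physiological trace against labeled reference traces by DTW distance:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Labeled reference traces (e.g., heart-rate snippets from known users).
references = {"user_a": [60, 62, 65, 63], "user_b": [80, 83, 85, 82]}
anonymous_trace = [61, 63, 64, 62]  # trace from the "anonymized" release

# Re-identify by nearest DTW neighbour (k=1, as in the 70.6% result above).
guess = min(references,
            key=lambda u: dtw_distance(anonymous_trace, references[u]))
print(guess)  # -> 'user_a'
```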
Each methodology was evaluated on a scale from 1 to 5 on five key attributes (Accuracy, Scalability, Computational Cost, Reliance on External Data, Data Diversity Handling) to underscore its strengths and weaknesses in applicability to de-anonymization. The chart in Fig. 3 summarizes how each method performs.
Figure 3: Comparison of Methodologies based on 5 Key Attributes.
5 FINDINGS
This section highlights key findings from the sur-
veyed practical studies. Table 6 summarizes attacks
on health datasets based on the evaluated criteria.
5.1 Key Insights
From our analysis of methodologies and findings, we
highlight several key insights:
Data-Specific Focus: Attacks exploit specific
characteristics and vulnerabilities (see Table 5).
1. Genomic Data Vulnerability: Ranked as the most targeted due to its uniqueness (Fig. 4).
(a) SNP and genetic genealogy use demonstrated success rates >80% in (Shringarpure and Bustamante, 2015; Edge and Coop, 2020).
(b) Beacons and genome reconstruction (Ayoz et al., 2021) for information inference.
2. Neuroimaging Data: Exploits unique brain
patterns in MRI scans.
3. ECG and Wearable Device Data: Vulnerable
to attacks using temporal patterns.
Figure 4: Distribution of Data Types Targeted.
Success Rate Trends: Success rates vary depending on the strategy (Fig. 5) and commonly decrease as dataset size increases. We attribute this to:
1. Increased Data Diversity: Larger datasets introduce variability (Gymrek et al., 2013), confounding the identification of unique patterns.
2. Noise in the Data: Increased noise blurs unique markers.
3. Scalability: ML/DL models can be effective but struggle with heterogeneous datasets and require high computational resources.
Dependency on Auxiliary Information: Attack success often relies on auxiliary data, such as demographic data and beacons for genomics, as in (Thenen et al., 2019; Ayoz et al., 2021).
Variation in Explainability: Probabilistic and clustering methods offer greater clarity (Venkatesaramani et al., 2021), while some studies lack detail due to their methodologies.
Uniqueness and Sensitivity: Health data (e.g., biological patterns) requires specific re-identification and analysis techniques.
Dominance of Certain Attacks: Inference and membership attacks predominate due to their adaptability (see Fig. 2), unlike correlation attacks, which depend on auxiliary data availability.
Ethical Considerations: Studies followed ethical guidelines, balancing research progress with privacy. However, de-anonymization remains a threat, requiring regulatory updates.
Table 6: Summary of De-Anonymization Vulnerabilities, Techniques and Results by Health Data Type.

Work | Main Studied Data | Data Source | Background Knowledge and Technique | Attack Strategy | Results
(Min-Gu Kim, 2020) | ECG | MIT-BIH NSRDB | Noise removal, 2D ECG image transformation, CNN, RNN | Inference attacks | 98.9% recognition (ensemble network)
(Lange et al., 2023) | BVP, EDA, TEMP, ACC | WESAD dataset | DTW alignment algorithms, similarity ranking | Similarity-based inference attack | 70.6% identification at k=1
(Venkatesaramani et al., 2021) | 3D face images, SNPs | OpenSNP, LFW, MegaFace, CelebA | DL (CNN, VGGFace), face-to-phenotype prediction | Prediction, matching attacks | 80% for small populations, <20% for populations >100
(Humbert et al., 2015) | SNPs, phenotypes | OpenSNP, SNPedia, 23andMe, FTDNA | Supervised ML, probabilistic models | Inference, matching attacks | 52% success (supervised) for 10 participants
(Gymrek et al., 2013) | Genetic genealogy (Y-STR) | Ysearch, internet research | Illumina sequencing, lobSTR algorithm | Inference, linkage attacks | 12% surname recovery
(Ravindra and Grama, 2021) | Neuroimaging data | HCP, ADHD-200 | Leverage-score sampling, clustering-based approach | Matching, inference attacks | 94% accuracy for HCP; 97.2% for ADHD
(Edge and Coop, 2020) | Genetic genealogy | GEDmatch | IBS tiling, probing, baiting | Inference, linkage attacks | 82% genome recovery with IBS tiling
(Ji et al., 2020) | Medical records | WebMD, HealthBoards | Stylometric and correlation features, SVM, KNN | Correlation attacks | 12.4% re-identification
(Wu et al., 2020) | Medical records, mammography images | Cardiovascular Disease Dataset, MIAS, CBIS-DDSM | Attribute inference, model inversion, CNN, MLP | Inference attack | 80% attack success without defenses; defenses reduced accuracy
(Branson et al., 2020) | Medical reports | Nepafenac, U.S. public death and discharge records, FOIA, Facebook, Reddit | Manual pattern matching, auxiliary information correlation and record linkage | Inference, linkage attacks | 6 low-confidence matches out of 500 patients (process consumed 170 hours)
(Thenen et al., 2019) | Genomic data (SNPs) | HapMap | QI-attack (LD correlations), GI-attack (Markov chain) | Inference attacks | 282 queries for 95% confidence
(Shringarpure and Bustamante, 2015) | Genomic data (beacons) | 1000 Genomes | Likelihood-ratio test (LRT) | Membership attack | >95% detection with 5,000 SNP queries
(Ayoz et al., 2021) | Genomic data (beacons) | OpenSNP, HapMap | Clustering, SMOTE, ensemble classifiers | Prediction, membership, inference attacks | 0.96 precision for SNP reconstruction
(Lippert et al., 2017) | Genomic data | 1,061 individuals from San Diego | PCA for face variation, ridge regression, max entropy models | Prediction, linkage attacks | 80% in mixed cohort, 50% in sub-cohorts
(Ayday and Humbert, 2017) | Genomic data | CEPH/Utah Pedigree 1463, 1000 Genomes Project | Belief propagation, graphical models, Markov models | Inference, membership attacks | 87% in inferring hidden SNPs from multiple relatives
(Wan et al., 2021) | Genomic data | Craig Venter's record, Ysearch, Publicfinders | Backward induction, greedy algorithm | Inference, linkage attacks | 76% in no-protection scenario
(Liu et al., 2018) | Genetic sequencing, phenotype data | eMERGE-PGx, VUMC SD, KPW, NW | Likelihood ratio, membership detection | Membership attacks | 80% recall/precision for 16,346 individuals
Figure 5: Ranges of Success Rates of De-anonymization Attacks.
Table 7 shows a comparative analysis and assessment based on effectiveness and common vulnerabilities, and completes the breakdown in Sec. 4.
6 EVALUATION AND DISCUSSION

We take a critical look at the literature's strengths and challenges, through which de-anonymization can be assessed with rational considerations.
6.1 Key Strengths
High Accuracy Rates: Studies report high success when using multiple sources; see Fig. 5.
Verification of Hypotheses: Simulation on real-world datasets (Connectome and 23andMe) indicates practical relevance and potential impact.
Diverse Methodologies: Various methodologies, including CNNs, RNNs, clustering and probabilistic models, define the multiple attack types.
Innovative Attack Techniques: DL, stylometric and behavioural analysis (people's interaction with data), as in (Ji et al., 2020), mark the evolution of methods beyond standard approaches.
Adaptability to Evolving Data: Flexible toward evolving technologies and emerging data formats (e.g., IoT-related) (Dimitrievski et al., 2023).
6.2 Challenges
Our work reveals challenges to address in anonymity assessment. We delve deeper into these gaps:
Reproducibility Challenge: The absence of publicly available code, the use of restricted or non-public datasets (e.g., (Lippert et al., 2017; Branson et al., 2020)) and insufficient methodological descriptions restrain verification and testing, and hinder reproducibility.
Scalability of Studies: While some studies use large datasets (Thenen et al., 2019), other works were not tested broadly (Min-Gu Kim, 2020), raising scalability and practicality concerns.
Underreporting of Negative Results: We highlight a tendency to underreport negative results in de-anonymization. By not publishing unsuccessful attempts, we risk losing valuable insights.
Under-Represented Data Types: Several health data types have been studied; however, other valuable data have not been explored in de-anonymization, such as pathology images and speech data.
Interdisciplinary Approaches & Application
Scope: Current research often lacks integration
across different disciplines and broader applica-
tions, such as fraud detection.
Lack of Assessment of Advanced Anonymization Techniques: We note the lack of practical attack attempts on emerging anonymization methods, namely Differential Privacy (DP) and advanced synthetic data generation, holding back our understanding of the real-world robustness of these methods.
Poor Explainability: The interpretability of white-box models remains limited. Explainable models would improve transparency in assessments.
Ambiguity in Legal Regulations: GDPR and HIPAA lack precise criteria for what counts as sufficiently anonymized data. The absence of clear legal guidance on de-anonymization risks can slow efforts to assess patients' privacy.
6.3 Recommendations
Building upon our survey and the identified gaps, we suggest the following recommendations:
1. Establishment of Standardized Benchmarking and Datasets.
Table 7: Insights and Analysis of De-Anonymization Techniques.

Category | Insights | Techniques
Comparative Success Rates | Higher success in genomic & neuroimaging data; text-based & wearable data require auxiliary info for accuracy | Genomic & neuro: ML/DL models; wearable: clustering
Methodological Performance | ML/DL: best for structured data (e.g., imaging), CNNs for spatial data, RNNs/LSTMs for time dependencies; Probabilistic: effective for probability-based genomic matches; Clustering/similarity: ideal for repetitive time-series patterns | CNNs, RNNs, LSTM, Bayesian inference, likelihood-ratio testing, DTW
Common Vulnerabilities | Genetic markers, biometric patterns (e.g., ECG); auxiliary data (genealogy records, beacons) for linkage | ML/DL, clustering, similarity and probabilistic models
Feature Extraction | CNNs & DL improve re-identification through feature isolation; clustering & DTW scalable but need quality data | DL models for high accuracy
Evaluation & Scalability | CNNs: high accuracy, lower efficiency; clustering: scalable, less accurate in noisy environments | CNNs vs clustering for varied data adaptability
We suggest developing standardized datasets and benchmarks for evaluating de-anonymization in health to facilitate accurate result comparisons and reproducibility and to foster collaboration. Common benchmarks, covering health data types and anonymization levels, can include:
Organized datasets with variables for testing.
Standard metrics like success rates, accuracy, and computational efficiency (see the sketch after this list).
Open shared repositories.
2. Promote Reproducibility and Open Science
Practices.
We recommend publicly sharing code, data, and methods to improve reproducibility, result validation and study replication. Key steps include:
Using open-source licenses and documentation.
Using accessible repositories.
Ensuring sharing complies with ethics and regulations.
3. Enhance Scalability Testing and Evaluation.
Assess attacks on large, diverse datasets to reflect
real-world constraints. Our success rate evalua-
tion (Fig. 5) highlights the need for scalability.
4. Encourage Publication of Negative Results.
We advocate for a shift in the research community to overcome the under-reporting of negative results. Sharing negative findings provides valuable insights and prevents redundant efforts, improving reporting transparency.
5. Expand Research to Wider Health Data Types.
We recommend more interdisciplinary collaboration across computer science, data privacy, healthcare and bioinformatics to open new opportunities for exploring underrepresented types like voice and speech, pathology images and dietary data.
6. Practical Assessment of Advanced Anonymization Techniques.
We suggest conducting practical attacks on methods like DP and synthetic data generation in health, and we promote publishing both successful and unsuccessful results.
7. Broaden De-Anonymization Applications.
We advocate exploring broader applications of de-anonymization techniques beyond re-identification, such as:
Fraud Detection: Using de-anonymization
(e.g., pattern recognition, cross-referencing) to
identify fraudulent activities in healthcare.
Data Integrity Verification: Anonymized data
tamper detection and provenance tracking.
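As a sketch of what a shared benchmark metric could look like (our own hypothetical helper, not an existing benchmark), the snippet below computes the success rate and top-k success rate from an attack's ranked guesses:

```python
def success_at_k(ranked_guesses, true_identities, k=1):
    """Fraction of targets whose true identity appears in the attack's
    top-k ranked candidates; k=1 is the plain re-identification rate."""
    hits = sum(truth in guesses[:k]
               for guesses, truth in zip(ranked_guesses, true_identities))
    return hits / len(true_identities)

# Hypothetical output of an attack on three targets: candidate identities
# ranked from most to least likely, plus the ground truth.
ranked = [["alice", "bob"], ["carol", "dave"], ["erin", "frank"]]
truth = ["alice", "dave", "frank"]
print(success_at_k(ranked, truth, k=1))  # -> 0.333...
print(success_at_k(ranked, truth, k=2))  # -> 1.0
```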
While our paper evaluates de-anonymization tech-
niques and challenges, an extended version will elab-
orate on comparative evaluation, validation, and the
detailed benchmarking framework.
7 CONCLUSION
Our survey underscores the state of de-anonymization in healthcare, given the metrics, approaches, and data features used in different studies. This review revealed insights on methods, showing that genomic, ECG, and medical record data have been targeted due to their unique patterns and the use of auxiliary information. This led us to outline the challenges and propose actionable suggestions to enhance practical assessments in healthcare.
Our contribution introduces a perspective on de-anonymization attacks that leverages insights by outlining potential strategies and contextualizing them by data type. We also consider telemedicine, which raises new privacy challenges caused by the digital shift in healthcare and the integration of real-time data from various sources. These call for more robust data protection measures based on privacy-preserving mechanisms such as DP, with models tuned to customizable privacy budgets while keeping data utility; techniques like GANs (Yoon et al., 2020) and Variational Autoencoders (VAEs) (Yang et al., 2023) to enhance synthetic data generation; and homomorphic encryption, delivering better protection for health data (Regazzoni et al., 2024). Our findings outline a roadmap for investigations into the privacy and confidentiality of health information. We emphasize the ethical and secure sharing of health data aligned with anonymization techniques. We highlight the fine line between data utility and protection in today's digital age, and our findings set a foundation for ongoing research to maintain this balance.
This review introduced strategies for future work; in particular, we plan to expand our investigation by developing a detailed benchmarking framework to evaluate de-anonymization techniques comprehensively. This includes a comparative evaluation of methodologies, experimental validation, and reproducibility assessments to bridge gaps in current research, go beyond re-identification, and use de-anonymization for fraudulent activity detection within healthcare systems.
Our extended version will further elaborate on
these aspects, providing a more in-depth analysis of
de-anonymization challenges, research gaps, and in-
novative solutions to advance the field.
REFERENCES
Al-Azizy, D., Millard, D., Symeonidis, I., O’Hara, K., and
Shadbolt, N. (2016). A literature survey and classifica-
tions on data deanonymisation. In Risks and Security
of Internet and Systems. Springer International.
Antoniou, A., Dossena, G., MacMillan, J., Hamblin, S.,
Clifton, D., and Petrone, P. (2022). Assessing the
risk of re-identification arising from an attack on
anonymised data.
Ayday, E. and Humbert, M. (2017). Inference attacks
against kin genomic privacy. IEEE Security & Pri-
vacy, 15(5):29–37.
Ayoz, K., Ayday, E., and Cicek, A. E. (2021). Genome Re-
construction Attacks Against Genomic Data-Sharing
Beacons. PoPETs, 2021(3):28–48.
Bayardo, R. J. and Agrawal, R. (2005). Data privacy
through optimal k-anonymization. In 21st Interna-
tional conference on data engineering (ICDE’05),
pages 217–228. IEEE.
BEUC (2023). Consumer Attitudes to Health Data Sharing:
Survey Results from Eight EU Countries. Technical
Report BEUC-X-2023-051, BEUC, European Con-
sumer Organisation.
Bhattacharya, M., Roy, S., Chattopadhyay, S., Das, A. K.,
and Shetty, S. (2023). A comprehensive survey on
online social networks security and privacy issues:
Threats, machine learning-based solutions, and open
challenges. Security and Privacy, 6(1):e275.
Branson, J., Good, N., Chen, J.-W., Monge, W.,
Probst, C., and Emam, K. E. (2020). Evaluating
the re-identification risk of a clinical study report
anonymized under ema policy 0070 and health canada
regulations. Trials.
Dimitrievski, A., Loncar-Turukalo, T., and Trajkovik, V.
(2023). Securing patient information in connected
healthcare systems in the age of pervasive data col-
lection. In 2023 IEEE MeditCom.
Ding, X., Zhang, L., Wan, Z., and Gu, M. (2010). A brief
survey on de-anonymization attacks in online social
networks. In 2010 CASoN. IEEE.
Edge, M. D. and Coop, G. (2020). Attacks on genetic pri-
vacy via uploads to genealogical databases. eLife,
9:e51810.
Emam, K. E., Jonker, E., Arbuckle, L., and Malin, B.
(2011). A systematic review of re-identification at-
tacks on health data. PLoS ONE, 6(12):e28071.
European Union (2016). Regulation (eu) 2016/679 general
data protection regulation. Official Journal of the Eu-
ropean Union. eur-lex.europa.eu/eli/reg/2016/679/oj.
Farzanehfar, A., Houssiau, F., and de Montjoye, Y.-A.
(2021). The risk of re-identification remains high
even in country-scale location datasets. Patterns,
2(3):100204.
Ghazarian, A., Zheng, J., Struppa, D., and Rakovski, C.
(2022). Assessing the reidentification risks posed by
deep learning algorithms applied to ecg data. IEEE
Access, 10:68711–68723.
Gymrek, M., McGuire, A. L., Golan, D., Halperin, E., and
Erlich, Y. (2013). Identifying Personal Genomes by
Surname Inference. Science, 339(6117):321–324.
Henriksen-Bulmer, J. and Jeary, S. (2016). Re-identification
attacks—a systematic literature review. International
Journal of Information Management, 36:1184–1192.
Hong, S., Wang, C., and Fu, Z. (2020). Cardioid: Learning
to identification from electrocardiogram data. Neuro-
computing, 412:11–18.
Humbert, M., Huguenin, K., Hugonot, J., Ayday, E.,
and Hubaux, J.-P. (2015). De-anonymizing Ge-
nomic Databases Using Phenotypic Traits. PoPETs,
2015(2):99–114.
IBM Security (2023). Cost of a data breach report 2023.
Technical report, IBM Security.
Ji, S., Gu, Q., Weng, H., Liu, Q., Zhou, P., Chen, J., Li, Z.,
Beyah, R., and Wang, T. (2020). De-Health: All Your
Online Health Information Are Belong to Us. In 2020
IEEE 36th International Conference on Data Engi-
neering (ICDE), pages 1609–1620, Dallas, TX, USA.
IEEE.
Ji, S., Wang, T., Chen, J., Li, W., Mittal, P., and Beyah,
R. (2019). De-sag: On the de-anonymization of
structure-attribute graph data. IEEE Transactions on
Dependable and Secure Computing, 16(4):594–607.
Kitchenham, B. (2004). Procedures for performing sys-
tematic reviews. Keele, UK, Keele University,
33(2004):1–26.
Lange, L., Schreieder, T., Christen, V., and Rahm,
E. (2023). Privacy at risk: Exploiting similari-
ties in health data for identity inference. CoRR,
abs/2308.08310.
Lee, W.-H., Liu, C., Ji, S., Mittal, P., and Lee, R. B.
(2017). Blind de-anonymization attacks using social
networks. In Proceedings of the 2017 on Workshop on
Privacy in the Electronic Society, pages 1–4.
Li, N., Li, T., and Venkatasubramanian, S. (2006).
t-closeness: Privacy beyond k-anonymity and l-
diversity. In 2007 IEEE 23rd international conference
on data engineering, pages 106–115. IEEE.
Lippert, C., Sabatini, R., Maher, M. C., Kang, E. Y., Lee,
S., et al. (2017). Identification of individuals by trait
prediction using whole-genome sequencing data. Pro-
ceedings of the National Academy of Sciences.
Liu, Y., Wan, Z., Xia, W., Kantarcioglu, M., Vorobeychik,
Y., Clayton, E. W., Kho, A., Carrell, D., and Malin,
B. A. (2018). Detecting the presence of an individual
in phenotypic summary data. In AMIA Annual Sym-
posium Proceedings, pages 760–769.
Lu, G., Li, K., Wang, X., Liu, Z., Cai, Z., and Li, W. (2024).
Neural-based inexact graph de-anonymization. High-
Confidence Computing, 4(1).
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkita-
subramaniam, M. (2007). l-diversity: Privacy beyond
k-anonymity. Acm TKDD, 1(1):3–es.
Malin, B. and Sweeney, L. (2004). How (not) to pro-
tect genomic data privacy in a distributed network:
Using trail re-identification to evaluate and design
anonymity protection systems. Journal of Biomedical
Informatics, 37(3):179–192.
Min-Gu Kim, Hoon Ko, and Sung Bum Pan (2020). A study on user recognition using 2D ECG based on ensemble of deep convolutional neural networks. Journal of Ambient Intelligence and Humanized Computing.
Mitchell, A. R. J., Ahlert, D., Brown, C., Birge, M., and
Gibbs, A. (2023). Electrocardiogram-based biomet-
rics for user identification using your heartbeat as a
digital key. Journal of Electrocardiology, 80:1–6.
Narayanan, A. and Shmatikov, V. (2008). Robust de-
anonymization of large sparse datasets. In 2008 IEEE
Symposium on Security and Privacy. IEEE.
Nasr, M., Shokri, R., and Houmansadr, A. (2019). Compre-
hensive privacy analysis of deep learning: Passive and
active white-box inference attacks against centralized
and federated learning. In 2019 IEEE SP.
Prada, S. I., González-Martínez, C., Borton, J., et al. (2011). Avoiding disclosure of individually identifiable health information: a literature review. SAGE Open, 1(3):2158244011431279.
Ravindra, V. and Grama, A. (2021). De-anonymization At-
tacks on Neuroimaging Datasets. In Proceedings of
the 2021 International Conference on Management of
Data, pages 2394–2398, Virtual Event China. ACM.
Regazzoni, F., Acs, G., Palmieri, P., et al. (2024). Secured
for health: Scaling up privacy to enable the integration
of the european health data space. In 2024 Design, Au-
tomation & Test in Europe Conference & Exhibition
(DATE), pages 1–4. IEEE.
Saxena, R., Arora, D., Nagar, V., and Chaurasia, B. K.
(2024). Blockchain transaction deanonymization us-
ing ensemble learning. Multimedia Tools and Appli-
cations, pages 1–30.
Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., and
Van Laerhoven, K. (2018). Introducing wesad, a mul-
timodal dataset for wearable stress and affect detec-
tion. In Proceedings of the 20th ACM ICMI, pages
400–408.
Shokri, R., Stronati, M., Song, C., and Shmatikov, V.
(2017). Membership inference attacks against ma-
chine learning models. In 2017 IEEE Symposium on
Security and Privacy (SP), pages 3–18. IEEE.
Shringarpure, S. S. and Bustamante, C. D. (2015). Privacy Risks from Genomic Data-Sharing Beacons. The American Journal of Human Genetics, 97(5):631.
Sweeney, L. (1997). Weaving technology and policy to-
gether to maintain confidentiality. The Journal of Law,
Medicine & Ethics, 25(2-3):98–110, 82.
Sweeney, L. (2002). k-anonymity: A model for protecting
privacy. International Journal of Uncertainty, Fuzzi-
ness, and Knowledge-Based Systems.
Thenen, N. V., Ayday, E., and Cicek, A. E. (2019).
Re-identification of individuals in genomic data-
sharing beacons via allele inference. Bioinformatics,
35(3):365–371.
U.S. Congress (1996). Health Insurance Portability and Ac-
countability Act of 1996 (HIPAA). Public Law 104-
191, 110 Stat. 1936.
Venkatesaramani, R., Malin, B. A., and Vorobeychik, Y. (2021). Re-identification of individuals in genomic datasets using public face images. Science Advances, 7(47):eabg3296.
Wan, Z., Vorobeychik, Y., Xia, W., Liu, Y., et al. (2021).
Using game theory to thwart multistage privacy in-
trusions when sharing data. Science Advances,
7(50):eabe9986.
Wu, M., Zhang, X., Ding, J., Nguyen, H., Yu, R., Pan, M.,
and Wong, S. T. (2020). Evaluation of inference at-
tack models for deep learning on medical data. arXiv
preprint arXiv:2011.00177.
Yang, R., Ma, J., Miao, Y., and Ma, X. (2023). Privacy-
preserving generative framework for images against
membership inference attacks. IET Communications,
17(1):45–62.
Yin, H., Liu, Y., Li, Y., Guo, Z., and Wang, Y. (2023).
Defeating deep learning based de-anonymization at-
tacks with adversarial example. Journal of Network
and Computer Applications, 220.
Yoon, J., Drumright, L. N., and Van Der Schaar, M. (2020).
Anonymization through data synthesis using genera-
tive adversarial networks. IEEE J-BHI, 24(8).
Zhang, Z., Yan, C., and Malin, B. A. (2022). Membership
inference attacks against synthetic health data. Jour-
nal of Biomedical Informatics, 125:103977.