Characterising and Categorising Anonymization Techniques:
A Literature-Based Approach
Andrea Fieschi 1,2 (https://orcid.org/0009-0007-9126-6021), Pascal Hirmer 1 (https://orcid.org/0000-0002-2656-0095), Christoph Stach 2 (https://orcid.org/0000-0003-3795-7909) and Bernhard Mitschang 2 (https://orcid.org/0000-0003-0809-9159)
1 Mercedes-Benz AG, Stuttgart, Germany
2 Institute for Parallel and Distributed Systems, University of Stuttgart, Stuttgart, Germany
{andrea.fieschi, pascal.hirmer}@mercedes-benz.com
Keywords:
Privacy Protection, PRISMA Systematic Literature Research, Privacy-Enhancing Techniques, Anonymization
Techniques.
Abstract:
Anonymization plays a crucial role in protecting personal data and ensuring information security. However,
selecting the appropriate anonymization technique is a challenging task for developers, data scientists, and
security practitioners due to the vast array of techniques available in both research and practice. This paper
aims to assist users by offering a method for structuring a framework that helps them make informed decisions
about the most appropriate anonymization techniques for their specific use cases. To achieve this, we first
conduct a systematic literature review following the PRISMA guidelines to capture the current state of the art
in anonymization techniques. Based on the findings from this review, we propose a conceptual organisation
of anonymization techniques, designed to help users navigate the complex landscape of anonymization and
choose techniques that align with their security requirements.
1 INTRODUCTION
Data collection is a necessity in various domains, but
it poses significant risks to information security. As
the volume of data collected from sources like pa-
tients, smartphones, or vehicles increases, so does the
potential for exposing sensitive information that indi-
viduals did not agree to disclose. In this context, pro-
tecting personal privacy and securing data are critical
challenges in information security (Stach, 2023).
In order to protect personal privacy, numerous Pri-
vacy Enhancing Technologies (PETs) can be used. A
possible way of achieving effective privacy protection
is anonymization (Majeed and Lee, 2021). The Eu-
ropean General Data Protection Regulation (GDPR)
defines anonymization as an "irreversible transforma-
tion of personal data in such a way that the data sub-
ject can no longer be identified" (European Parlia-
ment and Council of the European Union, 2016).
Following the concept of Anonymization by De-
sign (Fieschi et al., 2024), choosing the most suit-
able anonymization technique early in the develop-
ment stages of a data-collecting use case is essen-
tial for ensuring effective privacy protection. Privacy
measures can guarantee stronger protection if they are considered
during the development process (Morton and Sasse,
2012). However, selecting the right anonymization
technique for a particular data use case presents a sig-
nificant challenge for developers, data scientists, and
security practitioners. Privacy must cater to the
needs of both the service provider and the service
users (Fieschi et al., 2023). The heterogeneous land-
scape of anonymization techniques, each with its own
strengths and limitations, makes it difficult to choose
the most suitable approach, especially when aiming
to ensure robust information security. The lack of a
comprehensive, organised framework further compli-
cates the decision-making process, increasing the risk
of inappropriate or ineffective techniques being used,
potentially leading to security breaches.
This paper addresses this gap by proposing how
to structure a collection of anonymization techniques
that offers a comprehensive overview, designed to
support users (developers, data scientists, and security
practitioners) in selecting the appropriate technique
to meet their use case security requirements. Our
goal is to organise a conceptual framework that as-
sists security-conscious users in navigating the com-
plex landscape of anonymization techniques and mak-
ing informed decisions that ensure both privacy and
security. To this end, it is important that the frame-
work supporting users provides an easy-to-navigate
and thorough overview of the available anonymiza-
tion techniques; that it is not monolithic and allows
each technique to be used as a single piece; and that it
is flexible enough to allow users to incorporate new
techniques published in the literature, developed in
practice, or self-developed.
In this paper, we provide two main contributions.
First, we present the insights gained about the land-
scape of anonymization techniques through a system-
atic literature review, conducted using the PRISMA
method (Moher et al., 2015). It was important for
us to understand the types of techniques available,
the kinds of data they process, and the domains in
which they are applied. Second, we propose a con-
ceptual structure for organising anonymization tech-
niques, based on our literature research findings.
In Section 2, we present the current state of the
art of collections of anonymization techniques and
we highlight the reasons why the present solutions
do not fully satisfy our needs. Section 3 details our
PRISMA-based literature review and characterises
the anonymization techniques landscape. Building on
these insights, Section 4 explains how our findings in-
form the structured organisation of a new collection
of anonymization techniques. We then compare our
proposed framework with existing solutions in Sec-
tion 5, highlighting its advantages and addressing cur-
rent shortcomings. Section 6 outlines our first steps
towards a practical implementation. Finally, Section 7 concludes the
paper by summarising the key takeaways and suggest-
ing directions for future research and development.
2 RELATED WORK
In the field of data anonymization, ensuring robust
privacy protection requires careful consideration of
available techniques. To identify the most suitable
anonymization technique, security-conscious users
would benefit from having access to a well-organised
collection of available techniques that will: 1) Provide
a Comprehensive Overview: Enable users to make
well-informed decisions by offering a complete range
of anonymization techniques for various use cases.
2) Offer Modular Deployment: Design each technique
as an individual module so that users can deploy only
the specific technique they need, rather than integrat-
ing an entire framework. This approach simplifies
deployment and minimises overhead. 3) Offer Up-
datability: Allow users to add new techniques as they
emerge from research or are self-developed, in order
to keep the collection relevant and current.
Several software solutions have been created to of-
fer a range of anonymization techniques and support
their implementation. However, these solutions come
with notable limitations that can compromise their
effectiveness in supporting the selection of the most
suitable anonymization technique in a real-world sce-
nario. The solutions found in practice bring to-
gether only anonymization techniques of a similar na-
ture and act on a focused part of a data stream.
There are software solutions that provide privacy
protection through anonymization by acting at the
very beginning of the data stream. Before the data are
used in any way, these methodologies generate a new
dataset with the same characteristics as the one ac-
quired from real-world scenarios. The Synthetic Data
Vault (Patki et al., 2016) and the Synthetic Data Gen-
eration framework (Walonoski et al., 2017) are ex-
amples of frameworks that provide methodologies for
generating synthetic datasets which can be used for
testing and development without risking real data ex-
posure. These tools are valuable for creating safe en-
vironments for data analysis, but they are not directly
focused on the anonymization of existing datasets.
Other software solutions bring together
anonymization techniques that are suited to modi-
fying the acquired dataset before analysing it or
passing it on to the next data handler. ARX (Prasser
and Kohlmayer, 2015) is a good example: it is a
notable tool that provides a rich set of
anonymization techniques, including k-anonymity,
l-diversity, and t-closeness. It is a well-regarded tool
in the field, appreciated for its user-friendly interface
and robust algorithmic implementations. However,
despite its strengths, ARX is not without limitations.
Specifically, it falls short in its integration capabil-
ities; once an anonymization technique is selected,
deploying it within a data processing pipeline is
not straightforward. Its monolithic nature does not
allow for a single algorithm to be used in isolation. ARX is
designed more as a standalone application rather than
a modular component that can be seamlessly incor-
porated into existing workflows. OpenDP (Gaboardi
et al., 2020) is another good example; some of its
anonymization techniques consist of adding noise
according to the Differential Privacy (DP) postulate,
hence modifying the dataset before it is sent to the
next stage of the data stream. It too comes with the
same limitations mentioned for ARX.
There are also software solutions that offer privacy
protection by employing anonymization techniques
that alter the way data is handled for its intended use.
OpenDP (Gaboardi et al., 2020) and TensorFlow Pri-
vacy (Abadi et al., 2015) are good examples of it since
they offer an ensemble of techniques apt for ensuring
differential privacy. The former provides mechanisms
that allow differentially private queries, while the lat-
ter ensures the successful employment of differen-
tial privacy within machine learning models. While
these libraries are powerful in their respective niches,
they do not provide an overview of all the available
anonymization techniques. Moreover, they too are
not designed to be modular, making them ill-suited for
on-the-fly deployment.
All of the above solutions offer a range of meth-
ods, though often not as comprehensive as one might
desire for a flexible and modular approach. These
software solutions present the following problems:
1) Fragmented Coverage: The focus of each one of
them on a specific type of anonymization approach
does not allow us to use any of these software so-
lutions as a comprehensive overview of all the dif-
ferent techniques available. This fragmentation can
obstruct users from making well-informed decisions
based on a full range of options. 2) Integration Issues:
The standalone nature of many solutions complicates
their integration into existing data processing work-
flows. This lack of modularity limits the practical ap-
plicability of these tools in dynamic and evolving data
environments. 3) Limited Flexibility: Existing solu-
tions often fail to incorporate new techniques emerg-
ing from recent research, address user-specific needs,
or allow users to incorporate custom techniques, re-
sulting in a lack of adaptability and relevance.
We need to lay the groundwork that allows us
to organise a collection of anonymization techniques
with the characteristics listed at the beginning of this
section. The first step to this end is conducting a sys-
tematic literature review, which will serve as the foun-
dation for developing our proposed collection.
3 LITERATURE ANALYSIS:
ANONYMIZATION
TECHNIQUES
To build an organised collection of anonymization
techniques, it is important to get a clear picture of
the methods that exist in the literature. To this end,
we conducted a systematic literature review using the
PRISMA method (Moher et al., 2015). By reviewing
the anonymization techniques available in the litera-
ture, we were able to determine how to structure our
collection of these methods.
Figure 1: Keywords Identification for the Papers Research. Keyword "anonymization": 5255 results; "anonymization AND (approach OR technique OR algorithm)": 3472 results; "anonymization AND (approach OR technique OR algorithm) AND novel": 480 results.

Notable surveys on the topic, like (Chen et al.,
2009), provide a comprehensive explanation of the
subject and its various approaches. Our PRISMA
research provides us with insights into the literature
landscape of anonymization.
3.1 PRISMA Research
The PRISMA method (Moher et al., 2015), short
for Preferred Reporting Items for Systematic Re-
views and Meta-Analyses, is a protocol that guides
researchers through the process of conducting a sys-
tematic literature review. The approach helps ensure
comprehensive coverage and an unbiased selection of
relevant studies. It begins with defining strict cri-
teria for which studies to include and exclude. Re-
searchers then search relevant databases and sources
using a detailed strategy, followed by a careful screen-
ing of studies based on titles, abstracts, and full texts
to ensure relevance. Data is extracted from the chosen
studies using a standardised approach, and the quality
of each study is assessed to identify any potential bias.
The results from these studies are then synthesised,
either quantitatively or qualitatively, to form conclu-
sions. Finally, the process and findings are reported
in a structured and transparent way, often accompa-
nied by a flow diagram to map out the study selection
process. This methodical approach is designed to en-
sure clarity, thoroughness, and reproducibility in the
review of research literature.
Our research was conducted in 2023 consulting
the Scopus database (https://www.scopus.com/) and
Web of Science (https://www.webofscience.com/). Both on-
line platforms index and store peer-reviewed litera-
ture from various sources such as IEEE, Elsevier, and
Springer. The user-friendly interfaces of both plat-
forms enable efficient refinement of search results ac-
cording to various criteria. Since both Scopus and
Web of Science collect scientific works from more
or less the same sources, the results yielded were ap-
proximately the same for this research. Therefore, we
chose to use only one of the two platforms and our
choice fell on Scopus.
In order to understand the number of papers
present in the literature, we started by entering only the
word "anonymization" as a search term. We in-
cluded both the British and American spellings of the
word. For practicality, we will only use the American
spelling in the keyword list. This yielded 5255 re-
sults. Given the vast number of papers, we
proceeded with further refining the search by adding
more search terms. In Figure 1, we see how the num-
ber of results changes when we refine the keyword
search. To focus more, we refined the search term
with ”anonymization AND (approach OR technique
OR algorithm)”, this reduces the number of results to
3472. The addition of ”novel” to the research terms
helped us weed out papers that marginally talk about
anonymization or that are not proposing a new ap-
proach to an anonymization technique.
In Figure 2, we see the PRISMA flow diagram that
led us to identify the papers to be analysed and in-
cluded in our review. Following the PRISMA paradigm,
through stages of search refinement and early screening,
we eliminated the papers that would not have
contributed to our literature review.
With 139 papers, we have a reasonable reflection
of the works present in the literature, and the statistical
information we extrapolate from this ensemble helps
us gain a clear overview of the kinds of anonymization
techniques present in the literature, their application
domains, and the data types processed. In Sec-
tion 3.2, we present the main information we extrapolated
from these results.
3.2 Anonymization Categories
In the literature, we found several different anonym-
ization techniques. The multitude of approaches can
appear rather overwhelming. However, a pattern can
be traced among all of them. To put some order in
the landscape of anonymization techniques we de-
fined 5 overarching categories, as it can be seen in
Section 3.3, under which most techniques can be
grouped. In the following, we explain the categories
we identified, give them a name, and reference the
main techniques belonging to each category. It has to
be noted that the basic step common to all anonym-
ization techniques is eliminating the direct identifiers,
i.e., attributes that directly link the data to a specific
data source, such as full names, ID numbers, matric-
ulation numbers, etc. This is not enough to guaran-
tee anonymity since the collection of further attributes
that describe the data source, i.e., quasi-identifiers, can
still lead to the risk of identifying the data source.
Figure 2: PRISMA Flowchart (identification of studies via databases and registers). Records identified from Scopus and other registers (n = 3486). Records removed before screening: marked as ineligible by adding "novel" as a keyword (n = 2992), removed after limiting the fields to computer science, engineering, and mathematics (n = 54), removed for other reasons (n = 14). Records screened (n = 426); excluded after screening the title, publisher, and keywords (n = 183). Reports sought for retrieval (n = 243); not retrieved (n = 12). Reports assessed for eligibility (n = 231); excluded as application-specific (n = 29) or because the actual topic differed from anonymization (n = 63). Studies included in review (n = 139).
Therefore, for all the categories we describe in the
following sections, this step is included.
Figure 3: Visualisation of the distribution of papers in the literature according to our categorisation.

Table 1: Anonymization techniques in the literature.

Anonymization Model | Explanation | Examples of Sources
Grouping Based | k-anonymity, l-diversity, t-closeness, etc. | (Sweeney, 2002), (Li et al., 2007), (Machanavajjhala et al., 2007), (Khan et al., 2022)
Differential Privacy | Reaching the guarantee of the DP postulate | (Dwork, 2008), (Dwork and Roth, 2013), (Yu et al., 2019)
Perturbation Based | Obfuscation through data perturbation, e.g., noise injection | (Hamm, 2017), (Aljably, 2021), (Shynu et al., 2020)
Data Synthesis | Generate fake data with the same properties as real data | (Piacentino and Angulo, 2020a), (Piacentino and Angulo, 2020b)
Encryption | Used to enhance anonymity protection, e.g., blockchain | (Javed et al., 2021), (Alnemari et al., 2018), (Yamaç et al., 2019)
Other | Specific techniques for specific data types, like voice or video | (Fan and Wang, 2023), (Zhao et al., 2016)

3.2.1 Grouping Based

With the term "grouping based", we refer to anonym-
ization techniques that try to guarantee anonymity
by creating groups of data sources that are indistin-
guishable from one another. This is done by modi-
fying the quasi-identifier to allow data coming from
different sources to be grouped and to become in-
distinguishable from one another. The sensitive at-
tributes are not modified, and can still provide a
good level of data usability, but the value of a sen-
sitive attribute under protection will not be re-linked
to its source since it is grouped with data coming
from multiple sources. In most cases, we found pa-
pers that implement or extend k-anonymity (Sweeney,
2002), l-diversity (Machanavajjhala et al., 2007), or t-
closeness (Li et al., 2007).
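To make the grouping-based idea concrete, the following minimal sketch (our own illustration, not an implementation from any of the cited papers) generalises a single quasi-identifier, age, into ever wider ranges until every group contains at least k records, while leaving the sensitive attribute untouched:

```python
from collections import Counter

def generalise_age(age: int, width: int) -> str:
    """Map an exact age to a range of the given width, e.g., 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def k_anonymise_ages(records: list[dict], k: int) -> list[dict]:
    """Coarsen the 'age' quasi-identifier until every group holds >= k records.

    Direct identifiers are assumed to have been removed beforehand (the
    basic step common to all categories); the sensitive attribute
    ('diagnosis') is left untouched, preserving data usability.
    """
    assert len(records) >= k, "fewer records than k can never satisfy k-anonymity"
    width = 10
    while True:
        groups = Counter(generalise_age(r["age"], width) for r in records)
        if all(count >= k for count in groups.values()):
            break
        width *= 2  # widen the ranges until every group is large enough

    return [{**r, "age": generalise_age(r["age"], width)} for r in records]

records = [
    {"age": 23, "diagnosis": "flu"},
    {"age": 27, "diagnosis": "cold"},
    {"age": 31, "diagnosis": "flu"},
    {"age": 36, "diagnosis": "asthma"},
]
print(k_anonymise_ages(records, k=2))
```

Real grouping-based algorithms generalise several quasi-identifiers at once and try to minimise the information loss; the widening loop above is only the simplest possible strategy.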
3.2.2 Differential Privacy
This cluster contains all the papers that present
anonymization techniques that aim at guaranteeing
the postulate of DP (Dwork, 2008), that is:
$\Pr[\mathcal{M}(D_1) \in S] \leq e^{\varepsilon} \cdot \Pr[\mathcal{M}(D_2) \in S]$

for any randomised mechanism $\mathcal{M}$, any two datasets $D_1$ and $D_2$ differing in a single record, and any set of possible outputs $S$.
In the literature (Zhu et al., 2017), the parameter ε is
also referred to with the term privacy budget. In or-
der to achieve higher privacy guarantees, we need a
lower value of ε. DP can be achieved through differ-
entially private queries (Dwork and Roth, 2013). It
could also be reached through randomised response
or a mechanism of differentially private data collec-
tion (Wang et al., 2016). Most papers aim at im-
proving data usability in different environments or for
specific data types, as we discovered in (Jin et al.,
2022), (Hamm, 2017), (Aljably, 2021), or (Gao and
Li, 2019c). While other anonymization approaches
can have similar mechanisms, e.g., noise injection,
only the approaches that aim at guaranteeing the DP
postulate mentioned above are found in this category.
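As a minimal illustration of how ε acts as a privacy budget, the following sketch (a generic example of the Laplace mechanism, not one of the surveyed approaches) answers a counting query in a differentially private way:

```python
import random

def laplace_noise(scale: float) -> float:
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(values, predicate, epsilon: float) -> float:
    """Answer a counting query under epsilon-differential privacy.

    A count changes by at most 1 when a single record is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon satisfies the
    DP postulate above; a smaller privacy budget epsilon means more noise
    and a stronger guarantee.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(scale=1.0 / epsilon)

ages = [23, 27, 31, 36, 41, 58]
print(dp_count(ages, lambda a: a >= 30, epsilon=0.5))
```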
3.2.3 Perturbation Based
Here, we cluster all the anonymization techniques that
aim to ensure anonymity by adding noise to the data
or perturbing their values in other ways. These tech-
niques do not aim to satisfy specific postulates like
DP or k-anonymity, which is why they are grouped
into a separate category. Here, we have anonymiza-
tion techniques that use value perturbation but do not
fall under the category of DP or grouping-based meth-
ods. This can be applied at different stages of the data
pipeline (Chen et al., 2009). For example, sampling
noise from a normal distribution and adding it to an
attribute (Domingo-Ferrer et al., 2020).
It has to be pointed out that many anonymiza-
tion techniques clustered under the Differential Pri-
vacy category also introduce noise, but they do so in
order to reach the DP postulate; they are therefore
stricter and have a mathematically defined goal. Some examples are:
(Sun et al., 2016), (Ullah and Shah, 2016), (Eyupoglu
et al., 2018a), and (Attaullah et al., 2021).
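The normal-distribution example can be sketched in a few lines; this is a generic illustration of attribute perturbation under an assumed noise level, not a reimplementation of any cited method:

```python
import random

def perturb_attribute(values: list[float], sigma: float) -> list[float]:
    """Perturb a numerical attribute by adding zero-mean Gaussian noise.

    Unlike the Differential Privacy category, no formal postulate is
    targeted: sigma is chosen empirically to trade data usability
    against obfuscation strength.
    """
    return [v + random.gauss(0.0, sigma) for v in values]

salaries = [42000.0, 55500.0, 61250.0]
print(perturb_attribute(salaries, sigma=1000.0))
```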
3.2.4 Data Synthesis
Under the category Data Synthesis, we have the tech-
niques that protect the users’ privacy by synthesising
completely new data based on the original data (Fung
et al., 2010). With this type of approach, the di-
rect identifiers are often removed before synthesis or
pseudonymized in order not to store unprotected data.
After the generation, the new dataset is made of data
points with no connections to specific users, and the
original dataset is deleted; hence the guarantee of
anonymity. Generative Adversarial Networks can be
used to generate new datasets with the same charac-
teristics as the original dataset but not connected
to any real data source (Park et al., 2018). Further
examples can be found here: (Aleroud et al., 2022),
(Abay et al., 2019), (Piacentino and Angulo, 2020b).
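As a toy stand-in for such generative models, the sketch below fits a per-attribute Gaussian to the original data and samples an entirely new dataset from it; real approaches such as GANs capture far richer dependencies between attributes:

```python
import random
import statistics

def synthesise(dataset: list[list[float]], n_new: int) -> list[list[float]]:
    """Generate a synthetic dataset that matches the per-attribute mean
    and standard deviation of the original. No synthetic row is linked
    to any real data source; the original dataset can then be deleted."""
    columns = list(zip(*dataset))
    params = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
    return [[random.gauss(mu, sd) for mu, sd in params] for _ in range(n_new)]

original = [[1.70, 68.0], [1.82, 81.5], [1.65, 59.0], [1.77, 74.0]]
print(synthesise(original, n_new=3))
```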
3.2.5 Encryption
The cluster of encryption is an interesting one under
the anonymization lens. Encryption is, strictly speak-
ing, not an anonymization technique; however, some
techniques use encryption as a base to reach a guar-
antee of anonymity. This can be done in combination
with other methods, e.g., to obfuscate sensitive data
or within blockchain-based approaches. The following resources
provide valid examples of the kinds of techniques
that converge to this cluster: (Javed et al., 2021), (Al-
nemari et al., 2018), and (Yamaç et al., 2019).
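As one simple example of encryption serving as a building block, the following sketch (our own illustration) replaces a direct identifier with a keyed pseudonym before further anonymization steps are applied; on its own this is pseudonymization rather than anonymization:

```python
import hashlib
import hmac

# Assumption: a secret key exists and is managed securely outside this snippet.
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible pseudonym.

    In the papers of this category, such cryptographic building blocks are
    combined with further techniques (grouping, noise, blockchain-based
    storage) to strengthen the overall anonymity guarantee.
    """
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

print(pseudonymise("patient-4711"))
```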
3.3 Research Landscape: Methods,
Application Domains, and Data
Types
The systematic literature search helped us understand
which anonymization techniques are present in the lit-
erature, in which domains they are applied, and which
data types they can process. In the following, we il-
lustrate the statistics of which percentage of the pa-
pers analysed in our literature can be grouped under a
certain category, in which field they are applied, and
which data types can be processed by the anonymiza-
tion techniques they present.
3.3.1 Categories of Anonymization Techniques
In Section 3.2, we outlined the categories we
defined to group the techniques found in
the literature, explaining the nomenclature and what
belongs to each category. Here, we illustrate how the
works analysed in our literature search are distributed
over all the different clusters. Figure 3 visualises the dis-
tribution of all the techniques found. Grouping-based
techniques are most prevalent, used in about 66% of
the papers, often involving k-anonymity and its exten-
sions, i.e., l-diversity and t-closeness.
The next significant group, around 10%, employs
DP. Different approaches are proposed in the vari-
ous papers to try to improve data usability for differ-
ent application fields. Perturbation-based techniques
make up 8% of the works we analysed, outlining var-
ious ways of injecting noise, or generally perturbing
the data, in order to achieve anonymization guaran-
tees. Techniques that can be grouped under the clus-
ter of data synthesis are found in 4% of the analysed
papers. Encryption-related anonymization techniques
make up 4% of this literature search. As already men-
tioned in Section 3.2.5, this is peculiar given that en-
cryption is not, strictly speaking, an anonymization
method. However, the papers grouped here use en-
cryption to strengthen the guarantee of anonymiza-
tion. The last category, named Other in Table 1, contains
all the techniques that do not belong to any of the cat-
egories described before. Most of the papers grouped
in Other deal with specific cases and address privacy
problems specific to certain data collections. Some
examples are image anonymization, speech anonym-
ization, and video anonymization.
3.3.2 Areas of Application
The application areas of the analysed works, combined
with the techniques used in each, are detailed in Figure 4. By ap-
plication areas, we refer to those mentioned in each
paper, not to potential areas where the techniques
could be applied. It has to be noted that around 40%
of the papers are not tied to a specific area of appli-
cation and mainly address theoretical aspects, such
as introducing new algorithms or improving upon al-
ready well-established methods.
Social networks and healthcare are the predom-
inant application areas, with the former including
about 20% of the works and the latter including about
15% of the analysed works. The healthcare sector has
a history of anonymization research due to clinical
study evaluations, with ongoing work to refine these
methods (Abbasi and Mohammadi, 2022), (Aminifar
et al., 2021). Social networks are also a key area be-
cause of the amount of personal information they hold,
which industries want to analyse while guaranteeing privacy
protection. Here we find a high proportion of differen-
tial privacy (Gao and Li, 2019a), (Gao and Li, 2019c)
and perturbation-based techniques (Al-Kharji et al.,
2018), (Rong et al., 2018).
The areas of Big Data and Cloud & Web favour
the perturbation-based methods (Eyupoglu et al.,
2018b), (Kalia et al., 2021). In all other fields,
grouping-based techniques are the most used and dis-
cussed in the literature.
3.3.3 Data Types Processed
The analysed papers were closely reviewed for the
types of data requiring anonymization, as depicted in
Figure 5. This figure illustrates the data types and
their associated anonymization methods as found in
the analysed papers.
Half of the studies focus on tabular data, which
is not surprising given its prevalence and ease of
anonymization. About 15% of the studies address
anonymizing Graph Data (Thouvenot et al., 2020),
(Gao and Li, 2019b), a data type often linked to so-
cial networks. Another 15% is dedicated to Positional
Data. These data types prompted the development of
specialised techniques due to their peculiar structure.
Positional data is split into points of interest (An
et al., 2018), (Li et al., 2021), (Sei and Ohsuga,
2017) and trajectory (Ward et al., 2017), (Mahdavi-
far et al., 2022), (Li et al., 2022), with the former re-
quiring complex anonymization techniques due to the
sensitivity of location data. Streaming and transac-
tional data are less commonly studied, with existing
techniques adapted to their specific needs (Mohamed
et al., 2020), (Tsai et al., 2020), (Puri et al., 2022).

Figure 4: Techniques found in the literature grouped by application domain.
Image, video, and speech data anonymization are not
extensively covered in this review, as they require use-
case-specific approaches.
3.4 Literature Gaps
As we have illustrated in Figure 5, most of the
anonymization methods found in the literature deal
with tabular data. Images, videos, and speech data,
just to name a few, are handled with methods that are
focused on that kind of data type used for a specific
use case. Also log data, already closer to tabular data,
is a data type not extensively considered in the litera-
ture, only few specific cases deal with this data type.
An approach that tackles log data would help anonym-
ization be used as a privacy-protecting mech-
anism for diagnostics or product improvement.
Another aspect lacking in the literature is the guar-
antee of anonymity after incremental data up-
dates. Except for a few cases like (Pei et al., 2007),
(Dwork and Roth, 2013), most of the anonymiza-
tion approaches are not run in real time and, in or-
der to maintain the same guarantee, need to be
re-run when new data come in. This is particularly
true for tabular data and for grouping-based methods.
Anonymization designed for incremental data updates
would make its use for stream data easier and its ap-
plication more widespread.
4 HOW TO ORGANISE A
COLLECTION OF
ANONYMIZATION
TECHNIQUES
Leveraging the information extrapolated in Section 3,
the knowledge from the literature research and the
classification we made of the anonymization tech-
niques, we illustrate here our vision of how to organ-
ise a collection of anonymization techniques.
The extensive number of anonymization algo-
rithms necessitates a clear organisational method to
exploit their different characteristics, areas of appli-
cation, and best usages. To achieve this, we em-
ploy a hierarchical structure to categorise these tech-
niques systematically. We propose a three-tier hi-
erarchical model to organise anonymization tech-
niques. We devise the following three structuring cri-
teria: 1) Anonymization Category: This is the broad-
est classification level, grouping techniques based
on their fundamental approach (e.g., grouping-based,
data synthesis, etc.). 2) Anonymization Technique:
Within each category, specific techniques are detailed
(e.g., k-anonymity, differentially private queries).
3) Anonymization Implementation: The most granu-
lar level, where particular implementations of tech-
niques are described. This includes detailed informa-
tion about how each technique is applied.
Each level in the hierarchy is modelled with es-
sential attributes: 1) ID and Name: Unique identi-
fiers and descriptive names. 2) Conceptual Expla-
nation: An overview building on the parent node’s
explanation. 3) Data Types: Types of data the tech-
nique can handle. 4) Application Platform: Whether
it is for backend or on-board processing. 5) Incre-
mental Updates: Whether the technique supports up-
dating datasets incrementally. 6) Implementation De-
tails: Specifics on how the technique is implemented.

Figure 5: Techniques found in the literature grouped by data type they are applied on.
This hierarchical model, as illustrated in Figure 6
and Figure 7, ensures that each technique and its
implementation are well-documented and systemati-
cally categorised, hence giving a better overview of
the available anonymization techniques. It facilitates
modularity by allowing each technique implementa-
tion to be treated as a separate module. This modular-
ity is crucial for practical deployment, as it enables
users to select and integrate only the specific tech-
nique required for their use case, rather than being
constrained by a monolithic system.
All the characteristics of a new node are inherited
from the parent node, and every attribute can be fur-
ther specified. For example, the data types that the
anonymization technique can process can be limited
compared to the parent anonymization category. The
same can happen between the anonymization technique
and the anonymization implementation node.
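A minimal sketch of how such nodes and their attribute inheritance could be modelled is shown below; this is one illustrative design under our own naming assumptions, not a prescribed implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A node of the three-tier hierarchy: category, technique, or
    implementation. Attributes left as None are inherited from the
    parent; a child may also narrow them, e.g., restrict the data types."""
    id: str
    name: str
    conceptual_explanation: str
    parent: Optional["Node"] = None
    data_types: Optional[list[str]] = None        # None means: inherit
    application_platform: Optional[str] = None    # "backend" or "on-board"
    incremental_updates: Optional[bool] = None
    implementation_details: Optional[str] = None  # set on leaf nodes only

    def resolved(self, attribute: str):
        """Walk up the hierarchy until the attribute is set somewhere."""
        value = getattr(self, attribute)
        if value is None and self.parent is not None:
            return self.parent.resolved(attribute)
        return value

# Category -> technique -> implementation, mirroring Figure 6.
grouping = Node("C1", "Grouping Based",
                "Group data sources so that they become indistinguishable.",
                data_types=["tabular"], incremental_updates=False)
k_anon = Node("T1", "k-anonymity",
              "Each quasi-identifier combination occurs at least k times.",
              parent=grouping)
mondrian = Node("I1", "Mondrian Algorithm",
                "Multidimensional partitioning to reach k-anonymity.",
                parent=k_anon, application_platform="backend",
                implementation_details="deployable Python module")

print(mondrian.resolved("data_types"))  # ['tabular'], inherited from the category
```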
Every entry of the collection of anonymization
techniques is a separate module. Every module fol-
lows the model illustrated in Figure 7; a description
of each attribute can be found in Table 2.
The modules from a lower level of the collection
of anonymization techniques inherit the characteris-
tics from the level above and add information. On
the third level of our hierarchy, we find the tech-
niques’ algorithms. The anonymization techniques
are here implemented and deployable. The mod-
ules of the third level, the anonymization implemen-
tation leaf nodes, contain in their description the de-
tails of how the anonymity guarantee is reached,
through data processing, data pipeline architecture,
providing privacy-protecting querying mechanisms,
etc. When an anonymization technique can be writ-
ten as a self-contained and deployable piece of code,
e.g., k-anonymity, then the implementation is found
in the anonymization implementation along with its
documentation, i.e., input and output format, etc.
The deployable algorithm could be implemented as
a software library, Docker container, binary file,
WebAssembly, etc.

Figure 6: Example of hierarchy and modelling of the collection of anonymization techniques. The example shows categories (Grouping Based, Differential Privacy, Data Synthesis), techniques (k-anonymity, l-diversity, DP queries, randomised responses, noise injection, new-dataset generation), and implementations (generalisation, suppression, the Mondrian algorithm, GANs, sampling).
Once an anonymization technique is selected for
a data-handling process, the rest of the collection is
not used in the implementation of the required data
pipeline, avoiding the overhead that a monolithic
structure would otherwise impose.
5 COMPARISON OF OUR
STRUCTURE WITH ALREADY
EXISTING FRAMEWORKS
In this section, we compare our proposed way of
organising a collection of anonymization techniques
with the state of the art discussed in Section 2. Our
goal is to illustrate how our approach distinguishes
itself by fulfilling the requirements of providing a
thorough overview, modularity, and flexibility.
A significant advantage of our collection is its
modular architecture. The anonymization frame-
works mentioned in Section 2 operate as monolithic
systems, requiring users to adopt the entire frame-
work even if only a single technique is needed. This
can lead to inefficiencies and added complexity. In
contrast, our collection is organised and structured
with modularity in mind, treating each anonymiza-
tion technique as an independent module. This al-
lows users to select and deploy only the techniques
required for their specific use cases.
Our approach also provides a comprehensive
overview of all available anonymization techniques
from both literature and practice. Unlike existing
frameworks, which may offer a limited or predefined
set of techniques, our collection comprises a wide
range of methods, ensuring that users have access to
the full spectrum of available options. This extensive
coverage allows for a more informed selection pro-
cess, where users can choose the most suitable tech-
nique based on their specific needs and use cases. By
presenting a complete and organised view of the avail-
able techniques, our collection enables users to make
well-informed decisions and apply the most effective
anonymization methods for their scenarios.
Moreover, current anonymization frameworks of-
ten struggle with expandability. Many of these sys-
tems do not easily accommodate new techniques or
updates from ongoing research, leaving users with
outdated or incomplete options. Our approach is de-
signed to be expandable, with a structure that allows
for the integration of new techniques as they become
available. This ensures that our collection remains
relevant and up-to-date, which is essential for keeping
up with advancements in data privacy and anonymiza-
tion. It also allows the user to include custom-made
techniques that come from their experience.
To finalise our approach, we need to add a rec-
ommender system that further assists users in select-
ing the best-fitting anonymization approach. Such
a system would help users select the optimal tech-
nique based on specific criteria, such as achieving the
highest level of privacy protection or finding the best
trade-off between data quality and privacy. Imple-
menting this functionality would further enhance the
practicality of our collection of anonymization tech-
niques, making it easier for users to navigate the com-
plex landscape of anonymization techniques.

Figure 7: Core attributes of the model for the anonymization categories, techniques, and implementations. The node model comprises ID, name, conceptual explanation, data types, application platform, incremental updates, technical explanation, and a deployable module.
Table 2: Model Attributes for Each Node.

Attribute | Description
ID | Unique identifier.
Name | Descriptive name of the node.
Conceptual Explanation | An overview building on the parent node's explanation.
Data Types | Types of data the technique can handle (e.g., text, numerical, images).
Application Platform | Specifies whether the technique is for backend or on-board processing.
Incremental Updates | Indicates whether the technique supports updating datasets incrementally.
Implementation Details | Specific details about how the technique is implemented (e.g., algorithm used, coding languages).
6 TOWARDS PRACTICAL
IMPLEMENTATION
In addition to the theoretical foundations discussed
throughout this work, we have taken a significant step
toward realizing a practical solution. We have devel-
oped an initial prototype of the anonymization tool-
box and documented our efforts in a demo paper (Fi-
eschi et al., 2025). This prototype serves as a proof of
concept, showcasing the feasibility and potential of
our approach.
Looking ahead, an essential component of the
toolbox will be an integrated recommender sys-
tem. This system aims to support software devel-
opers by guiding them in selecting the most suitable
anonymization techniques for their specific use cases.
Furthermore, it will document the decision-making
process, ensuring transparency and reproducibility in
the selection of privacy-preserving methods. Our cur-
rent research is now focused on finding possible solu-
tions for such a recommender system to maximize its
usability and effectiveness.
To evaluate the practicality and acceptance of the
framework, we see the need for extensive testing with
developers and real-world use cases. This evaluation
will help us assess how well the toolbox aligns with
the needs of its intended users and identify areas for
further improvement. Our initial testing efforts will
be conducted within the automotive domain, leverag-
ing its complex and privacy-sensitive use cases as a
foundation for iterative refinement of the collection
of anonymization techniques.
7 CONCLUSIONS
This paper provides a comprehensive overview of
anonymization techniques resulting from a system-
atic literature review and careful categorization of the
available methods. By structuring these techniques
into an organized framework, we offer users a valu-
able resource for making informed decisions about
which anonymization approach best suits the data-
collecting use case under development.
Our conceptual framework for anonymization
techniques categorizes them into distinct clusters,
making it easier to navigate through the various op-
tions and select the most appropriate method for spe-
cific use cases. This structured approach addresses
the need for a clear and accessible overview of avail-
able anonymization techniques, supporting more ef-
fective decision-making in privacy protection. This
lays a foundation for future research and implemen-
tations, enhancing the potential for anonymization by
design, its application, and its spread.
The presence of a recommender system would
significantly improve our framework by guiding users in
selecting the optimal technique based on their specific
requirements. Developing such a system represents a
key area for future research, which could further en-
hance the practicality and effectiveness of our collec-
tion of anonymization techniques.
ACKNOWLEDGEMENTS
This work is based on the research project SofDCar
(19S21002), funded by the German Federal Ministry
for Economic Affairs and Climate Action.
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z.,
Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin,
M., Ghemawat, S., Goodfellow, I., Harp, A., Irving,
G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kud-
lur, M., Levenberg, J., Mané, D., Monga, R., Moore,
S., Murray, D., Olah, C., Schuster, M., Shlens, J.,
Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Van-
houcke, V., Vasudevan, V., Viégas, F., Vinyals, O.,
Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and
Zheng, X. (2015). TensorFlow: Large-scale machine
learning on heterogeneous systems. Software avail-
able from tensorflow.org.
Abay, N. C., Zhou, Y., Kantarcioglu, M., Thuraisingham,
B., and Sweeney, L. (2019). Privacy Preserving Syn-
thetic Data Release Using Deep Learning. In Berlin-
gerio, M., Bonchi, F., Gärtner, T., Hurley, N., and
Ifrim, G., editors, Machine Learning and Knowl-
edge Discovery in Databases, Lecture Notes in Com-
puter Science, pages 510–526, Cham. Springer Inter-
national Publishing.
Abbasi, A. and Mohammadi, B. (2022). A clustering-
based anonymization approach for privacy-preserving
in the healthcare cloud. Concurrency and Computa-
tion: Practice and Experience, 34(1).
Al-Kharji, S., Tian, Y., and Al-Rodhaan, M. (2018). A
Novel (K, X)-isomorphism Method for Protecting Pri-
vacy in Weighted social Network.
Aleroud, A., Shariah, M., and Malkawi, R. (2022). Pri-
vacy Preserving Human Activity Recognition Using
Microaggregated Generative Deep Learning. In 2022
IEEE International Conference on Cyber Security and
Resilience (CSR), pages 357–363.
Aljably, R. (2021). Privacy Preserving Data Sharing in
Online Social Networks, volume 1415 of Communi-
cations in Computer and Information Science. Pages:
152.
Alnemari, A., Arodi, S., Sosa, V., Pandey, S., Romanowski,
C., Raj, R., and Mishra, S. (2018). Protecting infras-
tructure data via enhanced access control, blockchain
and differential privacy, volume 542 of IFIP Ad-
vances in Information and Communication Technol-
ogy. Pages: 125.
Aminifar, A., Rabbi, F., Pun, V., and Lamo, Y.
(2021). Diversity-Aware Anonymization for Struc-
tured Health Data. pages 2148–2154.
An, S., Li, Y., Wang, T., and Jin, Y. (2018). Contact
Graph Based Anonymization for Geosocial Network
Datasets. pages 132–137.
Attaullah, H., Anjum, A., Kanwal, T., Malik, S., Asher-
alieva, A., Malik, H., Zoha, A., Arshad, K., and Imran,
M. (2021). F-classify: Fuzzy rule based classification
method for privacy preservation of multiple sensitive
attributes. Sensors, 21(14).
Chen, B.-C., Kifer, D., LeFevre, K., Machanavajjhala, A.,
et al. (2009). Privacy-preserving data publishing.
Foundations and Trends® in Databases, 2(1–2):1–
167.
Domingo-Ferrer, J., Muralidhar, K., and Bras-Amoros, M.
(2020). General Confidentiality and Utility Metrics
for Privacy-Preserving Data Publishing Based on the
Permutation Model. IEEE Transactions on Depend-
able and Secure Computing, pages 1–1.
Dwork, C. (2008). Differential Privacy: A Survey of Re-
sults. In Agrawal, M., Du, D., Duan, Z., and Li, A.,
editors, Theory and Applications of Models of Com-
putation, Lecture Notes in Computer Science, pages
1–19, Berlin, Heidelberg. Springer.
Dwork, C. and Roth, A. (2013). The Algorithmic
Foundations of Differential Privacy. Foundations
and Trends® in Theoretical Computer Science, 9(3-
4):211–407.
European Parliament and Council of the European Union
(2016). Regulation (EU) 2016/679 of the European
Parliament and of the Council.
Eyupoglu, C., Aydin, M., Zaim, A., and Sertbas, A. (2018a).
An efficient big data anonymization algorithm based
on chaos and perturbation techniques. Entropy, 20(5).
Eyupoglu, C., Aydin, M., Zaim, A., and Sertbas, A.
(2018b). An efficient big data anonymization algo-
rithm based on chaos and perturbation techniques. En-
tropy, 20(5).
Fan, H. and Wang, Y. (2023). Range optimal dummy lo-
cation selection based on query probability density.
2023 2nd International Conference on Cloud Comput-
ing, Big Data Application and Software Engineering
(CBASE), pages 366–371.
Fieschi, A., Hirmer, P., Agrawal, S., Christoph, S., and
Mitschang, B. (2024). Hysaad – a hybrid selection ap-
proach for anonymization by design in the automotive
domain. In 2024 25th IEEE International Conference
on Mobile Data Management (MDM). IEEE.
Fieschi, A., Hirmer, P., and Stach, C. (2025). Discovering
suitable anonymization techniques: A privacy toolbox
for data experts. Presented at the 21st Conference on
Database Systems for Business, Technology and Web
(BTW 2025), Bamberg, Germany, March 2025.
Fieschi, A., Li, Y., Hirmer, P., Stach, C., and Mitschang,
B. (2023). Privacy in connected vehicles: Perspec-
tives of drivers and car manufacturers. In Symposium
and Summer School on Service-Oriented Computing,
pages 59–68. Springer.
Fung, B. C., Wang, K., Fu, A. W.-C., and Yu, P. S. (2010).
Introduction to Privacy-Preserving Data Publishing.
Chapman and Hall/CRC.
Gaboardi, M., Hay, M., and Vadhan, S. (2020). A program-
ming framework for opendp.
Gao, T. and Li, F. (2019a). PHDP: Preserving Persistent Ho-
mology in Differentially Private Graph Publications.
volume 2019-April, pages 2242–2250.
Gao, T. and Li, F. (2019b). Privacy-Preserving Sketching
for Online Social Network Data Publication. volume
2019-June.
Gao, T. and Li, F. (2019c). Sharing Social Networks Using
a Novel Differentially Private Graph Model.
Hamm, J. (2017). Minimax filter: Learning to preserve
privacy from inference attacks. Journal of Machine
Learning Research, 18.
Javed, I., Alharbi, F., Margaria, T., Crespi, N., and Qureshi,
K. (2021). PETchain: A Blockchain-Based Pri-
vacy Enhancing Technology. IEEE Access, 9:41129–
41143.
Jin, F., Hua, W., Ruan, B., and Zhou, X. (2022). Frequency-
based Randomization for Guaranteeing Differential
Privacy in Spatial Trajectories. volume 2022-May,
pages 1727–1739.
Kalia, P., Bansal, D., and Sofat, S. (2021). Privacy
Preservation in Cloud Computing Using Random-
ized Encoding. Wireless Personal Communications,
120(4):2847–2859.
Khan, R., Tao, X., Anjum, A., Malik, S., Yu, S., Khan,
A., Rehman, W., and Malik, H. (2022). (tau, m)-
slicedBucket privacy model for sequential anonymiza-
tion for improving privacy and utility. Transactions on
Emerging Telecommunications Technologies, 33(6).
Li, B., Zhu, H., and Xie, M. (2022). Releasing Differen-
tially Private Trajectories with Optimized Data Utility.
Applied Sciences (Switzerland), 12(5).
Li, N., Li, T., and Venkatasubramanian, S. (2007). t-
Closeness: Privacy Beyond k-Anonymity and l-
Diversity. In 2007 IEEE 23rd International Confer-
ence on Data Engineering, pages 106–115. ISSN:
2375-026X.
Li, X., Zhu, Y., and Wang, J. (2021). Highly Efficient
Privacy Preserving Location-Based Services with En-
hanced One-Round Blind Filter. IEEE Transactions
on Emerging Topics in Computing, 9(4):1803–1814.
Machanavajjhala, A., Kifer, D., Gehrke, J., and Venkitasub-
ramaniam, M. (2007). L-diversity: Privacy beyond
k-anonymity. ACM Transactions on Knowledge Dis-
covery from Data, 1(1):3–es.
Mahdavifar, S., Deldar, F., and Mahdikhani, H. (2022). Per-
sonalized Privacy-Preserving Publication of Trajec-
tory Data by Generalization and Distortion of Moving
Points. Journal of Network and Systems Management,
30(1).
Majeed, A. and Lee, S. (2021). Anonymization Techniques
for Privacy Preserving Data Publishing: A Compre-
hensive Survey. IEEE Access, 9:8512–8545. Confer-
ence Name: IEEE Access.
Mohamed, M., Ghanem, S., and Nagi, M. (2020). Privacy-
preserving for distributed data streams: Towards l-
diversity. International Arab Journal of Information
Technology, 17(1):52–64.
Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati,
A., Petticrew, M., Shekelle, P., Stewart, L. A., and
Group, P.-P. (2015). Preferred reporting items for sys-
tematic review and meta-analysis protocols (PRISMA-P)
2015 statement. Systematic Reviews, 4:1–9.
Morton, A. and Sasse, M. A. (2012). Privacy is a pro-
cess, not a pet: a theory for effective privacy practice.
In Proceedings of the 2012 New Security Paradigms
Workshop, pages 87–104.
Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H.,
and Kim, Y. (2018). Data synthesis based on gener-
ative adversarial networks. Proceedings of the VLDB
Endowment, 11(10):1071–1083.
Patki, N., Wedge, R., and Veeramachaneni, K. (2016).
The synthetic data vault. In 2016 IEEE International
Conference on Data Science and Advanced Analytics
(DSAA), pages 399–410.
Pei, J., Xu, J., Wang, Z., Wang, W., and Wang, K.
(2007). Maintaining k-anonymity against incremen-
tal updates. In 19th International Conference on Sci-
entific and Statistical Database Management (SSDBM
2007), pages 5–5. IEEE.
Piacentino, E. and Angulo, C. (2020a). Anonymizing
Personal Images Using Generative Adversarial Net-
works, volume 12108 LNBI of Lecture Notes in Com-
puter Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinfor-
matics). Pages: 405.
Piacentino, E. and Angulo, C. (2020b). Generating Fake
Data Using GANs for Anonymizing Healthcare Data,
volume 12108 LNBI of Lecture Notes in Computer
Science (including subseries Lecture Notes in Artifi-
cial Intelligence and Lecture Notes in Bioinformatics).
Pages: 417.
Prasser, F. and Kohlmayer, F. (2015). Putting statistical dis-
closure control into practice: The arx data anonymiza-
tion tool. Available at: https://arx.deidentifier.org.
Puri, V., Kaur, P., and Sachdeva, S. (2022). (k, m, t)-
anonymity: Enhanced privacy for transactional data.
Concurrency and Computation: Practice and Experi-
ence, 34(18).
Rong, H., Ma, T., Tang, M., and Cao, J. (2018). A novel
subgraph K+-isomorphism method in social network
based on graph similarity detection. Soft Computing,
22(8):2583–2601.
Sei, Y. and Ohsuga, A. (2017). Location Anonymization
with Considering Errors and Existence Probability.
IEEE Transactions on Systems, Man, and Cybernet-
ics: Systems, 47(12):3207–3218.
Shynu, P. G., Shayan., H. M., and Chowdhary, C. L. (2020).
A fuzzy based data perturbation technique for privacy
preserved data mining. 2020 International Conference
on Emerging Trends in Information Technology and
Engineering (ic-ETITE), pages 1–4.
Stach, C. (2023). Data Is the New Oil–Sort of: A View
on Why This Comparison Is Misleading and Its Im-
plications for Modern Data Administration. Future
Internet, 15(2):71:1–71:49.
Sun, Y., Yuan, Y., Wang, G., and Cheng, Y. (2016). Splitting
anonymization: a novel privacy-preserving approach
of social network. Knowledge and Information Sys-
tems, 47(3):595–623.
Sweeney, L. (2002). k-ANONYMITY: A MODEL FOR
PROTECTING PRIVACY. International Journal of
Uncertainty, Fuzziness and Knowledge-Based Sys-
tems, 10(05):557–570.
Thouvenot, M., Curé, O., and Calvez, P. (2020). Knowl-
edge graph anonymization using semantic anatomiza-
tion. volume 2721, pages 129–133.
Tsai, Y.-C., Wang, S.-L., Ting, I.-H., and Hong, T.-P.
(2020). Flexible sensitive K-anonymization on trans-
actions. World Wide Web, 23(4):2391–2406.
Ullah, I. and Shah, M. (2016). A novel model for preserving
Location Privacy in Internet of Things. pages 542–
547.
Walonoski, J., Kramer, M., Nichols, J., Quina, A., Moe-
sel, C., Hall, D., Duffett, C., Dube, K., Gallagher, T.,
and McLachlan, S. (2017). Synthea: An approach,
method, and software mechanism for generating syn-
thetic patients and the synthetic electronic health care
record. Journal of the American Medical Informatics
Association, 25(3):230–238.
Wang, Y., Wu, X., and Hu, D. (2016). Using Randomized
Response for Differential Privacy Preserving Data
Collection. In EDBT/ICDT Workshops.
Ward, K., Lin, D., and Madria, S. (2017). MELT:
Mapreduce-based efficient large-scale trajectory
anonymization. volume Part F128636.
Yamaç, M., Ahishali, M., Passalis, N., Raitoharju, J.,
Sankur, B., and Gabbouj, M. (2019). Reversible
privacy preservation using multi-level encryption and
compressive sensing. volume 2019-September.
Yu, L., Liu, L., Pu, C., Gursoy, M. E., and Truex, S.
(2019). Differentially Private Model Publishing for
Deep Learning. In 2019 IEEE Symposium on Security
and Privacy (SP), pages 332–349. arXiv:1904.02200
[cs].
Zhao, H., Wan, J., and Chen, Z. (2016). A novel dummy-
based knn query anonymization method in mobile ser-
vices. International Journal of Smart Home, 10:137–
154.
Zhu, T., Li, G., Zhou, W., and Yu, P. S. (2017). Differen-
tially Private Data Publishing and Analysis: A Survey.
IEEE Transactions on Knowledge and Data Engineer-
ing, 29(8):1619–1638.