(publishing more general values of the samples),
suppression (removal of some samples) and micro
aggregation.
Micro aggregation partitions the entire data set into
micro-clusters and then replaces the original records
in each cluster with the cluster representative. Privacy
is achieved because the perturbed data, the cluster
representative, no longer corresponds to a single record;
instead, it represents the entire cluster. To assure
privacy, each cluster must contain a minimum number of
records, which is set to k in order to satisfy
k-anonymity. The parameter k determines "how much" the
information is protected: intuitively, the higher the
value of k, the stronger the protection. A higher k
decreases the probability of a successful record linkage
by generating larger equivalence classes.
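The replacement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cluster assignment is taken as given, the cluster representative is assumed to be the arithmetic mean (centroid), and the function name `microaggregate` and the toy data are hypothetical.

```python
from collections import defaultdict

def microaggregate(records, labels, k):
    """Replace each record by the centroid of its cluster.

    records: list of equal-length numeric tuples (hypothetical toy format)
    labels:  cluster id for each record, same length as records
    k:       minimum cluster size required for k-anonymity
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    # Every cluster needs at least k records, otherwise k-anonymity fails.
    for label, members in clusters.items():
        if len(members) < k:
            raise ValueError(f"cluster {label!r} has fewer than {k} records")

    # Assumption: the cluster representative is the arithmetic mean (centroid).
    centroids = {}
    for label, members in clusters.items():
        dim = len(records[members[0]])
        centroids[label] = tuple(
            sum(records[i][d] for i in members) / len(members)
            for d in range(dim)
        )
    return [centroids[label] for label in labels]

data = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
protected = microaggregate(data, labels=[0, 0, 1, 1], k=2)
```

After the call, every published record is shared by at least k = 2 original records, so an intruder cannot link a published value to a single individual within its cluster.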
De Capitani di Vimercati et al. (2023) illustrate
k-anonymity and its main extensions in different
applications. In this paper, we develop a manifold version
of the Maximum Distance to Average Vector (MDAV) algorithm
(Domingo-Ferrer and Mateo-Sanz, 2002) for k-anonymisation
based on micro aggregation. The algorithm constructs
homogeneous clusters from the data set while minimizing
the sum of squared errors (SSE), i.e., the squared
distance between each record and the centroid of its
cluster.
\[ \mathrm{SSE} = \sum_{j=1}^{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \]
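As a rough illustration of how the SSE can be evaluated for a given clustering, the sketch below assumes squared Euclidean distance and the arithmetic mean as centroid. The helper name `sse` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def sse(records, labels):
    """Sum of squared Euclidean distances between records and their centroid."""
    clusters = defaultdict(list)
    for record, label in zip(records, labels):
        clusters[label].append(record)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Assumption: the centroid is the arithmetic mean of the cluster.
        centroid = [sum(r[d] for r in members) / len(members) for d in range(dim)]
        for r in members:
            total += sum((r[d] - centroid[d]) ** 2 for d in range(dim))
    return total

data = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
tight = sse(data, [0, 0, 1, 1])   # nearby records grouped together
loose = sse(data, [0, 1, 0, 1])   # distant records grouped together
```

Grouping nearby records gives the smaller SSE, which is why a homogeneous clustering loses less information when records are replaced by their centroid.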
Differential Privacy (Dwork, 2006) is another mechanism
for privacy protection in machine learning. It aims to
obfuscate the presence or absence of any particular record
in a given dataset by limiting that record's effect on the
final result. However, in real-world applications, data
analysis and model construction are just one step of a
complex process. One needs to perform exploratory data
analysis and test the data on several models before
selecting an optimal machine learning model and applying
privacy-preserving solutions to it (Torra, 2022;
Domingo-Ferrer et al., 2021).
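For reference, the guarantee that differential privacy gives about a single record is usually formalised as follows. The notation is the standard one and is an assumption here, since the text above does not introduce it: M is a randomized mechanism, D and D' are any two datasets differing in one record, S is any set of outputs, and ε ≥ 0 is the privacy budget.

\[ \Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] \]

A smaller ε forces the two output distributions closer together, so the presence or absence of one record has less influence on the result.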
Some recent works apply differential privacy to manifold
learning (Vepakomma et al., 2021) and to Riemannian
manifolds (Reimherr et al., 2021). However, to the best of
our knowledge, no study applies the k-anonymity privacy
model on manifolds. This paper therefore provides a novel
contribution.
3 METHODOLOGY
This section describes the three approaches developed in
this paper to obtain a privacy-preserving model that
anonymizes high-dimensional data while taking its manifold
structure into account. To the best of our knowledge, the
effect of the k-anonymity privacy model on manifold
learning has not been investigated before. We study it by
proposing three approaches, each described as a separate
algorithm.
Algorithm 1 is the M-MDAV approach, which anonymizes
high-dimensional data directly using geodesic-MDAV.
Algorithm 2 is M-ISOMDAV, which uses ISOMAP to preserve
the manifold structure and then uses M-MDAV for
anonymization. Algorithm 3 is the M-LLEMDAV method, which
uses geodesic-LLE and M-MDAV. We developed three
approaches because Algorithm 1 is a manifold version of
MDAV that anonymizes high-dimensional data directly,
whereas the latter two algorithms first use different
manifold learning techniques to preserve the inherent
structure of the data and then anonymize it using M-MDAV.
A comparative analysis of the three algorithms is
described in the later sections of the paper.
The intuition behind developing three approaches is to
analyse the effect of the privacy model on manifold
learning techniques. To do so, we first need a metric that
preserves the information of the high-dimensional space.
This information should not be lost in the transformation
to the low-dimensional space, since manifold learning
computes distances between points in the high-dimensional
space and then aims to preserve these distances in the
low-dimensional embedding.
This is achieved by using geodesic distance as the metric
in the manifold learning approaches. Once the information
has been transformed to a low-dimensional space, M-MDAV,
a newly developed manifold version of the k-anonymity
model, is used to protect the information from intruders.
We consider two manifold learning techniques for their
good properties and provide a comparative analysis between
them.
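To make the role of the geodesic metric concrete, the following sketch approximates pairwise geodesic distances in the way ISOMAP-style methods commonly do: build a nearest-neighbour graph with Euclidean edge weights and take shortest-path lengths in that graph. This is an assumption for illustration only; the definition used in this paper may differ, and the names `knn_graph`, `geodesic_distances` and the parameter `n_neighbors` are hypothetical.

```python
import heapq
import math

def knn_graph(points, n_neighbors):
    """Symmetric nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:n_neighbors]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph

def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances by shortest paths (Dijkstra) in the graph."""
    graph = knn_graph(points, n_neighbors)
    n = len(points)
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        result.append(dist)
    return result

# Toy data: nine points on a half circle of radius 1.
curve = [(math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)) for t in range(9)]
geo = geodesic_distances(curve, n_neighbors=2)
straight = math.dist(curve[0], curve[-1])
```

For the two end points of the half circle, the graph-based distance `geo[0][-1]` follows the curve and is therefore longer than the straight-line distance `straight`, which cuts across the manifold.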
Algorithm 1, M-MDAV, is a manifold version of the
state-of-the-art MDAV. Initially, the pairwise geodesic
distances between all data points are computed as defined
in 1. Then, the median of all data points is obtained by
minimising the geodesic distance between the data points,
as mentioned in the objective function of the algorithm 3.
After that, clusters are formed around the data points
that are furthest from the median. This process is
repeated until all points are clustered. Finally, the
data points in each cluster are replaced by the median of
that cluster. Algorithm 2, M-ISOMDAV, is a manifold
combination of two different approaches, i.e., ISOMAP
manifold learning and