the original data before edge removal is larger than in (a) k-degree and (b) CCPA + k-degree.
In particular, for the Facebook dataset, the link match rate is highest for (c), followed by (a) and then (b), and the proposed method (c) outperforms the conventional k-degree anonymization method in terms of the number of matched links.
In all results, the link match rate was lower for (b) CCPA + k-degree than for (a) k-degree. This may be due to two effects: link prediction increases the number of links in the dataset, and the amount of graph modification during anonymization increases, adding links that were not in the original data.
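The effect of the link-prediction step on the number of links can be illustrated with a minimal sketch of the CCPA score of Ahmad et al. (2020), which scores an unlinked pair (u, v) as alpha * |common neighbours of u and v| + (1 - alpha) * N / d(u, v), where N is the number of nodes and d(u, v) the shortest-path distance. The value alpha = 0.8, the rule of adding the m top-scoring pairs, the dict-of-sets graph representation, and the helper names `ccpa_scores` and `add_predicted_links` are all assumptions for illustration, not the exact procedure used in the experiments.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ccpa_scores(adj, alpha=0.8):
    """CCPA score for each unlinked pair in the same connected component:
    alpha * |common neighbours| + (1 - alpha) * N / shortest-path distance."""
    n = len(adj)
    nodes = sorted(adj)
    scores = {}
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in adj[u] or v not in dist:
                continue  # skip existing links and unreachable pairs
            common = len(adj[u] & adj[v])
            scores[(u, v)] = alpha * common + (1 - alpha) * n / dist[v]
    return scores


def add_predicted_links(adj, m, alpha=0.8):
    """Copy adj and add the m highest-scoring missing links (hypothetical rule)."""
    scores = ccpa_scores(adj, alpha)
    ranked = sorted(scores, key=lambda e: (-scores[e], e))  # ties broken by node ids
    top = ranked[:m]
    new_adj = {u: set(vs) for u, vs in adj.items()}  # leave the input untouched
    for u, v in top:
        new_adj[u].add(v)
        new_adj[v].add(u)
    return new_adj, top


# Toy graph (not from the paper): a diamond on nodes 0-3 plus a pendant node 4.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
new_adj, added = add_predicted_links(adj, m=2)
print(added)                                         # [(0, 3), (1, 4)]
print(sum(len(vs) for vs in new_adj.values()) // 2)  # 8 edges, up from 6
```

Every predicted link enlarges the graph that the anonymization step must then process, which is the mechanism the paragraph above points to.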
The anonymized data generated by the proposed anonymization method showed a higher link match rate than the other methods, suggesting that the characteristics of the data are better preserved.
There are two possible reasons for the high link match rate of the proposed method (c) CCPA + k-maximum degree.
First, because the nodes that need to be anonymized are selected based on the link prediction results, the amount of graph modification can be reduced compared with anonymizing all nodes.
Second, the methods differ in how they modify the graph during anonymization. Because the proposed method (c) keeps the amount of link modification small, it anonymizes the graph by adding links only. In contrast, the conventional method (b) must both add and delete links to reduce the amount of link modification, which would otherwise be enormous.
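Anonymizing by adding links only means realizing a target degree for every node that is no smaller than its current degree. The following is a minimal greedy sketch of that step, assuming the target degrees are already given as a dict: it serves the node with the largest remaining demand first and never deletes an edge. The helper name `add_edges_to_targets` and the greedy order are illustrative assumptions, not the paper's algorithm; the greedy can also get stuck on targets that are in fact reachable, in which case it returns None.

```python
def add_edges_to_targets(adj, target):
    """Greedily add edges (never delete) until each node u has degree target[u].
    Returns the list of added edges, or None if the greedy gets stuck."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy of the graph
    need = {u: target[u] - len(adj[u]) for u in adj}
    if any(r < 0 for r in need.values()):
        raise ValueError("adding edges can only raise degrees")
    added = []
    while True:
        pending = [u for u in adj if need[u] > 0]
        if not pending:
            return added
        # Serve the node with the largest remaining demand first.
        u = min(pending, key=lambda x: (-need[x], x))
        partners = sorted(
            (v for v in pending if v != u and v not in adj[u]),
            key=lambda x: (-need[x], x),
        )
        if len(partners) < need[u]:
            return None  # stuck: not enough non-adjacent nodes with demand left
        for v in partners[:need[u]]:
            adj[u].add(v)
            adj[v].add(u)
            need[v] -= 1
            added.append((u, v))
        need[u] = 0


# Toy example: a 5-node path where every node should reach degree 2.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(add_edges_to_targets(path, {u: 2 for u in path}))  # [(0, 4)]
```

Here a single added link turns the path into a cycle; because nothing is deleted, every link of the input graph survives anonymization.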
In Tables 4 and 5, NA was obtained for conventional methods (a) and (b) because the sum of the anonymized degree sequence is odd, so the anonymized graph cannot be constructed: every edge adds 2 to the degree sum, so any degree sequence that a graph realizes has an even sum. In particular, methods (a) and (b) anonymize the degree sequence so that all nodes satisfy k-degree anonymity while minimizing the amount of modification, which may produce degree sequences that no graph can realize. In this respect, the proposed method (c) limits the number of nodes targeted by the anonymization process and can therefore generate a feasible degree sequence more easily.
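How an odd degree sum can arise is easy to reproduce on a toy sequence. The sketch below raises degrees with a standard dynamic program over the sorted sequence so that every degree value occurs at least k times at minimum total increase, and then checks realizability with the parity condition and the Erdős–Gallai inequalities. It is an assumed, simplified stand-in for the degree-sequence step of methods (a) and (b), not their exact implementation, and it attempts no repair when the result is infeasible; `k_degree_anonymize` and `is_graphical` are hypothetical helper names.

```python
def k_degree_anonymize(degrees, k):
    """Raise degrees (never lower them) so that every value occurs at least
    k times, minimising the total increase. Returns the new sequence sorted
    in descending order, aligned with sorted(degrees, reverse=True)."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if n < k:
        raise ValueError("need at least k nodes")
    INF = float("inf")
    best = [INF] * (n + 1)  # best[j]: minimum cost to anonymise d[:j]
    start = [0] * (n + 1)   # start index of the last group in that solution
    best[0] = 0
    for j in range(k, n + 1):
        for i in range(j - k + 1):  # last group d[i:j] has size >= k
            if best[i] == INF:
                continue
            cost = best[i] + sum(d[i] - x for x in d[i:j])
            if cost < best[j]:
                best[j], start[j] = cost, i
    anon, j = [], n
    while j > 0:  # walk the groups back from the end
        i = start[j]
        anon[:0] = [d[i]] * (j - i)  # every member takes the group maximum
        j = i
    return anon


def is_graphical(seq):
    """Erdős–Gallai test: does some simple undirected graph have these degrees?"""
    d = sorted(seq, reverse=True)
    if sum(d) % 2:
        return False  # each edge adds 2 to the degree sum
    for r in range(1, len(d) + 1):
        if sum(d[:r]) > r * (r - 1) + sum(min(x, r) for x in d[r:]):
            return False
    return True


original = [1, 2, 1, 1, 1]  # toy sequence: a 3-node path plus one separate edge
anon = k_degree_anonymize(original, k=2)
print(anon)                                        # [2, 2, 1, 1, 1]
print(is_graphical(original), is_graphical(anon))  # True False (sum is 7)
```

The cheapest 2-anonymous sequence raises a single degree by one, which flips the sum from even to odd, so no graph can realize it even though the original sequence was realizable.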
In addition, the link match rate decreased as k increased. The number of links that match the original data before edge removal changed little as k increased, even though the total number of edges increased, so the share of matched links fell. The number of links grows with k because a larger k requires stronger anonymity: more nodes need to be anonymized, and the amount of graph modification increases.
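The trend can be illustrated with a small worked example, assuming the link match rate is the number of anonymized links that also appear in the original data before edge removal, divided by the total number of links in the anonymized graph (an assumed definition, consistent with the explanation above). The edge lists are toy data, not results from Tables 4 and 5, and `link_match_rate` is a hypothetical helper.

```python
def link_match_rate(anonymized_edges, original_edges):
    """Return (matched, rate): anonymized links also present in the original
    graph before edge removal, and their share of all anonymized links."""
    def key(e):
        return tuple(sorted(e))  # undirected: (u, v) and (v, u) are one link

    anon = {key(e) for e in anonymized_edges}
    orig = {key(e) for e in original_edges}
    matched = len(anon & orig)
    return matched, matched / len(anon)


original = [(0, 1), (1, 2), (2, 3), (3, 0)]           # toy graph before edge removal
small_k = [(0, 1), (1, 2), (2, 3), (0, 2)]            # few added links
large_k = small_k + [(1, 3), (0, 4), (2, 4), (3, 4)]  # more added links, none matching
print(link_match_rate(small_k, original))  # (3, 0.75)
print(link_match_rate(large_k, original))  # (3, 0.375)
```

The matched count stays at 3 while the denominator doubles, so the rate halves, mirroring the behaviour described for larger k.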
To increase the utility of anonymized social network data with link prediction in the future, it is necessary to increase the number of links that match the original data before edge removal, for example by preferentially selecting links that are more likely to be present in the original data when adding links during anonymization.
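One possible way to realize this suggestion is to rank candidate partners by a link-likelihood score whenever an anonymization step must add links to a node. A minimal sketch, assuming the common-neighbour count as the score (a swapped-in heuristic; a CCPA score could be plugged in the same way); `pick_partners` is a hypothetical helper, not part of the proposed method.

```python
def pick_partners(adj, u, candidates, count):
    """Pick `count` new partners for u, preferring pairs with more common
    neighbours as a proxy for links likely to exist in the original data."""
    eligible = [v for v in candidates if v != u and v not in adj[u]]
    ranked = sorted(eligible, key=lambda v: (-len(adj[u] & adj[v]), v))
    return ranked[:count]


# Toy graph: a diamond on nodes 0-3 plus a pendant node 4.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
print(pick_partners(adj, 0, [3, 4], 1))  # [3]: shares neighbours 1 and 2 with node 0
```

Ties are broken by node id only to keep the output deterministic; any other tie-breaking rule would serve equally well.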
6 CONCLUSIONS
In this paper, assuming that real data are incomplete and contain missing data, we discussed an anonymity metric for the anonymization of social network data with link prediction. We also applied the anonymization method for social network data with link prediction to real data and examined its utility.
The experimental results suggest that, for anonymized data with link prediction, the proposed anonymization method preserves more data features than the conventional methods do.
In future work, we will study utility on other real datasets, evaluate safety, and investigate better ways to anonymize SNS data with link prediction.
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Numbers JP21H03496, JP22K12157.
REFERENCES
Ahmad, I., Akhtar, M. U., Noor, S., and Shahnaz, A. (2020).
Missing link prediction using common neighbor and
centrality based parameterized algorithm. Scientific
Reports, 10(1):1–9.
Beigi, G. and Liu, H. (2020). A survey on privacy in
social media: Identification, mitigation, and applica-
tions. ACM/IMS Trans. Data Sci., 1(1).
Campan, A. and Truta, T. M. (2008a). A clustering ap-
proach for data and structural anonymity in social net-
works.
Campan, A. and Truta, T. M. (2008b). Data and structural k-
anonymity in social networks. In International Work-
shop on Privacy, Security, and Trust in KDD, pages
33–54. Springer.
Casas-Roma, J., Herrera-Joancomartí, J., and Torra, V.
(2017a). k-degree anonymity and edge selection: im-
proving data utility in large networks. Knowledge and
Information Systems, 50(2):447–474.
A k-Anonymization Method for Social Network Data with Link Prediction
499