6 CONCLUSION
ML is applied to problems across many industries;
however, it can leak personal information, and the
resulting damage is especially severe in the medical
and financial sectors. DP is a data protection
technology that can mitigate information leakage in
ML. However, conventional DP techniques reduce
prediction accuracy and incur significant time and
cost. To address these challenges, this study
proposes the D-DPFS model, which combines DP for
privacy protection in ML with FS for data analysis.
The experiments used four models to compare the
performance, cost, and security of the proposed and
conventional models. Performance was measured as the
classification accuracy of the LR model as epsilon
varies, and cost was compared in terms of memory
usage and latency. Additionally, to assess security
in detail, we measured classification performance
separately for general users and for users who
selected privacy features.
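The cost comparison described above (latency and memory usage at each epsilon) can be sketched as a small measurement harness. This is a minimal illustration under stated assumptions, not the evaluation code used in this study: `measure_cost`, `sweep_epsilon`, and `placeholder_train_and_score` are hypothetical names, the placeholder stands in for training and scoring a DP-enabled LR model, and `tracemalloc` tracks only Python-level allocations rather than total process memory.

```python
import time
import tracemalloc


def measure_cost(fn, *args, **kwargs):
    """Run fn once and return (result, latency in seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        latency = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, latency, peak_bytes


def sweep_epsilon(train_and_score, epsilons):
    """Record accuracy, latency, and peak memory for each epsilon value."""
    records = []
    for epsilon in epsilons:
        accuracy, latency, peak_bytes = measure_cost(train_and_score, epsilon)
        records.append({
            "epsilon": epsilon,
            "accuracy": accuracy,
            "latency_s": latency,
            "peak_bytes": peak_bytes,
        })
    return records


def placeholder_train_and_score(epsilon):
    """Hypothetical stand-in for training and scoring a DP model."""
    scratch = [float(i) for i in range(50_000)]  # simulated working memory
    return 0.0  # dummy value, NOT a measured accuracy


if __name__ == "__main__":
    for record in sweep_epsilon(placeholder_train_and_score, [0.1, 1.0, 10.0]):
        print(record)
```

In practice, `placeholder_train_and_score` would be replaced by a function that trains the model at the given epsilon and returns its test accuracy, and each configuration would be repeated several times to smooth out timing noise.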
In the experiments, the proposed D-DPFS model
achieved a high classification performance of 95% for
general users. When a user selects features to
protect, the model applies additional DP to those
features, preventing attackers from stealing
personal information. These results indicate that
the D-DPFS method is suitable for protecting user
privacy in ML settings.
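The idea of applying additional DP only to user-selected features can be illustrated with the standard Laplace mechanism. The sketch below is an assumption-laden example, not the D-DPFS implementation: the function names, the dictionary-based row format, and the default sensitivity of 1.0 (which presumes features scaled to a unit range) are all hypothetical. It also ignores privacy budget composition, so each selected feature consumes its own epsilon.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def protect_selected_features(rows, selected, epsilon, sensitivity=1.0, seed=None):
    """Return copies of rows with Laplace noise added to the selected features.

    rows:        list of dicts mapping feature name -> numeric value
    selected:    feature names chosen for extra protection
    epsilon:     privacy parameter applied per selected feature (assumption)
    sensitivity: assumed maximum change of one feature value
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    protected = []
    for row in rows:
        noisy = dict(row)  # copy so the input rows stay unchanged
        for name in selected:
            noisy[name] = row[name] + laplace_noise(scale, rng)
        protected.append(noisy)
    return protected


if __name__ == "__main__":
    data = [{"age": 0.4, "bmi": 0.6}, {"age": 0.7, "bmi": 0.2}]
    print(protect_selected_features(data, ["bmi"], epsilon=1.0, seed=0))
```

Features that are not selected pass through unchanged, which is why accuracy for general users need not be affected by the extra noise.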
This study evaluated the D-DPFS model using only the
BRFSS dataset. Future work will verify the model's
performance on a variety of datasets and specify the
privacy feature selection algorithm in detail.
ACKNOWLEDGEMENTS
This work is supported by the Ministry of Trade,
Industry and Energy (MOTIE) under Training
Industrial Security Specialist for High-Tech Industry
(RS-2024-00415520) supervised by the Korea
Institute for Advancement of Technology (KIAT),
and the Ministry of Science and ICT (MSIT) under
the ICAN (ICT Challenge and Advanced Network of
HRD) program (IITP-2022-RS-2022-00156310) and
Information Security Core Technology Development
(RS-2024-00437252) supervised by the Institute of
Information & Communication Technology Planning
& Evaluation (IITP).
REFERENCES
Aitsam, M. (2022). Differential Privacy Made Easy. 2022
International Conference on Emerging Trends in
Electrical, Control, and Telecommunication
Engineering (ETECTE), 17, 1–7.
https://doi.org/10.1109/etecte55893.2022.10007322
Aleskerov, E., Freisleben, B., & Rao, B. (1997).
CARDWATCH: A Neural Network based database
mining system for credit card fraud detection.
Proceedings of the IEEE/IAFE 1997 Computational
Intelligence for Financial Engineering (CIFEr).
https://doi.org/10.1109/cifer.1997.618940
Alishahi, M., Moghtadaiee, V., & Navidan, H. (2022). Add
noise to remove noise: Local Differential Privacy for
Feature Selection. Computers & Security, 123,
102934. https://doi.org/10.1016/j.cose.2022.102934
Centers for Disease Control and Prevention. (2024, August
28). CDC - BRFSS Annual Survey Data. Centers for
Disease Control and Prevention.
https://www.cdc.gov/brfss/annual_data/annual_data.htm
Chiew, K. L., Tan, C. L., Wong, K., Yong, K. S. C., &
Tiong, W. K. (2019). A New Hybrid Ensemble Feature
Selection Framework for Machine Learning-based
phishing detection system. Information Sciences, 484,
153–166. https://doi.org/10.1016/j.ins.2019.01.064
Desfontaines, D., Mohammadi, E., Krahmer, E., & Basin,
D. (2019). Differential privacy with partial
knowledge. arXiv preprint arXiv:1905.00650.
Dwork, C., & Roth, A. (2013). The Algorithmic
Foundations of Differential Privacy.
https://doi.org/10.1561/9781601988195
Garg, A., & Mago, V. (2021). Role of machine learning in
Medical Research: A survey. Computer Science Review,
40, 100370.
https://doi.org/10.1016/j.cosrev.2021.100370
Holohan, N., Braghin, S., Mac Aonghusa, P., & Levacher,
K. (2019). Diffprivlib: the IBM differential privacy
library. arXiv preprint arXiv:1907.02444.
IBM. (n.d.). IBM/differential-privacy-library: Diffprivlib:
The IBM Differential Privacy Library. GitHub.
https://github.com/IBM/differential-privacy-library
Iwendi, C., Moqurrab, S. A., Anjum, A., Khan, S., Mohan,
S., & Srivastava, G. (2020). N-Sanitization: A semantic
privacy-preserving framework for unstructured medical
datasets. Computer Communications, 161, 160–171.
https://doi.org/10.1016/j.comcom.2020.07.032
Kaissis, G. A., Makowski, M. R., Rückert, D., & Braren, R.
F. (2020). Secure, privacy-preserving and Federated
Machine Learning in medical imaging. Nature Machine
Intelligence, 2(6), 305–311.
https://doi.org/10.1038/s42256-020-0186-1
Kanwal, T., Anjum, A., Malik, S. U. R., Sajjad, H., Khan,
A., Manzoor, U., & Asheralieva, A. (2021). A robust
privacy preserving approach for electronic health
records using multiple dataset with multiple sensitive
attributes. Computers & Security, 105, 102224.
https://doi.org/10.1016/j.cose.2021.102224
Khaire, U. M., & Dhanalakshmi, R. (2022). Stability of
Feature Selection Algorithm: A Review. Journal of