On the Evaluation of the Privacy Breach in Disassociated Set-valued Datasets

Sara Barakat, Bechara Al Bouna, Mohamed Nassar, Christophe Guyeux

Abstract

Data anonymization has gained considerable attention because it provides the fundamental requirements for safely outsourcing datasets that contain identifying information. While some techniques add noise to protect privacy, others use generalization to hide the link between sensitive and non-sensitive information, or separate the dataset into clusters to gain more utility. In the latter approach, often referred to as bucketization, data values are kept intact and only the link between them is hidden, in order to maximize utility. In this paper, we showcase the limits of disassociation, a bucketization technique that divides a set-valued dataset into k^m-anonymous clusters. We demonstrate that a privacy breach might occur if the disassociated dataset is subject to a cover problem. Finally, we evaluate the privacy breach on real disassociated datasets using the quantitative privacy breach detection algorithm.
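The abstract relies on the notion of k^m-anonymity. As a reading aid, here is a minimal sketch of the standard definition used in the set-valued anonymization literature: a collection of records is k^m-anonymous when every itemset of at most m terms that occurs in it is contained in at least k records. This is not the authors' implementation. The function names `km_violations` and `is_km_anonymous` and the toy medical terms are hypothetical, and the check enumerates itemsets by brute force over a single cluster (or record chunk), which is only practical for small m.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def km_violations(records: List[Set[str]], k: int, m: int) -> Set[Tuple[str, ...]]:
    """Return every itemset of size <= m that occurs in the records
    but is supported by fewer than k of them (hypothetical helper)."""
    support = Counter()
    for record in records:
        items = sorted(set(record))  # canonical order: one key per itemset
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1  # a record counts once per itemset
    # Only itemsets that actually occur are in `support`, so support >= 1 here.
    return {itemset for itemset, count in support.items() if count < k}


def is_km_anonymous(records: List[Set[str]], k: int, m: int) -> bool:
    """k^m-anonymity: no itemset of size <= m may appear in 1..k-1 records."""
    return not km_violations(records, k, m)


if __name__ == "__main__":
    cluster = [{"flu", "cough"}, {"flu", "cough"}, {"flu", "asthma"}]
    print(is_km_anonymous(cluster, k=2, m=2))        # False
    print(sorted(km_violations(cluster, k=2, m=2)))  # [('asthma',), ('asthma', 'flu')]
```

In this toy cluster, "asthma" and the pair {"asthma", "flu"} each appear in only one record, so the cluster is not 2^2-anonymous. Enumerating all subsets of up to m terms costs roughly |record|^m operations per record, which is why the sketch is only suitable for small m.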


Paper Citation


in Harvard Style

Barakat S., Al Bouna B., Nassar M. and Guyeux C. (2016). On the Evaluation of the Privacy Breach in Disassociated Set-valued Datasets. In Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016), ISBN 978-989-758-196-0, pages 318-326. DOI: 10.5220/0005969403180326


in Bibtex Style

@conference{secrypt16,
author={Sara Barakat and Bechara Al Bouna and Mohamed Nassar and Christophe Guyeux},
title={On the Evaluation of the Privacy Breach in Disassociated Set-valued Datasets},
booktitle={Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016)},
year={2016},
pages={318-326},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005969403180326},
isbn={978-989-758-196-0},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 13th International Joint Conference on e-Business and Telecommunications - Volume 4: SECRYPT, (ICETE 2016)
TI  - On the Evaluation of the Privacy Breach in Disassociated Set-valued Datasets
SN  - 978-989-758-196-0
AU  - Barakat S.
AU  - Al Bouna B.
AU  - Nassar M.
AU  - Guyeux C.
PY  - 2016
SP  - 318
EP  - 326
DO  - 10.5220/0005969403180326
ER  - 