Characterizing Generalization Algorithms - First Guidelines for Data Publishers

Feten Ben Fredj, Nadira Lammari, Isabelle Comyn-Wattiau

Abstract

Many techniques, such as generalization algorithms, have been proposed to anonymize data before publication. However, data publishers may find it difficult to choose the best algorithm for their specific context. In this position paper, we summarize the main generalization algorithms, focusing on their constraints and advantages. We then discuss the main criteria that can be used to choose the best algorithm for a given context. Two use cases illustrate guidelines that help data holders choose an algorithm. We thus contribute to knowledge management in the field of anonymization algorithms. The approach can also be applied to select an algorithm within other anonymization techniques (micro-aggregation, swapping, etc.), and even, at an earlier stage, to select a technique.



Paper Citation


in Harvard Style

Ben Fredj F., Lammari N. and Comyn-Wattiau I. (2014). Characterizing Generalization Algorithms - First Guidelines for Data Publishers. In Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2014) ISBN 978-989-758-050-5, pages 360-366. DOI: 10.5220/0005154603600366


in Bibtex Style

@conference{kmis14,
author={Feten Ben Fredj and Nadira Lammari and Isabelle Comyn-Wattiau},
title={Characterizing Generalization Algorithms - First Guidelines for Data Publishers},
booktitle={Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2014)},
year={2014},
pages={360-366},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005154603600366},
isbn={978-989-758-050-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Management and Information Sharing - Volume 1: KMIS, (IC3K 2014)
TI - Characterizing Generalization Algorithms - First Guidelines for Data Publishers
SN - 978-989-758-050-5
AU - Ben Fredj F.
AU - Lammari N.
AU - Comyn-Wattiau I.
PY - 2014
SP - 360
EP - 366
DO - 10.5220/0005154603600366