PRIVACY PRESERVING k-MEANS CLUSTERING IN MULTI-PARTY ENVIRONMENT

Saeed Samet, Ali Miri, Luis Orozco-Barbosa

Abstract

Data mining algorithms are widely used to extract meaningful and valuable knowledge from databases. Today, databases are often distributed among two or more parties, whether because of physical and geographical restrictions or, most importantly, because of privacy. Related data is normally maintained by more than one organization, each of which wants to keep its individual information private. Privacy-preserving techniques and protocols are therefore designed to perform data mining in distributed environments where privacy is a major concern. Cluster analysis is a data mining technique that divides data into meaningful clusters, and it plays an important role in fields such as bioinformatics, marketing, machine learning, climate science, and medicine. k-means clustering is a prominent algorithm in this category that creates a one-level clustering of the data. In this paper we introduce privacy-preserving protocols for this algorithm, along with a protocol for secure comparison, known as the Millionaires’ Problem, as a sub-protocol, to handle the clustering of horizontally or vertically partitioned data among two or more parties.
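
To make the horizontally partitioned setting concrete, the following is a minimal plaintext sketch of one k-means iteration in which each party holds a disjoint set of rows. It is not the protocol introduced in the paper, which the abstract does not spell out, and it offers no privacy at all: it only shows which aggregates (per-cluster sums and counts) the parties would need to combine, and where a nearest-centroid comparison takes place. All function and variable names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids]
    # In a privacy-preserving setting, this minimum is the kind of step
    # where a secure comparison sub-protocol would be needed (assumption).
    return dists.index(min(dists))


def local_stats(rows: List[Point], centroids: List[Point]):
    """Compute per-cluster coordinate sums and counts for one party's rows."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for row in rows:
        j = nearest(row, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += row[d]
    return sums, counts


def update_centroids(parties: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One plaintext k-means iteration over horizontally partitioned data."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for rows in parties:
        # Each party contributes only aggregates here; a real protocol
        # would have to combine them without revealing them in the clear.
        sums, counts = local_stats(rows, centroids)
        for j in range(len(centroids)):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    new_centroids = []
    for j, old in enumerate(centroids):
        if total_counts[j] == 0:
            new_centroids.append(old)  # keep the centroid of an empty cluster
        else:
            new_centroids.append(tuple(s / total_counts[j] for s in total_sums[j]))
    return new_centroids


# Two parties, each holding two rows of the same two attributes.
party_a = [(1.0, 1.0), (9.0, 9.0)]
party_b = [(2.0, 2.0), (8.0, 8.0)]
centroids = update_centroids([party_a, party_b], [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

In a vertically partitioned setting the split would instead run across attributes, so each party could compute only its share of every distance, and the parties would have to compare sums of those shares without disclosing them.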



Paper Citation


in Harvard Style

Samet S., Miri A. and Orozco-Barbosa L. (2007). PRIVACY PRESERVING k-MEANS CLUSTERING IN MULTI-PARTY ENVIRONMENT. In Proceedings of the Second International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2007) ISBN 978-989-8111-12-8, pages 381-385. DOI: 10.5220/0002121703810385


in Bibtex Style

@conference{secrypt07,
author={Saeed Samet and Ali Miri and Luis Orozco-Barbosa},
title={PRIVACY PRESERVING k-MEANS CLUSTERING IN MULTI-PARTY ENVIRONMENT},
booktitle={Proceedings of the Second International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2007)},
year={2007},
pages={381-385},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002121703810385},
isbn={978-989-8111-12-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2007)
TI - PRIVACY PRESERVING k-MEANS CLUSTERING IN MULTI-PARTY ENVIRONMENT
SN - 978-989-8111-12-8
AU - Samet S.
AU - Miri A.
AU - Orozco-Barbosa L.
PY - 2007
SP - 381
EP - 385
DO - 10.5220/0002121703810385