otherwise it is closer to $\mu_q$. Thus, $P_1$ and $P_2$ can run the secure comparison protocol presented in Section 4. Their inputs are $d_{j1} - d_{q1}$ for $P_1$ and $d_{q2} - d_{j2}$ for $P_2$. Therefore, they can jointly decide which mean is closer to the entity $e$.
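The comparison step can be illustrated with a short Python sketch. It is a plain, non-private simulation meant only to show why comparing the two local differences gives the same answer as comparing the total distances. The function names are hypothetical, the partial distances are assumed to be additive squared Euclidean shares computed by each party over its own attributes, ties are assumed to go to $\mu_q$, and an ordinary `<` stands in for the secure comparison protocol of Section 4.

```python
def partial_sq_dist(point_part, mean_part):
    """Squared Euclidean distance over the attributes one party holds (assumed metric)."""
    return sum((x - m) ** 2 for x, m in zip(point_part, mean_part))


def p1_input(d_j1, d_q1):
    """P1's private input to the comparison: d_j1 - d_q1."""
    return d_j1 - d_q1


def p2_input(d_j2, d_q2):
    """P2's private input to the comparison: d_q2 - d_j2."""
    return d_q2 - d_j2


def closer_to_mu_j(x1, x2):
    """Placeholder for the secure comparison; a real protocol would reveal only this bit.

    x1 < x2  <=>  d_j1 - d_q1 < d_q2 - d_j2  <=>  d_j1 + d_j2 < d_q1 + d_q2
    """
    return x1 < x2


# Hypothetical entity e, vertically split: P1 holds two attributes, P2 holds one.
e_p1, e_p2 = (1, 2), (3,)
mu_j_p1, mu_j_p2 = (1, 2), (4,)
mu_q_p1, mu_q_p2 = (3, 2), (3,)

# Each party computes its own partial distances locally.
d_j1, d_q1 = partial_sq_dist(e_p1, mu_j_p1), partial_sq_dist(e_p1, mu_q_p1)
d_j2, d_q2 = partial_sq_dist(e_p2, mu_j_p2), partial_sq_dist(e_p2, mu_q_p2)

# Only the comparison outcome is meant to be shared between the parties.
e_is_closer_to_mu_j = closer_to_mu_j(p1_input(d_j1, d_q1), p2_input(d_j2, d_q2))
```

Rearranging the inequality this way lets each party's input depend only on values it already holds, so neither party needs to disclose its partial distances to the other.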
6 CONCLUSIONS AND FUTURE WORK
Clustering categorizes information into meaningful partitions to make data analysis simpler and more accurate. It has a wide range of real-world applications and also serves as a utility for data summarization and compression. In many cases privacy is crucial, and secure protocols are needed to perform clustering while preserving the privacy of the participating parties. Two multi-party protocols for privacy-preserving k-means clustering were presented, for horizontally and vertically partitioned data, along with a protocol for secure two-party comparison. These SMC techniques are based on secure multi-party addition and division sub-protocols. Many clustering algorithms exist, such as k-means, k-medoids, and agglomerative hierarchical clustering, but most existing work in privacy-preserving clustering uses k-means. One possible extension of this work is to design protocols for other algorithms, particularly hierarchical clustering.
PRIVACY PRESERVING k-MEANS CLUSTERING IN MULTI-PARTY ENVIRONMENT