ment is to construct a 2-anonymized dataset. On the
other hand, the latter result shows that the risk values
for many records are k < 100 and that the maximum
is k = 340. This means that we have constructed a
dataset in which the risks of the records are distributed
uniformly. Additionally, because we decided on a focus
attribute (height in this experiment) first, we can analyze
this attribute specifically.
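As a rough illustration of how such per-record values can be read off a generalized table, the following minimal sketch assumes that the risk value of a record is the size k of its equivalence class, i.e. the number of records sharing the same quasi-identifier values. This is the usual k-anonymity reading; the risk measure used in the experiment is defined earlier in the paper and may differ. The function name per_record_k, the weight attribute, and the toy records are hypothetical and are not taken from the medical examination dataset.

```python
from collections import Counter


def per_record_k(records, quasi_identifiers):
    """Return, for each record, the size of its equivalence class.

    Assumption: two records are equivalent when they agree on every
    quasi-identifier after generalization.
    """
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return [sizes[key] for key in keys]


# Hypothetical toy records; values are already generalized to ranges.
records = [
    {"height": "160-169", "weight": "50-59"},
    {"height": "160-169", "weight": "50-59"},
    {"height": "170-179", "weight": "60-69"},
    {"height": "170-179", "weight": "60-69"},
    {"height": "170-179", "weight": "60-69"},
]

ks = per_record_k(records, ["height", "weight"])
# The dataset as a whole is k-anonymous for k = min(ks).
print("per-record k:", ks, "min:", min(ks), "max:", max(ks))
```

Under this reading, the smallest per-record value gives the k for which the whole dataset is k-anonymous, while the spread between the smallest and largest values indicates how evenly the risk is distributed across records.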
6 CONCLUSION
In this paper, we proposed a method for constructing
a generalization hierarchy based on an analysis of
correlations between attribute values, and we analyzed the
effect of the method using an actual medical examination
dataset. We conclude that our method is an effective
way to generate more practical k-anonymized
datasets.
ACKNOWLEDGEMENTS
This work was supported by CREST, JST.