AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION

Amin Milani Fard, Ke Wang

Abstract

Web query logs contain information useful to research; however, releasing such data can allow the search engine users who issued the queries to be re-identified. These privacy concerns go far beyond removing explicitly identifying information such as name and address, since non-identifying personal data can be combined with publicly available information to pinpoint an individual. In this work we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
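
The abstract does not define the privacy condition, so the following is only a minimal sketch. It assumes the common reading of k-anonymity for transaction data: every published transaction must be identical to at least k-1 others. The taxonomy, the toy transactions, and the helper names `is_k_anonymous` and `generalize` are hypothetical and are not taken from the paper; the paper's clustering step is not shown.

```python
from collections import Counter


def is_k_anonymous(transactions, k):
    """Return True if every distinct item set occurs at least k times."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(c >= k for c in counts.values())


def generalize(transaction, taxonomy):
    """Replace each item by its parent in the taxonomy, if it has one."""
    return {taxonomy.get(item, item) for item in transaction}


# Hypothetical toy taxonomy and query-term transactions, for illustration only.
TAXONOMY = {"apple": "fruit", "banana": "fruit", "sedan": "car", "suv": "car"}
RAW = [{"apple", "sedan"}, {"banana", "suv"}, {"apple", "suv"}]

# Generalizing every item one level up makes the three transactions identical.
GENERALIZED = [generalize(t, TAXONOMY) for t in RAW]

print(is_k_anonymous(RAW, 2))          # False: each raw transaction is unique
print(is_k_anonymous(GENERALIZED, 2))  # True: all three become {"fruit", "car"}
```

In this toy case generalization buys anonymity at the cost of detail, which is the utility trade-off the abstract refers to.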



Paper Citation


in Harvard Style

Milani Fard A. and Wang K. (2010). AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010) ISBN 978-989-8425-18-8, pages 109-119. DOI: 10.5220/0002924901090119


in Bibtex Style

@conference{secrypt10,
author={Amin Milani Fard and Ke Wang},
title={AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)},
year={2010},
pages={109--119},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002924901090119},
isbn={978-989-8425-18-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)
TI - AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION
SN - 978-989-8425-18-8
AU - Milani Fard A.
AU - Wang K.
PY - 2010
SP - 109
EP - 119
DO - 10.5220/0002924901090119