AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION
Amin Milani Fard, Ke Wang
2010
Abstract
Web query log data contain information useful for research; however, releasing such data can allow the search engine users who issued the queries to be re-identified. Addressing these privacy concerns requires far more than removing explicitly identifying information such as name and address, since non-identifying personal data can be combined with publicly available information to pinpoint an individual. In this work we model web query logs as unstructured transaction data and present a novel transaction anonymization technique, based on clustering and generalization, that achieves k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
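To make the k-anonymity goal mentioned in the abstract concrete, the following is a minimal Python sketch of what it means for transaction data, under stated assumptions. It is not the authors' algorithm: the toy taxonomy `PARENT`, the sample `logs`, and the helper names `generalize`, `generalize_transaction`, and `is_k_anonymous` are all hypothetical. Each user's queries are treated as a set of items, items are replaced by more general terms, and the result is checked so that every published transaction is shared by at least k users.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "flu": "illness",
    "fever": "illness",
    "illness": "ANY",
    "hotel": "travel",
    "flight": "travel",
    "travel": "ANY",
}


def generalize(item, levels=1):
    """Move an item up the taxonomy by the given number of levels."""
    for _ in range(levels):
        # Items without a parent (the root, or unknown terms) stay unchanged.
        item = PARENT.get(item, item)
    return item


def generalize_transaction(transaction, levels=1):
    """Generalize every item of one transaction (a set of query terms)."""
    return frozenset(generalize(item, levels) for item in transaction)


def is_k_anonymous(transactions, k):
    """True if every distinct transaction occurs at least k times."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(count >= k for count in counts.values())


# Hypothetical query logs: one set of terms per user.
logs = [
    {"flu", "hotel"},
    {"fever", "flight"},
    {"flu", "flight"},
]

# Generalizing each term by one level makes the three transactions identical.
generalized = [generalize_transaction(t) for t in logs]
```

In this toy example the raw `logs` are all distinct, so they fail the check for k = 2, while the generalized transactions pass it for k = 3. The price is lost detail, which is why the abstract compares methods by the data utility they retain.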
Paper Citation
in Harvard Style
Milani Fard A. and Wang K. (2010). AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010) ISBN 978-989-8425-18-8, pages 109-119. DOI: 10.5220/0002924901090119
in Bibtex Style
@conference{secrypt10,
author={Amin Milani Fard and Ke Wang},
title={AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)},
year={2010},
pages={109-119},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002924901090119},
isbn={978-989-8425-18-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2010)
TI - AN EFFECTIVE CLUSTERING APPROACH TO WEB QUERY LOG ANONYMIZATION
SN - 978-989-8425-18-8
AU - Milani Fard A.
AU - Wang K.
PY - 2010
SP - 109
EP - 119
DO - 10.5220/0002924901090119