IMPROVING WEB SEARCH BY EXPLOITING SEARCH LOGS

Hongyan Ma

Abstract

With the growing use of Web search engines, there is an acute need for more adaptive and more personalizable Information Retrieval (IR) systems. This study proposes an innovative probabilistic method that exploits search logs to gather useful data about contexts and users in support of adaptive retrieval. Search logs from real users of an operational Web search engine, Infocious, were processed to obtain past queries and click-through data for adaptive indexing and unified probabilistic retrieval. An empirical experiment on retrieval effectiveness was conducted. The results show that the log-based probabilistic system performs statistically significantly better than the baseline system.
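As a rough illustration of what processing logs into past queries and click-through data can look like, the sketch below groups query terms under the URL that was clicked for them, giving each document a simple log-derived term profile. This is not the paper's method: the `(query, clicked_url)` record layout, the `build_click_index` helper, and the sample data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def build_click_index(log_records):
    """Map each clicked URL to a Counter of terms from the queries that led to it.

    log_records is an iterable of (query, clicked_url) pairs; this layout is
    an assumption, not the format of any real search engine's logs.
    """
    index = defaultdict(Counter)
    for query, clicked_url in log_records:
        # Naive whitespace tokenization, lowercased; a real system would do more.
        for term in query.lower().split():
            index[clicked_url][term] += 1
    return index


# Hypothetical log records, invented for this example.
sample_log = [
    ("probabilistic retrieval", "http://example.org/a"),
    ("retrieval models", "http://example.org/a"),
    ("search logs", "http://example.org/b"),
]

click_index = build_click_index(sample_log)
```

Term counts gathered this way could serve as extra indexing evidence alongside a document's own text; how such evidence is weighted within a probabilistic model is left open here.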



Paper Citation


in Harvard Style

Ma H. (2009). IMPROVING WEB SEARCH BY EXPLOITING SEARCH LOGS. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 208-216. DOI: 10.5220/0001843802080216


in Bibtex Style

@conference{webist09,
author={Hongyan Ma},
title={IMPROVING WEB SEARCH BY EXPLOITING SEARCH LOGS},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={208-216},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001843802080216},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - IMPROVING WEB SEARCH BY EXPLOITING SEARCH LOGS
SN - 978-989-8111-81-4
AU - Ma H.
PY - 2009
SP - 208
EP - 216
DO - 10.5220/0001843802080216