HIERARCHICAL TAXONOMY EXTRACTION BY MINING TOPICAL QUERY SESSIONS

Miguel Fernández-Fernández, Daniel Gayo-Avello

Abstract

Search engine logs store detailed information on Web users' interactions. Thus, as more and more people use search engines on a daily basis, important trails of users' common knowledge are being recorded in those logs. Previous research has shown that it is possible to extract concept taxonomies from full-text documents, while other scholars have proposed methods to obtain similar queries from query logs. We propose a mixture of both lines of research, that is, mining query logs to find neither related queries nor query hierarchies but actual term taxonomies. In this first approach we have studied the feasibility of finding hyponymy relations between terms or noun phrases by exploiting specialization search patterns in topical sessions, obtaining encouraging preliminary results.
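
As a rough illustration of what a "specialization search pattern" could look like, the sketch below scans consecutive queries in a session and flags cases where the later query keeps every term of the earlier one and adds new terms. This is only one naive reading of the idea, not the authors' method: the function names, the whitespace tokenization, the assumption that sessions are already segmented by topic, and the treatment of the added terms as a candidate narrower term are all hypothetical choices made for this example.

```python
from collections import Counter


def specialization_pairs(session):
    """Return (general, specific) pairs of consecutive queries in which the
    later query contains every term of the earlier one plus extra terms.
    Assumption: a session is a list of query strings in the order issued."""
    pairs = []
    for prev, curr in zip(session, session[1:]):
        prev_terms = set(prev.lower().split())
        curr_terms = set(curr.lower().split())
        # Strict subset test: the later query must add at least one term.
        if prev_terms and prev_terms < curr_terms:
            pairs.append((prev, curr))
    return pairs


def candidate_relations(sessions):
    """Count (general query, added terms) pairs over all sessions.
    Assumption: the added terms are treated as a candidate narrower term
    of the general query; added terms are joined in alphabetical order."""
    counts = Counter()
    for session in sessions:
        for general, specific in specialization_pairs(session):
            added = set(specific.lower().split()) - set(general.lower().split())
            counts[(general.lower(), " ".join(sorted(added)))] += 1
    return counts


# Hypothetical, hand-made sessions (not data from the paper).
sessions = [
    ["dogs", "dogs poodle"],
    ["dogs", "poodle dogs"],
    ["jaguar", "weather"],
]
relations = candidate_relations(sessions)
```

Counting how often the same pair recurs across sessions gives a simple frequency signal that could be thresholded before treating a pair as a candidate relation; any real system would also need session detection and noun-phrase handling, which this sketch leaves out.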



Paper Citation


in Harvard Style

Fernández-Fernández M. and Gayo-Avello D. (2009). HIERARCHICAL TAXONOMY EXTRACTION BY MINING TOPICAL QUERY SESSIONS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 229-235. DOI: 10.5220/0002331402290235


in Bibtex Style

@conference{kdir09,
author={Miguel Fernández-Fernández and Daniel Gayo-Avello},
title={HIERARCHICAL TAXONOMY EXTRACTION BY MINING TOPICAL QUERY SESSIONS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={229-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002331402290235},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - HIERARCHICAL TAXONOMY EXTRACTION BY MINING TOPICAL QUERY SESSIONS
SN - 978-989-674-011-5
AU - Fernández-Fernández M.
AU - Gayo-Avello D.
PY - 2009
SP - 229
EP - 235
DO - 10.5220/0002331402290235