AUGMENTING SEARCH WITH CORPUS-DERIVED SEMANTIC RELEVANCE

Zachary Mason

Abstract

This paper describes a system for contextually steered web search. The system is based on a method for estimating the semantic relevance of a web page to a query. Consider searching the web for conferences about web search. The query “search conferences” is not effective, because most of its results concern searching over conferences rather than conferences on the topic of search. The system described in this paper enables queries of the form “search conference context:pagerank”. The context field in this example specifies a preference for results semantically relevant to the term “pagerank”, although the results are not required to contain the word “pagerank” itself. This is a more semantic, less lexical way of refining the query than adding literal conjuncts. Contextual search, as implemented in this paper, is built on the Google search engine. For each query, the top one hundred search results are fetched from Google and sorted according to their relevance to the context query. Relevance is computed as a distance function between the vocabulary vectors associated with a web page and a query. For a query, the vocabulary vector is formed by aggregating the web pages in the search results for that query. For a web page, the vocabulary vector is aggregated from that page and other pages nearby in link space.
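The re-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract says only that relevance is "a distance function" between vocabulary vectors, so the cosine distance, the raw term-frequency weighting, the regex tokenizer, and the names `vocabulary_vector`, `cosine_distance` and `rerank` are all assumptions made here for illustration. Fetching results from a search engine is left out; pages are passed in as plain strings.

```python
import math
import re
from collections import Counter


def vocabulary_vector(texts):
    """Aggregate several texts into one term-frequency vector (assumed weighting)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is empty (assumed metric)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rerank(results, context_pages):
    """Sort result URLs by distance to the context query's vocabulary vector.

    results: dict mapping URL -> list of texts (the page plus pages near it
             in link space).
    context_pages: texts of the search results for the context query.
    """
    context_vec = vocabulary_vector(context_pages)
    return sorted(
        results,
        key=lambda url: cosine_distance(vocabulary_vector(results[url]), context_vec),
    )


# Hypothetical toy data standing in for fetched pages.
results = {
    "a.example/find-conferences": ["search a directory of conferences by city and date"],
    "b.example/ir-conf": ["conference on web search ranking link analysis and retrieval"],
}
context_pages = [
    "pagerank is a link analysis ranking algorithm for web search",
    "ranking web pages by link structure",
]
ranked = rerank(results, context_pages)
```

In this toy data the second page never contains the word "pagerank", yet it sorts first because its vocabulary overlaps more with the pages aggregated for the context term, which is the behaviour the abstract describes.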



Paper Citation


in Harvard Style

Mason Z. (2007). AUGMENTING SEARCH WITH CORPUS-DERIVED SEMANTIC RELEVANCE. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 367-371. DOI: 10.5220/0001259403670371


in Bibtex Style

@conference{webist07,
author={Zachary Mason},
title={AUGMENTING SEARCH WITH CORPUS-DERIVED SEMANTIC RELEVANCE},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={367-371},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001259403670371},
isbn={978-972-8865-78-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - AUGMENTING SEARCH WITH CORPUS-DERIVED SEMANTIC RELEVANCE
SN - 978-972-8865-78-8
AU - Mason Z.
PY - 2007
SP - 367
EP - 371
DO - 10.5220/0001259403670371