CONCEPT BASED QUERY AND DOCUMENT EXPANSION USING HIDDEN MARKOV MODEL

Jiuling Zhang, Zuoda Liu, Beixing Deng, Xing Li

Abstract

Query and document expansion techniques have been widely studied as a way to improve the effectiveness of information retrieval. In this paper, we propose a method for concept-based query and document expansion that employs the hidden Markov model (HMM). WordNet is adopted as the thesaurus that supplies the set of concepts and terms. Expanded query and document candidates are generated from the concepts that the hidden Markov model recovers from the original query or document term sequence. Using 50,000 web pages crawled from universities as our test collection and the Lemur Toolkit as our retrieval tool, a preliminary experiment on query expansion shows that the scores of the top 20 retrieved documents increase by 2.7113 on average. The number of documents scoring above a given value also increases significantly.
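The idea of recovering hidden concepts from an observed term sequence and then expanding with further terms of those concepts can be illustrated with a small sketch. This is not the authors' implementation: the two concepts, their term lists, and all probabilities below are hypothetical toy values (a real system would derive them from WordNet and a corpus), and the helper names `viterbi` and `expand` are chosen here purely for illustration. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# Hypothetical toy thesaurus: concept -> terms that can express it.
CONCEPT_TERMS = {
    "finance": ["bank", "loan", "deposit", "credit"],
    "river": ["bank", "shore", "stream"],
}
STATES = list(CONCEPT_TERMS)

# Assumed model parameters (illustrative only, not estimated from data).
START_P = {"finance": 0.5, "river": 0.5}
TRANS_P = {
    "finance": {"finance": 0.8, "river": 0.2},
    "river": {"finance": 0.2, "river": 0.8},
}
# Each concept emits its own terms uniformly.
EMIT_P = {
    c: {t: 1.0 / len(ts) for t in ts} for c, ts in CONCEPT_TERMS.items()
}


def viterbi(terms, states, start_p, trans_p, emit_p, floor=1e-9):
    """Return the most likely hidden concept sequence for the observed terms."""
    def lp(p):
        # Log-probability with a small floor so unseen events stay finite.
        return math.log(max(p, floor))

    # best[t][s] = (log-prob of the best path ending in s at position t, predecessor)
    best = [{
        s: (lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(terms[0], 0.0)), None)
        for s in states
    }]
    for term in terms[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, arg = max(
                (prev[r][0] + lp(trans_p[r].get(s, 0.0)), r) for r in states
            )
            column[s] = (score + lp(emit_p[s].get(term, 0.0)), arg)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


def expand(terms, concepts, concept_terms):
    """Append terms of the recovered concepts that are not already present."""
    expanded = list(terms)
    for concept in concepts:
        for term in concept_terms[concept]:
            if term not in expanded:
                expanded.append(term)
    return expanded


query = ["bank", "loan"]
concepts = viterbi(query, STATES, START_P, TRANS_P, EMIT_P)
expanded_query = expand(query, concepts, CONCEPT_TERMS)
```

With these toy values the ambiguous term "bank" is resolved by its neighbour: "loan" can only be emitted by the finance concept, so the expanded query gains finance terms rather than river terms. The same decoding could in principle be applied to a document's term sequence for document expansion.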



Paper Citation


in Harvard Style

Zhang J., Liu Z., Deng B. and Li X. (2009). CONCEPT BASED QUERY AND DOCUMENT EXPANSION USING HIDDEN MARKOV MODEL. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 688-691. DOI: 10.5220/0001842506880691


in Bibtex Style

@conference{webist09,
author={Jiuling Zhang and Zuoda Liu and Beixing Deng and Xing Li},
title={CONCEPT BASED QUERY AND DOCUMENT EXPANSION USING HIDDEN MARKOV MODEL},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2009},
pages={688-691},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001842506880691},
isbn={978-989-8111-81-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CONCEPT BASED QUERY AND DOCUMENT EXPANSION USING HIDDEN MARKOV MODEL
SN - 978-989-8111-81-4
AU - Zhang J.
AU - Liu Z.
AU - Deng B.
AU - Li X.
PY - 2009
SP - 688
EP - 691
DO - 10.5220/0001842506880691