score higher than the mean of top 20 retrieved
documents without expansion is recorded, as shown
in Figure 1.
Figure 1: Numbers of retrieved documents with scores
higher than the threshold.
Figure 1 shows that more high-scoring candidate
documents are retrieved with expansion. This
suggests that the expansion method yields more
potentially relevant documents.
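The count plotted in Figure 1 can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `count_above_baseline` and the plain lists of scores are assumptions made for the example. The threshold is the mean score of the top 20 documents retrieved without expansion, as described above.

```python
from statistics import mean


def count_above_baseline(baseline_scores, expanded_scores, k=20):
    """Count expanded-query documents scoring above the baseline threshold.

    baseline_scores: scores of documents retrieved by the original query.
    expanded_scores: scores of documents retrieved by an expanded query.
    The threshold is the mean of the top-k baseline scores (hypothetical helper).
    """
    top_k = sorted(baseline_scores, reverse=True)[:k]
    threshold = mean(top_k)
    return sum(1 for score in expanded_scores if score > threshold)
```

For example, with baseline scores `[10, 8, 6, 4]` and `k=2` the threshold is 9, so expanded scores `[12, 9.5, 9, 3]` contribute two documents above it.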
To illustrate the average score improvement, the
retrieval results of the expanded queries are
combined and then compared with the retrieval
result of the original query, as shown in Figure 2.
Figure 2: Comparison of average scores of top 20
retrieved documents.
Figure 2 shows that the average score of the combined
documents from the expanded queries is clearly higher
than that obtained without any expansion. The average
score increment over the 10 queries is 2.7113.
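The comparison behind Figure 2 can be sketched in the same spirit. The text does not state how the results of the expanded queries are combined, so the sketch below assumes one simple choice: each document keeps its best score across the expanded queries. The function names and the `{doc_id: score}` dictionaries are likewise hypothetical.

```python
def combine_results(result_lists):
    """Merge {doc_id: score} mappings, keeping each document's best score.

    Assumption: max-score merging; the paper does not specify the rule.
    """
    combined = {}
    for results in result_lists:
        for doc_id, score in results.items():
            if score > combined.get(doc_id, float("-inf")):
                combined[doc_id] = score
    return combined


def top_k_mean(results, k=20):
    """Mean score of the k highest-scoring documents (0.0 if there are none)."""
    top = sorted(results.values(), reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


def average_increment(per_query, k=20):
    """Average top-k score gain of combined expanded results over the original.

    per_query: list of (original_results, [expanded_results, ...]) pairs,
    one pair per test query.
    """
    gains = [
        top_k_mean(combine_results(expanded), k) - top_k_mean(original, k)
        for original, expanded in per_query
    ]
    return sum(gains) / len(gains)
```

Under this assumption, the reported increment corresponds to `average_increment` evaluated over the 10 test queries with `k=20`. A different merging rule (for example, summing scores) would change the numbers, so the sketch should be read as one possible reading of the procedure.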
The figures above show that concept-based query
expansion not only provides more high-scoring
candidate documents, but also improves the average
score of the top 20 retrieved documents.
5 CONCLUSIONS
In this paper, we proposed a concept-based query and
document expansion method using a hidden Markov
model. Preliminary experimental results on query
expansion show that our concept-based method can
not only retrieve more high-scoring documents, but
also improve the average score of the top 20 retrieved
documents.
In theory, document expansion can also improve
retrieval effectiveness by providing more candidate
documents; evaluating it will be part of our future
work. In addition, experiments on TREC collections
are needed to show how the expansion method
actually affects recall and precision.
ACKNOWLEDGEMENTS
This work is supported by the National Basic Research
Program of China (973 Project, No. 2007CB310806).
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies