score higher than the mean score of the top 20 
documents retrieved without expansion is recorded, 
as shown in Figure 1.  
 
Figure 1: Numbers of retrieved documents with scores 
higher than the threshold. 
Figure 1 shows that more high-scoring candidate 
documents are retrieved with expansion. This 
implies that more potentially relevant documents 
can be obtained using this expansion method. 
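The threshold count behind Figure 1 can be illustrated with a short Python sketch. This is not the authors' code. It assumes each retrieval run is a list of (document id, score) pairs. It also assumes that a document returned by several expanded queries is counted once, using its highest score. The function names (`top_k_mean`, `pool_max`, `count_above_threshold`) and the toy scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical representation of one retrieval run: (doc_id, score) pairs.
Ranking = List[Tuple[str, float]]


def top_k_mean(ranking: Ranking, k: int = 20) -> float:
    """Mean score of the k highest-scoring documents in a ranking."""
    scores = sorted((score for _, score in ranking), reverse=True)[:k]
    return mean(scores)


def pool_max(rankings: List[Ranking]) -> Dict[str, float]:
    """Pool several runs; each distinct document keeps its best score (assumption)."""
    pooled: Dict[str, float] = {}
    for ranking in rankings:
        for doc_id, score in ranking:
            pooled[doc_id] = max(score, pooled.get(doc_id, float("-inf")))
    return pooled


def count_above_threshold(
    baseline: Ranking, expanded: List[Ranking], k: int = 20
) -> Tuple[int, int]:
    """Count documents scoring above the mean of the baseline's top-k scores.

    Returns (count for the unexpanded query, count for the pooled expanded queries).
    """
    threshold = top_k_mean(baseline, k)
    n_base = sum(1 for _, score in baseline if score > threshold)
    n_expanded = sum(1 for score in pool_max(expanded).values() if score > threshold)
    return n_base, n_expanded


if __name__ == "__main__":
    # Toy data; k=2 keeps the example small (the paper uses the top 20).
    baseline = [("d1", 9.0), ("d2", 7.0), ("d3", 5.0)]
    expanded = [[("d1", 9.5), ("d4", 8.5)], [("d5", 8.2), ("d3", 4.0)]]
    print(count_above_threshold(baseline, expanded, k=2))  # prints (1, 3)
```

In this toy case the threshold is (9.0 + 7.0) / 2 = 8.0. Only d1 exceeds it in the baseline run, while d1, d4 and d5 exceed it after pooling the expanded queries.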
To illustrate the improvement in average score, the 
retrieval results of the expanded queries are 
combined and then compared with the retrieval 
result of the original query, as shown in Figure 2.  
 
Figure 2: Comparison of average scores of top 20 
retrieved documents. 
Figure 2 shows that the scores of the combined 
documents from the expanded queries are clearly 
higher than those obtained without expansion. The 
average score increase over the 10 queries is 2.7113. 
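The comparison behind Figure 2 and the averaged increase can be sketched in Python as follows. The paper does not state how the results of the expanded queries are combined. For illustration only, this sketch lets a document retrieved by several expanded queries keep its highest score, which is a CombMAX-style merge. The function names and toy scores are hypothetical. The figure of 2.7113 comes from the paper's experiments and is not reproduced here.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical representation of one retrieval run: (doc_id, score) pairs.
Ranking = List[Tuple[str, float]]


def combine(rankings: List[Ranking]) -> Ranking:
    """Merge expanded-query runs; each document keeps its highest score (assumed rule)."""
    best: Dict[str, float] = {}
    for ranking in rankings:
        for doc_id, score in ranking:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def top_k_mean(ranking: Ranking, k: int = 20) -> float:
    """Mean score of the k highest-scoring documents."""
    return mean(sorted((score for _, score in ranking), reverse=True)[:k])


def average_increment(queries: List[Tuple[Ranking, List[Ranking]]], k: int = 20) -> float:
    """Average, over queries, of (combined expanded top-k mean - baseline top-k mean)."""
    diffs = [
        top_k_mean(combine(expanded), k) - top_k_mean(baseline, k)
        for baseline, expanded in queries
    ]
    return mean(diffs)


if __name__ == "__main__":
    # Two toy queries with k=2; the paper averages over 10 queries with k=20.
    queries = [
        ([("a", 4.0), ("b", 2.0)], [[("a", 5.0), ("c", 3.0)], [("b", 4.0)]]),
        ([("x", 6.0), ("y", 4.0)], [[("x", 6.0)], [("z", 8.0), ("y", 3.0)]]),
    ]
    print(average_increment(queries, k=2))  # prints 1.75
```

Here the per-query increases are 4.5 - 3.0 = 1.5 and 7.0 - 5.0 = 2.0, so their mean is 1.75. A different merge rule, such as summing scores (CombSUM), would change the combined scores and therefore the increase.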
These figures show that concept-based query 
expansion not only provides more high-scoring 
candidate documents but also improves the average 
score of the top 20 retrieved documents. 
5 CONCLUSIONS 
In this paper, we proposed a concept-based query and 
document expansion method using a hidden Markov 
model. Preliminary experimental results on query 
expansion show that our concept-based method not 
only retrieves more high-scoring documents but also 
improves the average score of the top 20 retrieved 
documents.  
In principle, document expansion can also improve 
retrieval effectiveness by providing more candidate 
documents; this will be part of our future work. In 
addition, an evaluation on TREC collections is 
needed to show how the expansion method actually 
affects recall and precision. 
ACKNOWLEDGEMENTS 
This work is supported by the National Basic Research 
Program of China (973 Project, No. 
2007CB310806).
 
REFERENCES 
Yonggang Qiu, H.P. Frei, 1993. Concept based query 
expansion. In SIGIR’93, 16th Int. ACM/SIGIR Conf. 
on R&D in Information Retrieval, pages 160-167, 
Pittsburgh, PA, USA. 
Ellen M. Voorhees, 1994. Query expansion using lexical-
semantic relations. In SIGIR’94, 17th Int. ACM/SIGIR 
Conf. on R&D in Information Retrieval, pages 61-69, 
Dublin, Ireland. 
Orland Hoeber, Xue-Dong Yang, Yiyu Yao, 2005. 
Conceptual query expansion. In Proceedings of the 
Atlantic Web Intelligence Conference. Lodz, Poland. 
Christopher D. Manning, Hinrich Schütze, 1999. 
Foundations of statistical natural language processing. 
MIT Press, Cambridge, Massachusetts. 
Min Zhang, Ruihua Song, Shaoping Ma, 2004. Document 
refinement based on semantic query expansion. 
Journal of Computer, Vol. 27, No. 10, pp. 1395-1401. 
Amit Singhal, Fernando Pereira, 1999. Document expansion 
for speech retrieval. In SIGIR’99, 22nd Int. ACM/SIGIR 
Conf. on R&D in Information Retrieval, pages 34-41. 
Xiaoyong Liu, W. Bruce Croft, 2004. Cluster-based retrieval 
using language models. In SIGIR’04, 27th Int. ACM/SIGIR 
Conf. on R&D in Information Retrieval, pages 186-193. 
John Makhoul, Richard Schwartz, 1994. State of the art in 
continuous speech recognition, National Academy 
Press, Washington, DC, USA. 
David R. H. Miller, Tim Leek, Richard M. Schwartz, 1999. 
A hidden Markov model information retrieval system. 
In SIGIR’99, 22nd Int. ACM/SIGIR Conf. on R&D 
in Information Retrieval, pages 214-221. 
Christopher D. Manning, Prabhakar Raghavan, Hinrich 
Schütze, 2008. Introduction to information retrieval. 
Cambridge University Press. 
A. J. Viterbi, 1967. Error bounds for convolutional codes 
and an asymptotically optimum decoding algorithm. 
IEEE Trans. Inform. Theory, vol. IT-13, pp. 260-269. 
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies