DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL

Shuguang Wang, Shyam Visweswaran, Milos Hauskrecht

Abstract

We are interested in enhancing information retrieval methods by incorporating domain knowledge. In this paper, we present a new document retrieval framework that learns a probabilistic knowledge model and exploits it to improve document retrieval. The knowledge model is represented as a network of associations among concepts that define key domain entities, and it is extracted either from a corpus of documents or from a curated domain knowledge base. The model is then used to perform concept-related probabilistic inferences with link analysis methods, and these inferences are applied to the task of document retrieval. We evaluate the framework on two biomedical datasets and show that this knowledge-based approach outperforms the state-of-the-art Lemur/Indri document retrieval method.
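The abstract names three steps (a concept association network, link-analysis inference over it, and document ranking) without fixing their details. The Python sketch below is one way to wire them together under loudly stated assumptions: association weights are plain document co-occurrence counts, the link-analysis step is a random walk with restart (personalized PageRank) seeded at the query concepts, and a document is scored by the mean relevance of its distinct concepts. These choices, the toy data, and the function names (`build_concept_graph`, `random_walk_with_restart`, `rank_documents`) are hypothetical stand-ins, not the authors' probabilistic model or their inference procedure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical illustration only; NOT the paper's model or inference method.
Graph = dict[str, dict[str, float]]


def build_concept_graph(docs: dict[str, list[str]]) -> Graph:
    """Association network (assumed): edge weight = number of documents
    in which two concepts co-occur."""
    graph: defaultdict = defaultdict(lambda: defaultdict(float))
    for concepts in docs.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return {u: dict(nbrs) for u, nbrs in graph.items()}


def random_walk_with_restart(graph: Graph, query: set[str],
                             restart: float = 0.3, iters: int = 50) -> dict[str, float]:
    """Link-analysis stand-in (assumed): personalized PageRank that scores
    every concept by its relevance to the query concepts."""
    if not query:
        raise ValueError("query must contain at least one concept")
    nodes = set(graph) | set(query)
    seed = {c: (1.0 / len(query) if c in query else 0.0) for c in nodes}
    score = dict(seed)
    for _ in range(iters):
        nxt = {c: restart * seed[c] for c in nodes}  # jump back to the query
        dangling = 0.0
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            if total == 0.0:
                dangling += score[u]  # no outgoing edges: mass returns to the seed
                continue
            for v, w in out.items():
                nxt[v] += (1.0 - restart) * score[u] * w / total
        for c in nodes:
            nxt[c] += (1.0 - restart) * dangling * seed[c]
        score = nxt  # total mass stays 1.0 at every iteration
    return score


def rank_documents(docs: dict[str, list[str]],
                   concept_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Score each document by the mean relevance of its distinct concepts (assumed)."""
    ranked = []
    for doc_id, concepts in docs.items():
        uniq = set(concepts)
        s = sum(concept_scores.get(c, 0.0) for c in uniq) / len(uniq) if uniq else 0.0
        ranked.append((doc_id, s))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Made-up toy corpus, each document already mapped to domain concepts.
    docs = {
        "d1": ["p53", "apoptosis", "cancer"],
        "d2": ["apoptosis", "caspase"],
        "d3": ["insulin", "diabetes"],
    }
    graph = build_concept_graph(docs)
    scores = random_walk_with_restart(graph, {"p53"})
    for doc_id, s in rank_documents(docs, scores):
        print(doc_id, f"{s:.3f}")
```

On this toy corpus, d1 (which contains the query concept) ranks first, d2 ranks second because `apoptosis` is linked to `p53`, and d3, whose concepts are disconnected from `p53`, scores 0. A full system would also need a step that maps raw text to concepts, which this sketch takes as given.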



Paper Citation


in Harvard Style

Wang S., Visweswaran S. and Hauskrecht M. (2009). DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009) ISBN 978-989-674-011-5, pages 26-33. DOI: 10.5220/0002293400260033


in Bibtex Style

@conference{kdir09,
author={Shuguang Wang and Shyam Visweswaran and Milos Hauskrecht},
title={DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},
year={2009},
pages={26-33},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002293400260033},
isbn={978-989-674-011-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
TI - DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL
SN - 978-989-674-011-5
AU - Wang S.
AU - Visweswaran S.
AU - Hauskrecht M.
PY - 2009
SP - 26
EP - 33
DO - 10.5220/0002293400260033
ER - 