THESAURUS BASED SEMANTIC REPRESENTATION IN LANGUAGE MODELING FOR MEDICAL ARTICLE INDEXING

Jihen Majdoubi, Mohamed Tmar, Faiez Gargouri

Abstract

The language modeling approach plays an important role in many areas of natural language processing, including speech recognition, machine translation, and information retrieval. In this paper, we propose an approach to the conceptual indexing of medical articles based on the MeSH (Medical Subject Headings) thesaurus, and we present a tool for indexing medical articles called SIMA (System of Indexing Medical Articles), which uses a language model to extract the MeSH descriptors that represent a document. To assess the relevance of a document to a MeSH descriptor, we estimate the probability that the MeSH descriptor would have been generated by the language model of this document.
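As an illustration of the scoring idea only, the probability that a descriptor is generated by a document's language model can be sketched with a smoothed unigram model. The abstract does not specify the estimator used in SIMA, so the Jelinek-Mercer interpolation, the weight `lam`, the whitespace tokenizer, the function names, and the toy documents and descriptors below are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def descriptor_log_likelihood(descriptor, document, collection, lam=0.5):
    """Log-probability that the document's unigram model generates the descriptor.

    lam weights the document model against the collection model
    (Jelinek-Mercer interpolation, chosen here only for illustration).
    """
    doc_counts = Counter(tokenize(document))
    coll_counts = Counter(tok for d in collection for tok in tokenize(d))
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())

    score = 0.0
    for term in tokenize(descriptor):
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


# Toy data, invented for the example; not taken from the paper.
docs = [
    "chronic heart failure treatment in elderly patients",
    "insulin therapy for diabetes mellitus",
]
descriptors = ["heart failure", "diabetes mellitus"]

# Rank the candidate descriptors for the first document, best first.
ranked = sorted(
    descriptors,
    key=lambda d: descriptor_log_likelihood(d, docs[0], docs),
    reverse=True,
)
```

Smoothing with the collection model keeps a descriptor from being ruled out entirely when one of its terms is absent from the document, which is why a descriptor whose terms appear in the document still ranks above one supported only by collection statistics.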


Paper Citation


in Harvard Style

Majdoubi J., Tmar M. and Gargouri F. (2010). THESAURUS BASED SEMANTIC REPRESENTATION IN LANGUAGE MODELING FOR MEDICAL ARTICLE INDEXING. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-05-8, pages 65-74. DOI: 10.5220/0002903300650074


in Bibtex Style

@conference{iceis10,
author={Jihen Majdoubi and Mohamed Tmar and Faiez Gargouri},
title={THESAURUS BASED SEMANTIC REPRESENTATION IN LANGUAGE MODELING FOR MEDICAL ARTICLE INDEXING},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2010},
pages={65-74},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002903300650074},
isbn={978-989-8425-05-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - THESAURUS BASED SEMANTIC REPRESENTATION IN LANGUAGE MODELING FOR MEDICAL ARTICLE INDEXING
SN - 978-989-8425-05-8
AU - Majdoubi J.
AU - Tmar M.
AU - Gargouri F.
PY - 2010
SP - 65
EP - 74
DO - 10.5220/0002903300650074