MAIF gives the best results, with an F-measure of
27.04. SIMA, for its part, obtains an F-measure of
26.95 at rank 10.
At rank 4, SIMA shows the best performance,
with an F-measure of 27.75%.
At rank 50, the best result is obtained by SIMA,
with 9.25 for precision and 15.02 for F-measure.
Regarding MAIF, even though the precision
obtained (7.04) is the highest one, its F-measure is
lower than that of SIMA.
8 CONCLUSIONS
The work developed in this paper outlined a concept
language model using the MeSH thesaurus for representing
the semantic content of medical articles.
Our proposed conceptual indexing approach consists
of three main steps. In the first step (Pretreatment),
given an article, the MeSH thesaurus and
the NLP tools, the system extracts two sets: the first
is the list of the article's lemmas, and the second is the
list of lemmas existing in the MeSH thesaurus. In step
2, these sets are used to extract the MeSH
concepts existing in the document. After that, our
system interprets the relevance of a document $d$ to a
MeSH descriptor $des_i$ by measuring the probability
of this descriptor being generated by the document language
model, $P(d \mid des_i)$. Finally, the MeSH descriptors are
ranked by decreasing score $P(d \mid des_i)$.
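The ranking step can be illustrated with a minimal sketch. It does not reproduce the estimator used in this paper. It assumes a unigram document model with linear (Jelinek-Mercer style) interpolation against add-one smoothed collection statistics, and scores each descriptor by the log-likelihood of its lemmas under that model. The function names, the smoothing weight `lam`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def score_descriptor(doc_lemmas, descriptor_lemmas, collection_counts, lam=0.5):
    """Log-likelihood of a descriptor's lemmas under a smoothed unigram
    model of the document (an assumed estimator, not the paper's)."""
    doc_counts = Counter(doc_lemmas)
    doc_len = len(doc_lemmas)
    coll_len = sum(collection_counts.values())
    vocab_size = len(collection_counts)
    log_score = 0.0
    for lemma in descriptor_lemmas:
        # Relative frequency of the lemma in the document.
        p_doc = doc_counts[lemma] / doc_len if doc_len else 0.0
        # Add-one smoothed collection probability, always strictly positive.
        p_coll = (collection_counts.get(lemma, 0) + 1) / (coll_len + vocab_size + 1)
        log_score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return log_score


def rank_descriptors(doc_lemmas, descriptors, collection_counts, lam=0.5):
    """Return (descriptor, score) pairs sorted by decreasing score."""
    scored = [
        (name, score_descriptor(doc_lemmas, lemmas, collection_counts, lam))
        for name, lemmas in descriptors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): lemmas of one article and two candidate descriptors.
doc = ["heart", "failure", "patient", "heart", "treatment"]
descriptors = {
    "Heart Failure": ["heart", "failure"],
    "Diabetes Mellitus": ["diabetes", "mellitus"],
}
collection = Counter(
    {"heart": 10, "failure": 5, "diabetes": 8, "mellitus": 8,
     "patient": 50, "treatment": 30}
)

ranking = rank_descriptors(doc, descriptors, collection)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case the descriptor whose lemmas occur in the article is ranked first. The collection term only keeps the score finite for descriptors whose lemmas are absent from the article.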
We can thus summarize our major contribution by:
We evaluated the methods using three measures: precision,
recall and F-measure. Our experimental evaluation
shows the effectiveness of our approach.
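As an illustration of the three measures, the sketch below computes precision, recall and F-measure for the top k descriptors of a ranked list. It assumes the balanced F-measure (the harmonic mean of precision and recall) and a reference set of descriptors for each article. Both are assumptions, as are the function name and the toy data.

```python
def evaluate_at_rank(ranked, relevant, k):
    """Precision, recall and balanced F-measure over the top-k descriptors.

    ranked   : descriptors ordered by decreasing score
    relevant : set of reference descriptors (assumed gold standard)
    """
    top_k = ranked[:k]
    hits = sum(1 for descriptor in top_k if descriptor in relevant)
    # Precision is computed over the descriptors actually returned (at most k).
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (hypothetical): four ranked descriptors, three reference ones.
ranked = ["A", "B", "C", "D"]
relevant = {"A", "C", "E"}

precision, recall, f_measure = evaluate_at_rank(ranked, relevant, k=4)
print(precision, recall, f_measure)
```

Here two of the four returned descriptors are in the reference set, so precision is 1/2, recall is 2/3 and the F-measure is 4/7.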
THESAURUS BASED SEMANTIC REPRESENTATION IN LANGUAGE MODELING FOR MEDICAL ARTICLE
INDEXING