demanding for IR systems and we intend to conduct
these experiments in the near future.
7 CONCLUSIONS
The notion of Entropy on Ontology, introduced
above, involves a topology of entities in a
topological space. This feature was realized through
a weight extension on the semantic similarity cover,
taken as a connected component on the ontology. The
same construction can serve as a pattern for defining
entropy for entities from other topological spaces,
in order to formalize semantics such as similarity,
closeness, or correlation between entities. The new
notion can be used to measure the information in a
message or collection of entities when we know the
weights of the entities that compose the message
and, in addition, how the entities relate
“semantically” to each other in the topological
space.
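The point that weights alone are not enough can be illustrated with a
small sketch. It is not the Entropy on Ontology defined in this paper:
the toy hierarchy, the term names, and the `weighted_spread` measure are
hypothetical stand-ins, chosen only to show that two messages with
identical weights (and therefore identical Shannon entropy) can differ
once the positions of their entities in a hierarchy are taken into
account.

```python
import math
from itertools import combinations

# Hypothetical toy hierarchy (child -> parent); None marks the root.
PARENT = {
    "root": None,
    "disease": "root",
    "chemical": "root",
    "infection": "disease",
    "neoplasm": "disease",
    "enzyme": "chemical",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges on the unique path between a and b in the tree."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in depth_from_a:  # first common ancestor
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same tree")

def shannon_entropy(weights):
    """Shannon entropy (bits) of the normalized weights; ignores topology."""
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total)
                for w in weights.values() if w > 0)

def weighted_spread(weights):
    """Assumed stand-in measure: weight-averaged pairwise tree distance."""
    total = sum(weights.values())
    p = {t: w / total for t, w in weights.items()}
    return sum(2 * p[a] * p[b] * tree_distance(a, b)
               for a, b in combinations(p, 2))

# Same weights, different positions in the hierarchy.
close = {"infection": 1.0, "neoplasm": 1.0}  # siblings under "disease"
far = {"infection": 1.0, "enzyme": 1.0}      # related only through the root

print(shannon_entropy(close), shannon_entropy(far))
print(weighted_spread(close), weighted_spread(far))
```

In this sketch the two messages are indistinguishable by Shannon entropy,
while the topology-aware measure separates the semantically close pair
from the distant one.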
The quality of the presented algorithm, which
allows us to estimate Entropy on Ontology and the
State of the Document, depends entirely on the
correctness and sufficiency of the hierarchical
thesaurus on which it is based. As mentioned earlier,
many thesauri exist, and their maintenance
and evolution are vital for the proper functioning of
such algorithms. A great deal of knowledge has also
been accumulated in other forms, such as
dictionaries, and it is very important to convert
these into hierarchies that can be used for the proper
interpretation of texts on specialized topics.
The minimum that defines Entropy on Ontology
and the State of the Document may not be unique, or
there may be multiple local minima. For developing
approximations it is important to find conditions on
the ontology or on the topology of terms under which
the minimum is unique.
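A toy example shows how a minimum over a hierarchy can fail to be
unique. The sketch below does not use the functional that defines
Entropy on Ontology in this paper. As an assumed stand-in it minimizes
the weighted sum of tree distances from a candidate node to the terms of
a message, over a hypothetical five-node hierarchy.

```python
# Hypothetical toy hierarchy (child -> parent); None marks the root.
PARENT = {
    "root": None,
    "disease": "root",
    "chemical": "root",
    "infection": "disease",
    "neoplasm": "disease",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges on the unique path between a and b in the tree."""
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if n in depth_from_a:  # first common ancestor
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same tree")

def cost(center, weights):
    """Assumed stand-in objective: weighted distance from terms to center."""
    return sum(w * tree_distance(t, center) for t, w in weights.items())

def minimizers(weights):
    """All nodes of the hierarchy that attain the minimal cost."""
    costs = {c: cost(c, weights) for c in PARENT}
    best = min(costs.values())
    return sorted(c for c, v in costs.items() if v == best)

# Equal weights: the two terms and their common parent all tie.
tied = minimizers({"infection": 1, "neoplasm": 1})
# Unequal weights: the heavier term becomes the only minimizer.
unique = minimizers({"infection": 2, "neoplasm": 1})

print(tied)    # ['disease', 'infection', 'neoplasm']
print(unique)  # ['infection']
```

In this example uniqueness depends on the weights rather than on the
shape of the hierarchy, which is one kind of condition that the question
above asks about.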
The current release of AIAS uses the MeSH Descriptors
vocabulary and the WordWeb Pro general-purpose
thesaurus in electronic form to select terms from
the ontology using words from a document. Many
misunderstandings of documents by AIAS that were
automatically caught resulted from
insufficiencies of these sources when processing
MEDLINE abstracts. The next release will integrate
the whole MeSH thesaurus (Descriptors, Qualifiers,
and Supplementary Concept Records) to make AIAS
more educated regarding the subject of chemistry.
In addition, any further thesaurus made available
electronically would be integrated into AIAS.
The algorithm presented in Section 5
has been tested only on the MEDLINE database and
the MeSH ontology. Its implementation does not depend
on a particular indexing thesaurus or ontology, and it
would be interesting to try it on other existing text
corpora with an appropriate ontology, such as WordNet
(http://wordnet.princeton.edu) or others.
REFERENCES
Agrawal, R., Chakrabarti, S., Dom, B.E., Raghavan, P.,
2001. Multilevel taxonomy based on features derived
from training documents classification using fisher
values as discrimination values. United States Patent
6,233,575.
Aronson, A.R., Mork, J.G., Gay, C.W., Humphrey, S.M.,
Rogers, W.J., 2004. The NLM indexing initiative’s
Medical Text Indexer, Stud Health Technol Inform
107 (Pt 1), pp. 268–272.
Calmet, J., Daemi, A., 2004. From entropy to ontology.
Fourth International Symposium "From Agent Theory
to Agent Implementation", R. Trappl, Ed., vol. 2,
pp. 547–551.
Cho, M., Choi, C., Kim, W., Park, J., Kim, P., 2007.
Comparing Ontologies using Entropy. 2007
International Conference on Convergence Information
Technology, Korea, 873-876.
Grobelnik, M., Brank, J., Fortuna, B., Mozetič, I., 2008.
Contextualizing Ontologies with OntoLight: A
Pragmatic Approach. Informatica 32, 79–84.
Guseynov, Y., 2009. XML Processing. No Parsing.
Proceedings WEBIST 2009 - 5th International
Conference on Web Information Systems and
Technologies, INSTICC, Lisbon, Portugal, pp. 81–84.
Klein, D., Manning, C.D., 2003. Accurate Unlexicalized
Parsing. Proceedings of the 41st Meeting of the
Association for Computational Linguistics,
pp. 423–430.
Lee, J.H., Kim, M.H., Lee, Y.J., 1993. Information
retrieval based on conceptual distance in IS-A
hierarchies. Journal of Documentation, 49(2):188-207,
June.
Lindberg, D.A.B., Humphreys, B.L., McCray, A.T., 1993.
The Unified Medical Language System. Methods of
Information in Medicine, 32(4): 281-91.
Manning, C.D., Schütze, H., 1999. Foundations of
Statistical Natural Language Processing. The MIT
Press.
Manning, C. D., Raghavan, P., Schütze, H., 2008.
Introduction to Information Retrieval. Cambridge
University Press.
Medelyan, O., Witten, I.H., 2006a. Thesaurus Based
Automatic Keyphrase Indexing. JCDL’06,
June 11–15, Chapel Hill, North Carolina, USA.
Medelyan, O., Witten, I.H., 2006b. Measuring
Inter-Indexer Consistency Using a Thesaurus. JCDL’06,
June 11–15, Chapel Hill, North Carolina, USA.
MEDLINE®, Medical Literature Analysis and Retrieval
System Online.
http://www.nlm.nih.gov/databases/databases_medline.html.
ENTROPY ON ONTOLOGY AND INDEXING IN INFORMATION RETRIEVAL