Clustering of Medical Terms based on Morpho-syntactic Features

Agnieszka Mykowiecka, Malgorzata Marciniak

Abstract

The paper presents the first results of clustering terms extracted from hospital discharge documents written in Polish. The aim of the task is to prepare data for an ontology reflecting the domain of the documents. First, we describe the characteristics of the language of the texts, which differs significantly from general Polish. Then, we describe the method of term extraction. To find related terms, we use lexical and syntactic information. We define term similarity based on term contexts, coordinated sequences of terms, and words that are parts of terms, e.g. their heads and modifiers. We then perform several experiments with hierarchical clustering of the 300 most frequent terms. Finally, we describe the results and present an evaluation that compares them with manually obtained groups.


Paper Citation


in Harvard Style

Mykowiecka A. and Marciniak M. (2012). Clustering of Medical Terms based on Morpho-syntactic Features. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012) ISBN 978-989-8565-30-3, pages 214-219. DOI: 10.5220/0004137502140219


in Bibtex Style

@conference{keod12,
author={Agnieszka Mykowiecka and Malgorzata Marciniak},
title={Clustering of Medical Terms based on Morpho-syntactic Features},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)},
year={2012},
pages={214-219},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004137502140219},
isbn={978-989-8565-30-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)
TI - Clustering of Medical Terms based on Morpho-syntactic Features
SN - 978-989-8565-30-3
AU - Mykowiecka A.
AU - Marciniak M.
PY - 2012
SP - 214
EP - 219
DO - 10.5220/0004137502140219