Authors:
Agnieszka Mykowiecka and Malgorzata Marciniak
Affiliation:
Polish Academy of Sciences, Poland
Keyword(s):
Terminology Extraction, Term Clustering, Medical Data, Ontology.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Domain Analysis and Modeling; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Pattern Recognition; Symbolic Systems
Abstract:
The paper presents the first results of clustering terms extracted from hospital discharge documents written in Polish. The aim of the task is to prepare data for an ontology reflecting the domain of the documents. To begin, we characterize the language of the texts, which differs significantly from general Polish. Then, we describe the method of term extraction. In the process of finding related terms, we use lexical and syntactic information. We define term similarity based on term contexts, coordinated sequences of terms, and words that are parts of terms, e.g. their heads and modifiers. We then perform several experiments with hierarchical clustering of the 300 most frequent terms. Finally, we describe the results and present an evaluation that compares them with manually obtained groups.
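The general idea of clustering terms by a combined similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard measure, the equal weights, the average-link merging, the 0.3 stopping threshold, and the English toy terms and contexts are all assumptions chosen for illustration. The sketch combines only two of the three information sources named in the abstract (shared contexts and shared words) and leaves out coordinated sequences.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def term_similarity(t1, t2, contexts, w_context=0.5, w_words=0.5):
    """Weighted mix of context overlap and word overlap (weights are assumed)."""
    ctx = jaccard(contexts.get(t1, ()), contexts.get(t2, ()))
    words = jaccard(t1.split(), t2.split())
    return w_context * ctx + w_words * words


def average_link(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(terms, contexts, threshold=0.3):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until no pair reaches the (assumed) similarity threshold."""
    def sim(a, b):
        return term_similarity(a, b, contexts)

    clusters = [[t] for t in terms]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j], sim))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical toy data: terms and the words observed next to them.
terms = ["left kidney", "right kidney", "blood pressure", "arterial pressure"]
contexts = {
    "left kidney": {"ultrasound", "enlarged"},
    "right kidney": {"ultrasound", "enlarged", "cyst"},
    "blood pressure": {"elevated", "measured"},
    "arterial pressure": {"elevated", "measured"},
}
groups = cluster(terms, contexts)
```

On this toy input the two kidney terms end up in one group and the two pressure terms in another, because the cross-group similarity is zero. The sequence of merges forms the hierarchy; a real experiment would record it and cut it at several levels rather than stop at a single threshold.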