Authors:
Ying-Chi Lin 1; Victor Christen 1; Anika Groß 2; Toralf Kirsten 3,4; Silvio Domingos Cardoso 5,6; Cédric Pruski 5; Marcos Da Silveira 5 and Erhard Rahm 1
Affiliations:
1 Department of Computer Science, Leipzig University, Germany
2 Department Computer Science and Languages, Anhalt University of Applied Sciences in Koethen/Anhalt, Germany
3 Faculty Applied Computer Sciences and Biosciences, Mittweida University of Applied Sciences, Germany
4 LIFE Research Centre for Civilization Diseases, Leipzig University, Germany
5 LIST, Luxembourg Institute of Science and Technology, Luxembourg
6 LRI, University of Paris-Sud XI, France
Keyword(s):
Cross-lingual Semantic Annotation, Medical Forms, UMLS, Machine Translation.
Abstract:
Annotating documents or datasets with concepts from biomedical ontologies has become increasingly important. Such ontology-based semantic annotations can improve interoperability and the quality of data integration in health care practice and biomedical research. However, annotating non-English documents is even more challenging, because non-English ontologies have limited coverage and annotators of a quality comparable to those for English are lacking. In this paper, we aim to annotate medical forms written in German. We present a parallel corpus in which all medical forms are available in both German and English. We use three annotators to generate annotations automatically, and these annotations are manually verified to construct an English Silver Standard Corpus (SSC). Based on the parallel corpus and the SSC, we evaluate the quality of different annotation approaches, mainly 1) direct annotation of the German corpus with German ontologies and 2) using machine translators to translate the German corpus and annotating the translated corpus with English ontologies. The results show that using German ontologies alone produces very limited results, whereas translation achieves better annotation quality and retains almost 70% of the annotations.