Annotating Cohesive Statements of Anatomical Knowledge Toward Semi-automated Information Extraction

Kazuo Hara, Ikumi Suzuki, Kousaku Okubo, Isamu Muto

2014

Abstract

Anatomical knowledge written in a textbook is almost impossible to reuse computationally because it is embedded in cohesive discourse. In such discourse, the frequent use of cohesive ties such as reference expressions and coordinated phrases not only hampers automated systems (i.e., natural language parsers) in extracting knowledge from the resulting complex sentences, but also hinders the identification of mentions of anatomical named entities (NEs). We propose to revamp the prose style of anatomical textbooks by transforming cohesive discourse into itemized text, which can be accomplished by annotating reference expressions and coordinating conjunctions. The rest is automatic: in each reference expression, the anaphor is replaced by its antecedent, and for each coordinating conjunction that connects phrases, the sentence is duplicated and the conjoined elements are distributed among the copies. We demonstrate that, compared to the original text, the transformed text is easier for machines to process and hence more convenient for identifying mentions of anatomical NEs and their relations. Since the transformed text is also human readable, we believe our approach provides a promising new model for language resources accessible to both humans and machines, improving the computational reusability of textbooks.
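The two automatic rewriting steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sentence has already been annotated and split into segments, where a hypothetical `Ref` records an anaphor with its antecedent and a hypothetical `Coord` lists the conjuncts of a coordinated phrase. The segment representation, the function names, and the example sentence are all invented for illustration. Sentences with several coordinated phrases are expanded as a Cartesian product, which assumes a distributive reading of every coordination.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical annotation of a reference expression."""
    anaphor: str
    antecedent: str


@dataclass(frozen=True)
class Coord:
    """Hypothetical annotation of a coordinated phrase, given as its conjuncts."""
    conjuncts: tuple[str, ...]


Segment = Union[str, Ref, Coord]


def resolve_references(segments: list[Segment]) -> list[Segment]:
    """Replace every annotated anaphor with the text of its antecedent."""
    return [s.antecedent if isinstance(s, Ref) else s for s in segments]


def distribute_coordination(segments: list[Segment]) -> list[str]:
    """Duplicate the sentence once per conjunct of each coordinated phrase.

    Plain text contributes a single alternative; a Coord contributes one
    alternative per conjunct. The Cartesian product of all alternatives
    assumes every coordination is read distributively.
    """
    choices = [list(s.conjuncts) if isinstance(s, Coord) else [s] for s in segments]
    return ["".join(combo) for combo in product(*choices)]


def itemize(segments: list[Segment]) -> list[str]:
    """Resolve references first, then distribute conjuncts into separate sentences."""
    return distribute_coordination(resolve_references(segments))


if __name__ == "__main__":
    # Illustrative sentence following "The femur is the longest bone in the body."
    sentence = [
        Ref(anaphor="It", antecedent="The femur"),
        " articulates with ",
        Coord(conjuncts=("the hip bone", "the tibia")),
        ".",
    ]
    for line in itemize(sentence):
        print("-", line)
    # - The femur articulates with the hip bone.
    # - The femur articulates with the tibia.
```

Resolving references before distributing conjuncts means every duplicated sentence inherits the already-resolved antecedent, so each itemized line can be read on its own.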



Paper Citation


in Harvard Style

Hara K., Suzuki I., Okubo K. and Muto I. (2014). Annotating Cohesive Statements of Anatomical Knowledge Toward Semi-automated Information Extraction. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 342-347. DOI: 10.5220/0005132303420347


in Bibtex Style

@conference{kdir14,
author={Kazuo Hara and Ikumi Suzuki and Kousaku Okubo and Isamu Muto},
title={Annotating Cohesive Statements of Anatomical Knowledge Toward Semi-automated Information Extraction},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={342-347},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005132303420347},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Annotating Cohesive Statements of Anatomical Knowledge Toward Semi-automated Information Extraction
SN - 978-989-758-048-2
AU - Hara K.
AU - Suzuki I.
AU - Okubo K.
AU - Muto I.
PY - 2014
SP - 342
EP - 347
DO - 10.5220/0005132303420347
ER -