Chemical Named Entity Recognition with Deep Contextualized Neural Embeddings

Zainab Awan, Tim Kahlke, Peter Ralph, Paul Kennedy

Abstract

Chemical named entity recognition (ChemNER) is a preliminary step in chemical information extraction pipelines. ChemNER has been approached using rule-based, dictionary-based, and feature-engineered based machine learning, and more recently also deep learning based methods. Traditional word-embeddings, like word2vec and Glove, are inherently problematic because they ignore the context in which an entity appears. Contextualized embeddings called embedded language models (ELMo) have been recently introduced to represent contextual information of a word in its embedding space. In this work, we quantify the impact of contextualized embeddings for ChemNER by using Bi-LSTM-CRF (bidirectional long short term memory networks - conditional random fields) networks. We benchmarked our approach using four well-known corpora for chemical named entity recognition. Our results show that incorporation of ELMo results in statistically significant improvements in F1 score in all of the tested datasets.

Download


Paper Citation