Authors:
Zainab Awan ¹; Tim Kahlke ²; Peter J. Ralph ² and Paul J. Kennedy ¹
Affiliations:
¹ School of Computer Science, University of Technology Sydney, Sydney, Australia
² Climate Change Cluster, University of Technology Sydney, Sydney, Australia
Keyword(s):
Named Entity Recognition, Deep Learning, Word Representation, BiLSTM.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; BioInformatics & Pattern Discovery; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Symbolic Systems
Abstract:
Chemical named entity recognition (ChemNER) is a preliminary step in chemical information extraction pipelines. ChemNER has been approached with rule-based, dictionary-based, and feature-engineered machine learning methods, and more recently with deep learning. Traditional word embeddings, such as word2vec and GloVe, are inherently problematic because they ignore the context in which an entity appears. Contextualized embeddings, namely Embeddings from Language Models (ELMo), have recently been introduced to represent the contextual information of a word in its embedding space. In this work, we quantify the impact of contextualized embeddings on ChemNER using BiLSTM-CRF (bidirectional long short-term memory with conditional random fields) networks. We benchmark our approach on four well-known corpora for chemical named entity recognition. Our results show that incorporating ELMo yields statistically significant improvements in F1 score on all tested datasets.
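To illustrate the CRF decoding step in a BiLSTM-CRF tagger, the sketch below implements toy Viterbi decoding in pure Python. The tag set, emission scores, and transition weights are made-up placeholders for illustration only (in the model described above, emission scores would come from the BiLSTM over ELMo embeddings); this is not the paper's implementation.

```python
# Toy Viterbi decoder for the CRF layer of a BiLSTM-CRF tagger.
# In the actual model, per-token emission scores come from a BiLSTM run
# over (ELMo) word embeddings; here they are hand-made placeholders.

def viterbi(emissions, transitions, tags):
    """Find the highest-scoring tag sequence.

    emissions:   list of {tag: score} dicts, one per token
    transitions: {(prev_tag, tag): score} for every tag pair
    """
    # First token: emission score only.
    score = {t: emissions[0][t] for t in tags}
    backptrs = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for t in tags:
            # Best previous tag leading into tag t.
            prev = max(tags, key=lambda p: score[p] + transitions[(p, t)])
            new_score[t] = score[prev] + transitions[(prev, t)] + em[t]
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final tag.
    best = max(tags, key=score.get)
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path)), score[best]

# Illustrative BIO tag set for chemical mentions.
tags = ["O", "B-CHEM", "I-CHEM"]
# Penalize the illegal transition O -> I-CHEM; all others are neutral.
transitions = {(p, t): 0.0 for p in tags for t in tags}
transitions[("O", "I-CHEM")] = -10.0
# Hand-made emission scores for a three-token sentence.
emissions = [
    {"O": 1.0, "B-CHEM": 2.0, "I-CHEM": 0.0},
    {"O": 0.0, "B-CHEM": 0.0, "I-CHEM": 2.0},
    {"O": 2.0, "B-CHEM": 0.0, "I-CHEM": 0.0},
]
path, best_score = viterbi(emissions, transitions, tags)
```

The transition scores are what distinguish a CRF output layer from per-token softmax classification: the decoder can reject locally attractive but globally inconsistent tag sequences, such as an I-CHEM tag that does not follow a B-CHEM.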