Authors:
Susmitha Wunnava (1); Xiao Qin (1); Tabassum Kakar (1); Xiangnan Kong (1); Elke A. Rundensteiner (1); Sanjay K. Sahoo (2) and Suranjan De (2)
Affiliations:
(1) Worcester Polytechnic Institute, United States; (2) U.S. Food and Drug Administration, United States
Keyword(s):
Pharmacovigilance, Adverse Drug Reaction, Class Imbalance, Ensemble Learning.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Health Information Systems; Pattern Recognition and Machine Learning; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Recognizing named entities in Adverse Drug Reaction narratives is a fundamental step towards extracting
valuable patient information from unstructured text into a structured and thus actionable format. This in turn unlocks
advanced data analytics towards intelligent pharmacovigilance. Yet existing biomedical named entity
recognition (NER) tools are limited in their ability to identify certain entity types from these domain-specific
narratives, and they differ significantly in accuracy. To address these challenges,
we propose an ensemble approach that integrates a rich variety of named entity recognizers to produce the final
result. First, one critical problem faced by NER in the biomedical context is that the data is highly skewed.
That is, only 1% of words belong to a certain medical entity type, such as the reason for medication usage,
compared to all other non-reason words. We propose a balanced, under-sampled bagging strategy that depends
on the imbalance level to overcome the class imbalance problem. Second, we present an approach based on an ensemble
of heterogeneous recognizers that leverages a novel ensemble combiner. Our experimental results
show that for biomedical text datasets: (i) a balanced learning environment along with an Ensemble of Heterogeneous
Classifiers consistently improves the performance over individual base learners, and (ii) stacking-based
ensemble combiner methods outperform simple Majority Voting by 0.30 F-measure.
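
The balanced, under-sampled bagging idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rule that sets the number of bags from the imbalance ratio, the nearest-mean base learner, the scalar features, and the labels "O" and "REASON" are all assumptions made for illustration. The combiner shown is plain majority voting, not the stacking-based combiner the abstract reports as stronger.

```python
import random
from collections import Counter


def bags_for_imbalance(labels):
    """Assumed heuristic: one bag per unit of majority/minority ratio."""
    counts = Counter(labels)
    return max(1, round(max(counts.values()) / min(counts.values())))


def balanced_bags(labels, n_bags, seed=0):
    """Build index bags: all minority items plus an equal-sized random
    under-sample (without replacement) of the remaining items."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    k = min(len(maj_idx), len(min_idx))
    return [min_idx + rng.sample(maj_idx, k) for _ in range(n_bags)]


def fit_nearest_mean(xs, ys):
    """Toy base learner (illustrative only): predict the label whose
    mean feature value is closest to the input."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def majority_vote(predictions):
    """Return the most frequent prediction."""
    return Counter(predictions).most_common(1)[0][0]


def fit_balanced_bagging(xs, ys, seed=0):
    """Train one base learner per balanced bag; combine by majority vote."""
    bags = balanced_bags(ys, bags_for_imbalance(ys), seed=seed)
    models = [
        fit_nearest_mean([xs[i] for i in bag], [ys[i] for i in bag])
        for bag in bags
    ]
    return lambda x: majority_vote([m(x) for m in models])


# Toy skewed data: 95 "O" tokens and 5 "REASON" tokens, one scalar feature each.
xs = [i / 95 for i in range(95)] + [3.0, 3.1, 3.2, 3.3, 3.4]
ys = ["O"] * 95 + ["REASON"] * 5
predict = fit_balanced_bagging(xs, ys)
```

Each base learner sees the two classes in equal proportion, so no single model is dominated by the majority class, while the set of bags together covers much more of the majority data than any one under-sample would.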