Authors:
Mathieu Roche (1, 2); Elena Arsevska (3, 2); Sarah Valentin (1, 3, 2, 4, 5); Sylvain Falala (3, 6); Julien Rabatel (7) and Renaud Lancelot (3, 2)
Affiliations:
(1) UMR TETIS (Land, Environment, Remote Sensing and Spatial Information), University of Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France
(2) French Agricultural Research for Development (CIRAD), France
(3) UMR ASTRE (Unit for Animals, Health, Territories, Risks and Ecosystems), University of Montpellier, CIRAD, INRAE, Montpellier, France
(4) Department of Biology, University of Sherbrooke, Sherbrooke, Canada
(5) Quebec Centre for Biodiversity Science, McGill University, Montreal, Canada
(6) National Research Institute for Agriculture, Food and the Environment (INRAE), France
(7) Freelance Data Scientist, Montpellier, France
Keyword(s):
Text Mining, Information Retrieval, Named Entity Recognition, Event-based Surveillance, Epidemic Intelligence.
Abstract:
The ability to rapidly detect outbreaks of emerging infectious diseases is a priority for global health agencies. In this context, event-based surveillance (EBS) systems gather outbreak-related information from heterogeneous data sources, including online news articles. EBS systems therefore increasingly rely on text-mining methods to reduce the manual curation required for freely available text. This paper documents the use of datasets obtained through an EBS system, PADI-Web (Platform for Automated extraction of Disease Information from the web), dedicated to digital outbreak detection in animal health. It describes the datasets used to improve three important tasks related to PADI-Web: news classification, information extraction, and dissemination.