Authors:
Eduardo Barçante; Milene Jezuz; Felipe Duval; Ernesto Caffarena; Oswaldo G. Cruz and Fabricio Silva
Affiliation:
Fundação Oswaldo Cruz, Brazil
Keyword(s):
Text Mining, Drug Repositioning, Clusters, PubMed Abstracts, Neglected Diseases, Ontology, Biomedical Technology, Drug Industry, Information Extraction, Semi-Structured.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Process Mining; Symbolic Systems
Abstract:
Computational biology currently draws on expertise from many technological areas, with a focus on information and computing and, in particular, on the construction and use of existing Internet databases such as MEDLINE, PubMed and PDB. In recent years, these databases have provided an environment to access, integrate and produce new knowledge by storing ever-increasing volumes of genetic or protein data. Transforming and managing these data in ways other than those originally intended can be a challenge for research in biology. The problems arise from the lack of textual structure or appropriate markup tags. The main goal of this work is to explore the PubMed database from the National Library of Medicine, the main source of information about health sciences. Using this database of digital textual documents, we aim to develop a method capable of identifying protein terms that will serve as a substrate for laboratory practices in drug repositioning. To this end, we use text mining to extract terms related to protein names in the field of neglected diseases.