Identifying Drug Repositioning Targets using Text Mining
Eduardo Barçante, Milene Jezuz, Felipe Duval, Ernesto Caffarena, Oswaldo G. Cruz, Fabricio Silva
2014
Abstract
Computational biology draws on expertise from many technological areas, chiefly information science and computing, and in particular on the construction and use of Internet databases such as MEDLINE, PubMed and PDB. In recent years, these databases have provided an environment for accessing, integrating and producing new knowledge by storing ever-increasing volumes of genetic or protein data. Transforming and managing these data in ways other than those originally intended can be a challenge for research in biology. The difficulties arise from the lack of textual structure or appropriate markup tags. The main goal of this work is to explore PubMed, the National Library of Medicine database that is the main source of information about the health sciences. Using this database of digital text documents, we aim to develop a method capable of identifying protein terms that can support laboratory work on drug repositioning. To this end, we use text mining to extract terms related to protein names in the field of neglected diseases.
References
- Belloze, K.T. 2013. Priorização de alvos para fármacos no combate a Doenças Tropicais Negligenciadas causadas por protozoários. FIOCRUZ/IOC. (In Portuguese) [PhD thesis]
- Berners-Lee T., 1990. Information Management: A proposal. available from: http://www.w3.org/History/1989/proposal.html [Accessed June 2014]
- BRASIL. 2010. Ministry of Health. Neglected diseases: the strategies of the Brazilian Ministry of Health. In Journal of the Public Health. 2010;44(1):200-202.
- Campos, M.L.A., Campos, M.L.M. 2009. METHODOLOGICAL ASPECTS ON ONTOLOGY REUSE: a study on the domain of trypanosomatids. RECIIS - In R. Eletr. de Com. Inf. Inov. Saúde. Rio de Janeiro, v.3, n.1, p.64-75, mar., 2009. online: www.reciis.cict.fiocruz.br DOI: 10.3395/reciis.v3i1.243en
- DECS. 2014. Health Sciences Descriptors In Virtual Health Library (VHL). available from http://decs.bvsalud.org. [Accessed 08/08/2014]
- DNDi. 2010. Drugs for neglected diseases initiative available from: http://www.dndi.org.br/pt/doencasnegligenciadas. [Accessed 08/08/2014]
- Feinerer, I., 2008. A text mining framework in R and its applications. [PhD thesis]. Vienna. Department of Statistics and Mathematics, Vienna University of Economics and Business Administration.
- Feinerer, I., 2014. Text Mining Package available from: http://cran.r-project.org
- Feldman, R., Aumann, Y., Zilberstein, A., Ben-Yehuda, Y., 2002. Mining biomedical literature using information extraction. In Current Drug Discovery, Volume 2, Issue 10, pages 19-23, October 2002.
- Gadelha, C.A.G., Quental C., Fialho B.C., 2003. Health and innovation: a systemic approach in health industries. In Reports in Public Health.
- Haupt, V.J.; Schroeder, M. 2011. Old Friends in New Guise: Repositioning of Known Drugs with Structural Bioinformatics. In Brief. Bioinform. 12, 312-326. doi:10.1093/bib/bbr011
- INCT. 2014. National Institute for Science and Technology on Innovation on Neglected Diseases (INCT/IDN). Neglected Diseases. online: http://www.cdts.fiocruz.br [Accessed 08/08/2014]
- Jardim, R., 2013. Estudo de reposicionamento de fármacos para Doenças Negligenciadas causadas por protozoários através da integração de bases de dados biológicas usando web semântica. FIOCRUZ/IOC. (In Portuguese) [PhD thesis] online: http://arca.icict.fiocruz.br/handle/icict/7027
- Jezuz, M.P.G., 2013. Scientific text mining aiming at the identification of bioactive compounds with therapeutical potential against Chagas disease, dengue and malaria. FIOCRUZ/IOC [PhD thesis]
- Lancaster, F. W., 1986. Vocabulary control for information retrieval. 2nd ed. Arlington, VA: Information Resources Press.
- Markus L.M., 2001. Toward a theory of knowledge reuse: Types of knowledge reuse situations and factors in reuse success. In Journal of management information systems, 2001;18(1):57-93.
- NLM. 2014. MedLine® PubMed® XML Element Descriptions and their Attributes available from: http://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html [Accessed June 2014]
- OBO. 2014. The Open Biological and Biomedical Ontologies. online: www.obofoundry.org. [Accessed 08/08/2014]
- PDB. 2014. Protein Data Bank. online: www.rcsb.org
- PIR. 2014. Protein Information Resource. online: pir.georgetown.edu/pro/pro.shtml
- PubMed, 2014. National Center for Biotechnology Information online: http://www.ncbi.nlm.nih.gov/pubmed. [Accessed 08/08/2014]
- R Foundation, 2002. The R project for statistical computing. online: http://www.r-project.org/ [Accessed 08/08/2014]
- Salton, G., Wang, A., Yang, C.S., 1975. A vector space model for information retrieval. Communications of the ACM. 1975;18(11):613-620.
- Schmitt, T., Messina, D.N., Schreiber, F., Sonnhammer, E.L., 2011. Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information. In Brief. Bioinform. 2011;12:485-488.
- Swanson, D.R., 1986. Fish-oil, Raynaud's syndrome and undiscovered public knowledge. In Perspectives in biology and medicine, 1986;30(1):7-18.
- Swanson, D.R., 1987. Two medical literatures that are logically but not bibliographically connected. In Journal of the American Society for Information Science. 1987;38(4):228-233.
- Swanson, D.R., Smalheiser, N.R., Torvik, V.I., 2006. Ranking indirect connections in literature based discovery. The role of Medical Subject Headings. In Journal of the American Society for Information Science and Technology, 2006;57(11):1427-39.
- Torres, L.B. 2014. UFRJ. available from http://www.portaldosfarmacos.ccs.ufrj.br/atualidades_profwermuth.html [Accessed 08/08/2014]
- UniProt Consortium, 2013. Update on activities at the Universal Protein Resource (UniProt) in 2013. In Nucleic Acids Res. 2013;41:D43-D47.
- United Nations, 2000. Millennium development goals and beyond 2015. online: http://www.un.org/millenniumgoals/. [Accessed 08/08/2014]
- WHO, 2013. Drugs for Neglected Diseases initiative online: http://www.who.int
- WHO, 2010. World Health Organization. First WHO report on neglected tropical diseases: working to overcome the global impact of neglected tropical diseases. Geneva; 2010. online http://whqlibdoc.who.int/publications [Accessed 08/08/2014]
- Witten I.H, Don K.J, Dewsnip, M., Tablan V., 2004. Text mining in a digital library. In Int J Digit Libr Journal. 2004;4:56-9.
Paper Citation
in Harvard Style
Barçante E., Jezuz M., Duval F., Caffarena E., Cruz O. G. and Silva F. (2014). Identifying Drug Repositioning Targets using Text Mining. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 348-353. DOI: 10.5220/0005134903480353
in Bibtex Style
@conference{kdir14,
author={Eduardo Barçante and Milene Jezuz and Felipe Duval and Ernesto Caffarena and Oswaldo G. Cruz and Fabricio Silva},
title={Identifying Drug Repositioning Targets using Text Mining},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={348-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005134903480353},
isbn={978-989-758-048-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - Identifying Drug Repositioning Targets using Text Mining
SN - 978-989-758-048-2
AU - Barçante E.
AU - Jezuz M.
AU - Duval F.
AU - Caffarena E.
AU - Cruz O. G.
AU - Silva F.
PY - 2014
SP - 348
EP - 353
DO - 10.5220/0005134903480353