A NEW LATENT SEMANTIC ANALYSIS BASED METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM BIOMEDICAL LITERATURE AND BIOLOGICAL PATHWAYS DATABASES
F. Abate, A. Acquaviva, E. Ficarra, E. Macii
2011
Abstract
Nowadays, a considerable number of genetic and biomedical studies are published on the Web and freely available. While this availability opens the way to new scenarios of cooperative research, it also makes knowledge retrieval and extraction extremely time-consuming. In this context, new tools and algorithms are needed to automatically support scientists in reaching a reliable interpretation of the complex interactions among biological entities. In this paper we present a new methodology for quantifying the degree of biological correlation among biomedical terms found in the literature. The proposed method overcomes the limitation of current tools, which rely on public literature information only, by exploiting the trusted information provided by biological pathway databases. We demonstrate how to integrate trusted pathway information into a semantic correlation extraction chain based on the UMLS Metathesaurus, with PubMed as the literature database. The results obtained underline the importance of automatically quantifying the degree of correlation among biomedical terms in order to support scientists' research activity.
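The abstract does not spell out the extraction chain, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a term-by-document count matrix, scores term correlation as cosine similarity in a rank-k latent space obtained by truncated SVD (plain latent semantic analysis), and injects pathway knowledge by appending one weighted pseudo-document per pathway. The term names, counts, pathway membership, the `weight` parameter, and both function names are hypothetical.

```python
import numpy as np

def lsa_term_similarity(term_doc, k):
    """Cosine similarity between terms in a rank-k latent space."""
    # Truncated SVD: term_doc is approximated by U_k diag(s_k) Vt_k.
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    term_vecs = u[:, :k] * s[:k]              # one latent vector per term
    norms = np.linalg.norm(term_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # guard against all-zero rows
    unit = term_vecs / norms
    return unit @ unit.T

def add_pathway_columns(term_doc, terms, pathways, weight=2.0):
    """Append one pseudo-document per pathway (an assumed integration scheme)."""
    index = {t: i for i, t in enumerate(terms)}
    extra = np.zeros((len(terms), len(pathways)))
    for j, members in enumerate(pathways):
        for t in members:
            extra[index[t], j] = weight       # trusted co-membership signal
    return np.hstack([term_doc, extra])

# Invented toy data: rows are terms, columns are abstracts.
terms = ["gene_a", "gene_b", "gene_c", "metab_x", "metab_y"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Hypothetical pathway stating that gene_c and metab_x act together.
pathways = [["gene_c", "metab_x"]]

before = lsa_term_similarity(counts, k=2)
after = lsa_term_similarity(add_pathway_columns(counts, terms, pathways), k=2)

print("gene_c vs metab_x, literature only:", round(float(before[2, 3]), 3))
print("gene_c vs metab_x, with pathway:   ", round(float(after[2, 3]), 3))
```

In this toy setup the two terms never co-occur in any abstract, so the literature-only score is essentially zero; the pathway pseudo-document is what links them. How strongly pathway evidence should be weighted against literature counts is a design choice that this sketch leaves open.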
Paper Citation
in Harvard Style
Abate F., Acquaviva A., Ficarra E. and Macii E. (2011). A NEW LATENT SEMANTIC ANALYSIS BASED METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM BIOMEDICAL LITERATURE AND BIOLOGICAL PATHWAYS DATABASES. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011) ISBN 978-989-8425-36-2, pages 66-74. DOI: 10.5220/0003171400660074
in Bibtex Style
@conference{bioinformatics11,
author={F. Abate and A. Acquaviva and E. Ficarra and E. Macii},
title={A NEW LATENT SEMANTIC ANALYSIS BASED METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM BIOMEDICAL LITERATURE AND BIOLOGICAL PATHWAYS DATABASES},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)},
year={2011},
pages={66-74},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003171400660074},
isbn={978-989-8425-36-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2011)
TI - A NEW LATENT SEMANTIC ANALYSIS BASED METHODOLOGY FOR KNOWLEDGE EXTRACTION FROM BIOMEDICAL LITERATURE AND BIOLOGICAL PATHWAYS DATABASES
SN - 978-989-8425-36-2
AU - Abate F.
AU - Acquaviva A.
AU - Ficarra E.
AU - Macii E.
PY - 2011
SP - 66
EP - 74
DO - 10.5220/0003171400660074