Text Mining Technologies for Database Curation
Fabio Rinaldi
2014
Abstract
Text mining technologies, coupled with advanced user interfaces, have great potential in the life sciences, for example in supporting the process of database curation. We present a system that has achieved competitive results in several community-organized evaluations of text mining technologies, and we discuss how such technologies can be integrated into a curation workflow.
Paper Citation
in Harvard Style
Rinaldi F. (2014). Text Mining Technologies for Database Curation. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014) ISBN 978-989-758-048-2, pages 544-548. DOI: 10.5220/0005174905440548
in Bibtex Style
@conference{sstm14,
author={Fabio Rinaldi},
title={Text Mining Technologies for Database Curation},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)},
year={2014},
pages={544-548},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005174905440548},
isbn={978-989-758-048-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)
TI - Text Mining Technologies for Database Curation
SN - 978-989-758-048-2
AU - Rinaldi F.
PY - 2014
SP - 544
EP - 548
DO - 10.5220/0005174905440548