BigTexts - A Framework for the Analysis of Electronic Health Record Narrative Texts based on Big Data Technologies
Wilson Alzate Calderón, Alexandra Pomares Quimbaya, Rafael A. Gonzalez, Oscar Mauricio Muñoz
2015
Abstract
In the healthcare domain, the analysis of Electronic Medical Records (EMR) can be classified as a Big Data problem because it exhibits the three fundamental characteristics: Volume, Variety and Velocity. A major drawback is that most of the information contained in medical records is narrative text, for which natural language processing and text mining are key technologies to enhance the utility of medical records for research, analysis and decision support. Among natural language processing tasks, the most critical in terms of time consumption are the pre-processing tasks that give some structure to the original unstructured text. A review of existing research on the use of Big Data techniques in the healthcare domain reveals few practical contributions, especially for EMR analysis. To fill this gap, this paper presents BigTexts, a framework that provides pre-built functionalities for executing pre-processing tasks over the narrative texts contained in EMR using Big Data techniques. BigTexts delivers faster results in EMR narrative text analysis, thereby improving decision making in healthcare.
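The abstract identifies pre-processing as the most time-consuming stage and proposes running it with Big Data techniques. The sketch below illustrates the general map/reduce idea behind that approach on a single machine: each narrative note is normalized independently (map), and the partial results are merged (reduce). It is not BigTexts' implementation. The function names, the tokenizer regular expression and the tiny stopword list are illustrative assumptions.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable

# Hypothetical stopword list, for illustration only; a real clinical pipeline
# would use a curated list for the language of the records.
STOPWORDS = {"the", "of", "and", "a", "with", "to", "in", "no"}


def map_note(note: str) -> Counter:
    """Map step: lowercase one note, tokenize it, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge the partial term counts produced by two mappers."""
    return a + b


def preprocess(notes: Iterable[str]) -> Counter:
    """Apply the map step to every note, then fold the results together."""
    return reduce(reduce_counts, map(map_note, notes), Counter())


if __name__ == "__main__":
    notes = [
        "Patient with chest pain and shortness of breath.",
        "No chest pain today; mild shortness of breath.",
    ]
    print(preprocess(notes).most_common(3))
```

Because `map_note` shares no state between notes, the map step could be handed to a process pool or a cluster framework unchanged; only the merge step needs to gather results.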
Paper Citation
in Harvard Style
Alzate Calderón W., Pomares Quimbaya A., Gonzalez R. A. and Muñoz O. (2015). BigTexts - A Framework for the Analysis of Electronic Health Record Narrative Texts based on Big Data Technologies. In Proceedings of the 1st International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AgeingWell, ISBN 978-989-758-102-1, pages 129-136. DOI: 10.5220/0005434101290136
in BibTeX Style
@conference{ict4ageingwell15,
author={Wilson Alzate Calderón and Alexandra Pomares Quimbaya and Rafael A. Gonzalez and Oscar Mauricio Muñoz},
title={BigTexts - A Framework for the Analysis of Electronic Health Record Narrative Texts based on Big Data Technologies},
booktitle={Proceedings of the 1st International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AgeingWell},
year={2015},
pages={129-136},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005434101290136},
isbn={978-989-758-102-1},
}
in EndNote Style
TY  - CONF 
JO  - Proceedings of the 1st International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AgeingWell
TI  - BigTexts - A Framework for the Analysis of Electronic Health Record Narrative Texts based on Big Data Technologies
SN  - 978-989-758-102-1
AU  - Alzate Calderón W.
AU  - Pomares Quimbaya A.
AU  - Gonzalez R. A.
AU  - Muñoz O.
PY  - 2015
SP  - 129
EP  - 136
DO  - 10.5220/0005434101290136
ER  - 