proposed, and it will be investigated further through academic research. This architecture is intended to extract medical data from scanned documents, standardize it using FHIR or OpenEHR, and then store it in EHRs.
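The extract, standardize and store flow can be illustrated with a short Python sketch. It is not the architecture proposed in this paper. It assumes OCR has already produced plain text, and it uses a regular expression as a simple stand-in for the NLP/ML extraction step, covering only three vital signs. The label-to-LOINC table, the function names (`extract_fields`, `to_observation`, `to_transaction_bundle`) and the sample text are hypothetical. The output follows the general shape of a FHIR R4 Observation resource wrapped in a transaction Bundle, which is the form a FHIR-capable EHR server can accept.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical mapping from a field label found in OCR text to a
# (LOINC code, display name, UCUM unit) triple. Only three vitals are covered.
LOINC_MAP = {
    "weight": ("29463-7", "Body weight", "kg"),
    "heart rate": ("8867-4", "Heart rate", "/min"),
    "temperature": ("8310-5", "Body temperature", "Cel"),
}

# Simple regex standing in for the NLP/NER step: matches "<label>: <number>".
FIELD_RE = re.compile(
    r"(weight|heart rate|temperature)\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def extract_fields(ocr_text: str) -> dict[str, float]:
    """Return {label: value} for every recognised field in the OCR output."""
    fields = {}
    for label, value in FIELD_RE.findall(ocr_text):
        fields[label.lower()] = float(value)
    return fields


def to_observation(label: str, value: float, patient_id: str, when: date) -> dict:
    """Build a minimal FHIR R4 Observation resource as a plain dict."""
    code, display, unit = LOINC_MAP[label]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": code, "display": display}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": when.isoformat(),
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": unit,
        },
    }


def to_transaction_bundle(observations: list[dict]) -> dict:
    """Wrap Observations in a FHIR transaction Bundle, one POST entry per resource."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"resource": obs, "request": {"method": "POST", "url": "Observation"}}
            for obs in observations
        ],
    }
```

For example, OCR text such as `"Weight: 72.5 kg\nHeart rate: 88 bpm"` yields `{"weight": 72.5, "heart rate": 88.0}`. Each entry can then be passed to `to_observation` and the results bundled for submission. A production pipeline would replace the regex with a trained NER model and validate resources against the FHIR profiles in use.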
In recent years, there has been growing interest in using the large quantity of data contained in scanned medical documents. Using these documents to predict patient outcomes could revolutionize healthcare delivery, with the potential to improve patient outcomes, save doctors' time, and reduce costs for healthcare systems.
Extracting structured data from scanned documents with OCR, NLP, and ML models, standardizing the extracted information in a common format such as FHIR or OpenEHR, and then applying ML and BERT-based models to generate predictions together form a relatively new field. The approach has attracted attention as healthcare organizations look for ways to use the large volume of data in scanned medical documents to improve patient outcomes and reduce costs.
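One reason standardization helps the prediction step is that coded resources can be turned into model inputs mechanically. The following sketch assumes observations arrive as FHIR R4 Observation dicts with LOINC codings and flattens them into a fixed-order feature row. The feature list and the function name `observations_to_features` are hypothetical. The choice of downstream model (for example logistic regression, a random forest, or a BERT-based model) is left open.

```python
from __future__ import annotations

# Hypothetical fixed feature order, keyed by LOINC code:
# body weight, heart rate, body temperature.
FEATURE_CODES = ("29463-7", "8867-4", "8310-5")


def observations_to_features(
    observations: list[dict], codes: tuple[str, ...] = FEATURE_CODES
) -> list[float | None]:
    """Return one value per LOINC code in `codes`.

    When a code appears several times, the observation with the latest
    effectiveDateTime wins; codes with no observation yield None.
    """
    latest: dict[str, tuple[str, float]] = {}
    for obs in observations:
        # Skip anything that is not an Observation carrying a numeric quantity.
        if obs.get("resourceType") != "Observation" or "valueQuantity" not in obs:
            continue
        # ISO 8601 strings of the same precision compare correctly as text.
        when = obs.get("effectiveDateTime", "")
        value = obs["valueQuantity"]["value"]
        for coding in obs["code"]["coding"]:
            if coding.get("system") == "http://loinc.org":
                code = coding["code"]
                if code not in latest or when > latest[code][0]:
                    latest[code] = (when, value)
    return [latest[c][1] if c in latest else None for c in codes]
```

Rows produced this way share a column layout across patients, so missing values (`None`) can be imputed or dropped before a model is trained.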
However, collecting, standardizing, and analyzing this data can be challenging, time-consuming, and expensive. Nevertheless, the benefits of using scanned documents in this way make it a worthwhile endeavor for healthcare researchers.
A Machine Learning Approach to Digitize Medical History and Archive in a Standard Format