6 CONCLUSIONS
This paper introduces the cluster-focused combination
(CFC) algorithm, designed to overcome the challenges of
annotating Electronic Health Record (EHR) text using
interface terminology. This algorithm addresses
critical issues in text annotation by utilizing a dynamic
programming approach, effectively balancing the need
for high annotation coverage and breadth while
mitigating common pitfalls of previous methods. Our
extensive evaluation on benchmark datasets, such as
MIMIC-III, shows improved annotation coverage,
capturing 5,756 concepts that the traditional
BioPortal Annotator missed.
Additionally, the cluster-focused combination
algorithm reduces execution time by a factor of
about 8,000 on average, enhancing its scalability
for large datasets.
These findings make the optimized CFC a highly
effective tool for real-world text annotation tasks that
rely on interface terminology. By providing a more
efficient and comprehensive solution, this work not
only advances EHR text annotation
but also contributes to the broader field of Natural
Language Processing. This is particularly significant
for the development of Large Language Models, which
depend on vast, well-annotated datasets. Our algorithm
paves the way for future innovations in dataset
preparation, promising to streamline and accelerate the
annotation process for large-scale NLP applications.
ACKNOWLEDGMENTS
H. Liu acknowledges startup funds from Montclair
State University.