Authors:
Monah Bou Hatoum 1; Jean Claude Charr 1; Alia Ghaddar 2,3; Christophe Guyeux 1 and David Laiymani 1
Affiliations:
1 FEMTO-ST Institute, UMR 6174 CNRS, University of Franche-Comte, 90000 Belfort, France
2 Department of Computer Science, the International University of Beirut, Beirut P.O. Box 146404, Lebanon
3 Department of Computer Science, Lebanese International University, Beirut, Lebanon
Keyword(s):
Deep Learning, Relevancy Comparison, Hierarchical Relationships, Semantic Similarity, Custom Loss Function, ICD-10 Coding, Cosine Similarity, Medical Coding Automation, Machine Learning in Healthcare.
Abstract:
Background: Accurate ICD-10 coding is vital for healthcare operations, yet manual coding is inefficient and error-prone. Machine learning offers automation potential but struggles with the complex relationships between codes and clinical text. Objective: We propose a semantics-aware approach that uses custom loss functions to improve accuracy and clinical relevance in multi-label ICD-10 coding, leveraging cosine similarity to measure the semantic relatedness between predicted and actual codes. Methods: Four custom loss functions were designed to capture hierarchical and semantic relationships: True Label Cardinality Loss (TLCL), Predicted Label Cardinality Loss (PLCL), Balanced Harmonic Mean Loss (BHML), and Weighted Harmonic Mean Loss (WHML). They were validated on a dataset of 9.57 million clinical notes from 24 medical specialties, with binary cross-entropy (BCE) loss as the baseline. Results: Our approach achieved a test micro-F1 score of 88.54%, surpassing the 74.64% baseline, with faster convergence and improved performance across specialties. Conclusion: Incorporating semantic similarity into the loss functions enhances ICD-10 code prediction, addressing clinical nuances and advancing machine learning in medical coding.
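The two quantities the abstract relies on, cosine similarity between codes and the micro-F1 score, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define TLCL, PLCL, BHML, or WHML, so none of them is reproduced here. The helper names `cosine_similarity` and `micro_f1` and the three-dimensional code embeddings are assumptions invented for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_u * norm_v)


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label predictions (one set of codes per note)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but wrong
        fn += len(truth - pred)   # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical embeddings, made up for this example (not learned from data):
# two type 2 diabetes codes placed close together, one fracture code far away.
embeddings = {
    "E11.9": [0.9, 0.1, 0.0],
    "E11.65": [0.8, 0.2, 0.1],
    "S52.501A": [0.0, 0.1, 0.9],
}

near_miss = cosine_similarity(embeddings["E11.9"], embeddings["E11.65"])
far_miss = cosine_similarity(embeddings["E11.9"], embeddings["S52.501A"])

# Exact-match scoring treats the near miss on the first note as a plain error.
score = micro_f1(
    [{"E11.9"}, {"S52.501A"}],
    [{"E11.65"}, {"S52.501A"}],
)
```

Under these made-up vectors, `near_miss` is much larger than `far_miss`, while `micro_f1` counts the related code on the first note as simply wrong. That gap between exact-match scoring and semantic relatedness is the kind of signal a similarity-aware loss could use.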