One of the main advantages of the EMTE approach is
its flexibility and maintainability. The dictionaries
can be updated at any time, with no need to retrain
the models on new medical terms. In addition, EMTE
can be used as a document quality enhancer, as it can
unify the writing styles of negations and replace
abbreviations with their full terms.
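
To illustrate why dictionary updates require no retraining, the following is a minimal sketch of dictionary-driven cleansing. It is not the EMTE implementation: the dictionary contents, the function names (`build_pattern`, `apply_dictionary`, `cleanse`), and the choice to map every negation variant to "no" are assumptions made for illustration only.

```python
import re

# Hypothetical dictionaries for illustration; the real EMTE dictionaries
# are not reproduced here.
ABBREVIATIONS = {"sob": "shortness of breath", "htn": "hypertension"}
NEGATION_VARIANTS = {"denies": "no", "without": "no", "negative for": "no"}


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all keys."""
    # Longest keys first so multi-word entries win over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def apply_dictionary(text, dictionary):
    """Replace every dictionary key found in the text with its value."""
    if not dictionary:
        return text
    pattern = build_pattern(dictionary)
    return pattern.sub(lambda m: dictionary[m.group(0).lower()], text)


def cleanse(text, abbreviations=ABBREVIATIONS, negations=NEGATION_VARIANTS):
    """Unify negation wording first, then expand abbreviations."""
    text = apply_dictionary(text, negations)
    return apply_dictionary(text, abbreviations)
```

In this sketch, supporting a new term only means adding a dictionary entry, for example `cleanse("c/o CP", abbreviations={**ABBREVIATIONS, "cp": "chest pain"})`; nothing downstream has to be retrained because the cleansing step contains no learned parameters.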
This paper presented a cleansing approach that
improves the quality of medical term extraction from
unstructured clinical data using pattern matching
rules based on dictionaries. The solution was
conceived with flexibility and maintainability in mind for
industrial use. The experiments showed that our
approach helps solve the ICD-10 prediction problem
by improving the quality of the data fed to the
DNNs. As a result, the performance of the trained
models improved according to various metrics.
The proposed approach also reduced the resources
required to train the models and decreased the training
time by accelerating the convergence of the models.
In future work, to further improve the quality of
the medical data, we aim to extend this approach by
tackling several challenges: handling medical term
synonyms, improving abbreviation detection by adding
more features (e.g., body site, gender, and age), and
handling medical investigation results (laboratory
and radiology) in CCs.
All computations were performed on the
Mésocentre of Franche-Comté, France, and the
medical data were acquired from the Specialized Medical
Center Hospital in Riyadh, KSA.
