in the evaluation and the information contained in the
knowledge base. While the knowledge base is
composed of features that describe the CKD medical
condition, these features were not always present in
the EHR of a patient suffering from CKD. For this
reason, three of the instances belonging to the CKD
class were empty and were therefore misclassified:
the information required by the knowledge base was
missing from the record. This finding deepens our
understanding regarding the way the investigations
are conducted in the medical domain. It is well known
that the strategies for determining the cause of a
patient’s suffering differ across institutions and,
consequently, across medical doctors. These findings
support the premise that the collection and linkage of
as many sources of data as possible, even if their
structure and purpose may seem dissimilar, lead to
more accurate solutions.
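The effect of missing features can be illustrated with a minimal sketch. It assumes a binary feature vector built by checking which knowledge base features occur among the concepts extracted from an EHR. The feature names, the EHR contents and the helper functions are hypothetical and serve only as an illustration; they are not the feature set or the implementation used in the evaluation.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base features for CKD (illustrative only).
KB_FEATURES: List[str] = ["creatinine", "proteinuria", "hypertension", "anemia"]


def to_feature_vector(concepts: Set[str], features: List[str]) -> List[int]:
    """Binary vector: 1 if the knowledge base feature occurs in the EHR concepts."""
    return [1 if feature in concepts else 0 for feature in features]


def is_empty(vector: List[int]) -> bool:
    """An instance is empty when no knowledge base feature is present."""
    return not any(vector)


# Hypothetical concepts extracted from two EHRs of CKD patients.
ehr_concepts: Dict[str, Set[str]] = {
    "ehr_a": {"creatinine", "hypertension", "cough"},
    "ehr_b": {"fatigue", "nausea"},  # no knowledge base feature was recorded
}

vectors = {name: to_feature_vector(c, KB_FEATURES) for name, c in ehr_concepts.items()}
empty_instances = [name for name, v in vectors.items() if is_empty(v)]
```

In this sketch the second record yields an all-zero vector, so a classifier has no evidence on which to assign it to the CKD class, regardless of the patient's actual condition.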
6 CONCLUSIONS
The present research explores strategies for
transforming unstructured data into a structured
format via a knowledge transformation flow. The
output of the transformation enables the
classification of the input unstructured text,
represented by EHRs, into a specific diagnosis. While
the current solution covers a single diagnosis, we
propose extending the training dataset with further
diagnoses, thus extending the feature vector with
features that are present when assessing the presence
of specific diseases. In its current state, the medical
Assistive Decision Support System provides complete
solutions for automatically structuring medical
documents and extracting relevant medical concepts
via the PreNex and MedCIN strategies, while the
prediction module is supported by its validation
on an actual use case.
The proposed strategy represents the final step in
our medical Assistive Decision Support
System, introduced in our previous research
(Bărbănțan and Potolea, 2015). Starting from raw
medical data, the proposed solution infers the
appropriate suggestion for each specific task (further
investigations, diagnosis or medication). The solution
enables the transformation of the medical documents
– which are usually stored in unstructured format –
into a structured format by exploring and applying a
taxonomy-based mapping technique. This technique
involves the extraction of the relevant terms from the
text, assisted by a domain-specific terminology and a
context-based classification. A number of
preprocessing steps normalize both the input text
(unstructured data) and the terminology sources
(structured data), and they proved to play a
significant role. The filtering step, which
discriminates between medical and non-medical
concepts, proved to be efficient. The terminology
sources (WordNet and SNOMED-CT) were selected for
their ability to cover the biomedical domain and to
provide accurate information.
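The steps summarized above (normalization, terminology-assisted term extraction and filtering of non-medical concepts) can be sketched as follows. This is a simplified illustration under stated assumptions, not the PreNex or MedCIN implementation: the miniature terminology, its medical/non-medical flags and the function names are hypothetical, and simple phrase matching stands in for the WordNet and SNOMED-CT lookups and the context-based classification.

```python
import re
from typing import Dict, List

# Hypothetical miniature terminology: term -> True if it is a medical concept.
TERMINOLOGY: Dict[str, bool] = {
    "chronic kidney disease": True,
    "hypertension": True,
    "patient": False,
    "hospital": False,
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract_terms(text: str, terminology: Dict[str, bool]) -> List[str]:
    """Return the terminology entries found in the text, longest first."""
    padded = f" {normalize(text)} "
    found = [term for term in terminology if f" {normalize(term)} " in padded]
    return sorted(found, key=len, reverse=True)


def medical_only(terms: List[str], terminology: Dict[str, bool]) -> List[str]:
    """Filtering step: keep only the terms flagged as medical concepts."""
    return [term for term in terms if terminology[term]]


note = "Patient admitted to hospital; history of Hypertension and chronic kidney disease."
terms = extract_terms(note, TERMINOLOGY)
medical = medical_only(terms, TERMINOLOGY)
```

Both the input text and the terminology entries pass through the same normalization, which reflects the observation that normalizing both sides matters for matching.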
ACKNOWLEDGEMENTS
This work has been partially supported by Brained
City - Information Technology based Innovative
Development of Cluj-Napoca Fully Integrated Urban
Ecosystem.
REFERENCES
Alag, S., 2009. Collective Intelligence in Action.
Greenwich: Manning Publications Co.
Albin, A. et al., 2014. Enabling Online Studies of
Conceptual Relationships Between Medical Terms:
Developing an Efficient Web Platform. JMIR Med
Inform, 2(2:e23).
Bărbănțan, I., Lemnaru, C., Potolea, R., 2014. Disease
Identification in Electronic Health Records. An
ontology based approach. Rome, Italy, SCITEPRESS,
pp. 261-268.
Bărbănțan, I., Lemnaru, C., Potolea, R., 2015. Concepts
Identification in Medical Documents. York, University
of Sheffield.
Bărbănțan, I., Porumb, M., Lemnaru, C., Potolea, R., 2016.
Feature Engineered Relation Extraction – Medical
Documents Setting. International Journal of Web
Information Systems (IJWIS), 12(3), pp. 336-358.
Bărbănțan, I., Potolea, R., 2014. Exploiting Word Meaning
for Negation Identification in Electronic Health
Records. Cluj-Napoca, IEEE Computer Society, pp.
283-289.
Bărbănțan, I., Potolea, R., 2015. Knowledge Extraction and
Prediction from Medical Documents. Ohrid, ICT ACT.
Boaz, D., Shahar, Y., 2003. Idan: A distributed temporal-
abstraction mediator for medical databases. Protaras,
Cyprus, Proceedings of the 9th Conference on Artificial
Intelligence in Medicine—Europe (AIME).
Bodenreider, O., 2004. The Unified Medical Language
System (UMLS): integrating biomedical terminology.
Nucleic Acids Res. 32(Database issue), pp. 264-270.
Chapman, W., Bridewell, W., Hanbury, P., Cooper, G. F.,
2001. A simple algorithm for identifying negated
findings and diseases in discharge summaries. Journal
of Biomedical Informatics, 34(5), pp. 301-310.
D'Avolio, L., 2013. 6 Questions to Guide Natural Language
Processing Strategy. Information Week, 18 February.