Finally, Table 9 shows the final results when the
best combination was selected by inspecting the
results on the training set. On the training set, the
best combination achieved a micro-F1 of 0.9143 and a
macro-F1 of 0.8596, whereas on the test set it attained
a micro-F1 of 0.8844 and a macro-F1 of 0.7271. These
results show clear overfitting to the training set:
the macro-F1 on the test set is around 13 percentage
points lower. In this optimal configuration,
rule-based methods were mostly selected.
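To make the difference between the two averages concrete, the following minimal sketch computes micro- and macro-averaged F1 over several binary criteria. It assumes that label 1 ("met") is the positive class for every criterion, which may differ from the official evaluation used in this work; the criterion names and labels are invented toy values, not data from the study. The last lines only redo the arithmetic on the macro-F1 values reported above.

```python
from typing import Dict, List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: Dict[str, List[int]], pred: Dict[str, List[int]]
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) over all criteria, with 1 as the positive class."""
    total_tp = total_fp = total_fn = 0
    per_criterion = []
    for name, g in gold.items():
        p = pred[name]
        tp = sum(1 for a, b in zip(g, p) if a == 1 and b == 1)
        fp = sum(1 for a, b in zip(g, p) if a == 0 and b == 1)
        fn = sum(1 for a, b in zip(g, p) if a == 1 and b == 0)
        per_criterion.append(f1_from_counts(tp, fp, fn))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
    micro = f1_from_counts(total_tp, total_fp, total_fn)  # pooled counts
    macro = sum(per_criterion) / len(per_criterion)       # mean of per-criterion F1
    return micro, macro


# Invented toy labels: one frequent criterion and one rare criterion.
gold = {"frequent": [1, 1, 1, 1, 0], "rare": [0, 0, 0, 0, 1]}
pred = {"frequent": [1, 1, 1, 1, 1], "rare": [0, 0, 0, 0, 0]}
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1={micro:.4f} macro-F1={macro:.4f}")

# Train/test macro-F1 gap reported in the text, in percentage points.
gap_points = (0.8596 - 0.7271) * 100
print(f"macro-F1 gap: {gap_points:.2f} points")
```

Micro-averaging pools the counts of all criteria, so frequent criteria dominate the score; macro-averaging weights each criterion equally, so a single poorly classified criterion lowers it sharply.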
5 CONCLUSIONS
In this paper, we proposed a system for the automatic
classification of 13 binary selection criteria given only
patient clinical records. Developing systems such as
the one described here is vital for helping physicians
select patient cohorts for clinical trials, a task
known to be both time-consuming and complex.
Our system combines rule-based methods and machine
learning algorithms, selecting for each criterion the
approach that classifies it best. In this work, we
developed hand-crafted rules for almost all the
criteria. However, creating adequate rules is hard and
cumbersome, since it requires analysing the data and
often also medical expertise. Moreover, while
rule-based methods achieved good results, they require
a distinct algorithm for each criterion, whereas
machine learning classifiers do not face this problem
and are easier to reuse.
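The per-criterion choice between approaches can be sketched as follows. This is only an illustration under stated assumptions: the criterion names, method names, and scores are hypothetical toy values, and the function `select_best_methods` is not taken from the system itself.

```python
from typing import Dict


def select_best_methods(train_scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each criterion, return the method with the highest training score."""
    return {
        criterion: max(scores, key=scores.get)
        for criterion, scores in train_scores.items()
    }


# Hypothetical training-set F1 scores per criterion and method (invented values).
train_scores = {
    "criterion_a": {"rules": 0.92, "svm": 0.85, "xgboost": 0.88},
    "criterion_b": {"rules": 0.70, "svm": 0.81, "xgboost": 0.79},
}
selected = select_best_methods(train_scores)
print(selected)
```

When the scores used for selection come from the training set, the chosen combination may look better than it will on unseen data, so a held-out split or cross-validation is a common safeguard.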
In this task, classical machine learning classifiers
worked much better than deep learning classifiers. In
most cases, deep learning models predicted the same
label every time, behaving like a baseline classifier,
which suggests that the dataset was too small for them.
Our results also show that machine learning classifiers
performed better for criteria with balanced labels,
suggesting that the other criteria lack training data.
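One simple way to detect the behaviour described above is to compare a model's predictions with those of a majority-class baseline. The sketch below is illustrative only: the helper names and the toy labels are assumptions, not part of the original experiments.

```python
from collections import Counter
from typing import Sequence


def majority_label(train_labels: Sequence[int]) -> int:
    """Most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def is_constant(predictions: Sequence[int]) -> bool:
    """True when every prediction carries the same label."""
    return len(set(predictions)) <= 1


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Invented toy data for one imbalanced criterion (9 "met" vs 1 "not met").
train_labels = [1] * 9 + [0]
test_gold = [1, 1, 1, 1, 0]
model_pred = [1, 1, 1, 1, 1]  # hypothetical output of a collapsed model

baseline_pred = [majority_label(train_labels)] * len(test_gold)
collapsed = is_constant(model_pred) and model_pred == baseline_pred
print(collapsed, accuracy(test_gold, model_pred))
```

On imbalanced criteria such a collapsed model can still reach a high accuracy, which is why comparing against the baseline is more informative than the raw score alone.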
As future work, the pre-processing step can be improved
and the developed rules can be refined with the help
of medical expertise. Furthermore, the performance of
the system on some criteria could be improved by
augmenting the training data with data from external
resources. Finally, other techniques for using
distributed word representations could be considered,
and the classifier hyperparameters could be optimized
through grid search.
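As an illustration of the grid search mentioned above, the following minimal sketch evaluates every combination of hyperparameter values and keeps the best one. The toy classifier (a weighted threshold rule), the hyperparameter names `weight` and `threshold`, and the validation data are all assumptions made for this example; they do not correspond to the classifiers used in this work.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


def grid_search(
    param_grid: Dict[str, Sequence[float]],
    score_fn: Callable[[Params], float],
) -> Tuple[Params, float]:
    """Exhaustively score every combination in param_grid; return the best."""
    names = list(param_grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict '>' keeps the first of any tied settings
            best_params, best_score = params, score
    return best_params, best_score


# Invented validation examples: (feature_1, feature_2, label).
VALIDATION = [
    (1.0, 0.0, 1), (0.2, 1.0, 1), (0.4, 0.4, 0),
    (0.1, 0.3, 0), (0.9, 0.8, 1), (0.3, 0.0, 0),
]


def validation_accuracy(params: Params) -> float:
    """Accuracy of the toy rule: predict 1 if x1 + weight * x2 >= threshold."""
    correct = 0
    for x1, x2, label in VALIDATION:
        predicted = 1 if x1 + params["weight"] * x2 >= params["threshold"] else 0
        correct += predicted == label
    return correct / len(VALIDATION)


best_params, best_score = grid_search(
    {"weight": [0.0, 0.5, 1.0], "threshold": [0.5, 1.0, 1.5]},
    validation_accuracy,
)
print(best_params, best_score)
```

In practice, a library routine such as scikit-learn's `GridSearchCV`, which also cross-validates each setting, would typically be used instead of a hand-written loop.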
ACKNOWLEDGEMENTS
This project was partially funded by the Integrated
Programme of SR&TD “SOCA” (Ref.
CENTRO-01-0145-FEDER-000010) and “MMIR”
(Ref. PTDC/EEI-ESS/6815/2014), co-funded by the
Centro 2020 program, Portugal 2020, European
Union, through the European Regional Development
Fund.
Rui Antunes is supported by the Fundação
para a Ciência e a Tecnologia (PhD Grant
SFRH/BD/137000/2018). João Figueira Silva is
supported by the Fundação para a Ciência e a Tec-
nologia (PhD Grant PD/BD/142878/2018). Arnaldo
Pereira is supported by the Fundação para a Ciência
e a Tecnologia (PhD Grant PD/BD/142877/2018).
REFERENCES
Chen, T. and Guestrin, C. (2016). XGBoost: a scalable
tree boosting system. In Proceedings of the 22nd
ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, pages 785–794, San
Francisco, California, USA. ACM.
Chollet, F. et al. (2015). Keras. https://keras.io.
Řehůřek, R. and Sojka, P. (2010). Software framework for
topic modelling with large corpora. In Proceedings of
the LREC 2010 Workshop on New Challenges for NLP
Frameworks, pages 45–50, Valletta, Malta. ELRA.
Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L.-
w. H., Feng, M., Ghassemi, M., Moody, B., Szolovits,
P., Anthony Celi, L., and Mark, R. G. (2016). MIMIC-
III, a freely accessible critical care database. Scientific
Data, 3.
Ludvigsson, J. F., Pathak, J., Murphy, S., Durski, M.,
Kirsch, P. S., Chute, C. G., Ryu, E., and Murray, J. A.
(2013). Use of computerized algorithm to identify in-
dividuals in need of testing for celiac disease. Jour-
nal of the American Medical Informatics Association,
20(e2):e306–e310.
Mann, C. J. (2003). Observational research methods.
Research design II: cohort, cross sectional, and
case-control studies. Emergency Medicine Journal,
20(1):54–60.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space. arXiv e-print.
Pathak, J., Kho, A. N., and Denny, J. C. (2013). Electronic
health records-driven phenotyping: challenges, recent
advances, and perspectives. Journal of the American
Medical Informatics Association, 20(e2):e206–e211.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: machine learning
HEALTHINF 2019 - 12th International Conference on Health Informatics