REFERENCES
Aggarwal, C. C. and Zhai, C., editors (2012). Mining Text
Data. Springer.
BIREME (2020a). DeCS. http://decs.bvsalud.org/. Last ac-
cess: August 24, 2020. BIREME (Latin American and
Caribbean Center on Health Sciences Information).
BIREME (2020b). DeCS Web Services.
http://wiki.reddes.bvsalud.org/index.php/Servicios_DeCS. Last
access: August 24, 2020. BIREME (Latin American
and Caribbean Center on Health Sciences Informa-
tion).
Breiman, L. (2001). Random Forests. Machine Learning,
45(1):5–32.
Buckland, M. and Gey, F. (1994). The relationship between
recall and precision. Journal of the American Society
for Information Science, 45(1):12–19.
Caruana, R. and Niculescu-Mizil, A. (2006). An empiri-
cal comparison of supervised learning algorithms. In
23rd International Conference on Machine Learning
(ICML 2006), pages 161–168. ACM.
Castaño, J., Gambarte, M. L., Park, H. J., Avila Williams,
M. d. P., Pérez, D., Campos, F., Luna, D., Benítez,
S., Berinsky, H., and Zanetti, S. (2016). A machine
learning approach to clinical terms normalization. In
15th Workshop on Biomedical Natural Language Pro-
cessing, pages 1–11. Association for Computational
Linguistics.
Chakrabarti, S., Cox, E., Frank, E., Güting, R. H., Han, J.,
Jiang, X., Kamber, M., Lightstone, S. S., Nadeau,
T. P., Neapolitan, R. E., Pyle, D., Refaat, M., Schnei-
der, M., Teorey, T. J., and Witten, I. H. (2008). Data
Mining: Know It All. Morgan Kaufmann Publishers
Inc.
Chen, T., Dredze, M., Weiner, J. P., Hernandez, L., Kimura,
J., and Kharrazi, H. (2019). Extraction of geri-
atric syndromes from electronic health record clini-
cal notes: Assessment of statistical natural language
processing methods. JMIR Medical Informatics,
7(1):e13039:1–e13039:12.
Coppersmith, G., Leary, R., Crutchley, P., and Fine, A.
(2018). Natural language processing of social media
as screening for suicide risk. Biomedical Informatics
Insights, 10:1–11.
Cornet, R. and de Keizer, N. (2008). Forty years of
SNOMED: a literature review. BMC Medical Infor-
matics and Decision Making, 8(S1).
Costumero, R., Lopez, F., Gonzalo-Martín, C., Millan, M.,
and Menasalvas, E. (2014). An approach to detect
negation on medical documents in Spanish. In Brain
Informatics and Health, pages 366–375. Springer.
Dave, H. (2019). FrequencyWords – Repository for Fre-
quency Word List Generator and processed files.
https://github.com/hermitdave/FrequencyWords. Last
access: August 24, 2020.
Explosion AI (2016–2020). spaCy. https://spacy.io. Last
access: August 24, 2020.
Fredkin, E. (1960). Trie memory. Communications of the
ACM, 3(9):490–499.
Garbe, W. (2019). SymSpell.
https://github.com/wolfgarbe/SymSpell. Last access: August 24, 2020.
Gholamrezazadeh, S., Salehi, M. A., and Gholamzadeh, B.
(2009). A comprehensive survey on text summariza-
tion systems. In Second International Conference on
Computer Science and its Applications (CSA 2009),
pages 1–6.
Haapala, A. (2019). Python-Levenshtein.
https://github.com/ztane/python-Levenshtein. Last access: August
24, 2020.
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J.,
and Schölkopf, B. (1998). Support Vector Machines.
IEEE Intelligent Systems and their Applications,
13(4):18–28.
IACS (2018). Guías de Práctica Clínica del Sistema Nacional
de Salud / Clinical Practice Guidelines of the
National Health System. https://portal.guiasalud.es.
Last access: August 24, 2020. Instituto Aragonés de
Ciencias de la Salud (IACS).
Jiang, J. (2012). Information extraction from text. In Mining
Text Data, pages 11–41. Springer.
Kibriya, A. M., Frank, E., Pfahringer, B., and Holmes, G.
(2004). Multinomial Naive Bayes for text categoriza-
tion revisited. In Australasian Joint Conference on Ar-
tificial Intelligence (AI 2004), volume 3339 of Lecture
Notes in Computer Science, pages 488–499. Springer.
Lan, M., Tan, C. L., Su, J., and Low, H. B. (2007). Text
representations for text categorization: A case study
in biomedical domain. In International Joint Conference
on Neural Networks (IJCNN 2007), pages
2557–2562. IEEE.
Liao, X. and Zhao, Z. (2019). Unsupervised approaches for
textual semantic annotation, a survey. ACM Comput-
ing Surveys, 52(4):66:1–66:45.
Loper, E. and Bird, S. (2002). NLTK: The Natural Lan-
guage Toolkit. arXiv, cs/0205028.
mammothb (2019). symspellpy – Python port of SymSpell.
https://github.com/mammothb/symspellpy. Last ac-
cess: August 24, 2020.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Introduction to Information Retrieval. Cambridge
University Press.
Marimon, M., Gonzalez-Agirre, A., Intxaurrondo, A.,
Rodríguez, H., Martin, J. L., Villegas, M., and
Krallinger, M. (2019). Automatic de-identification
of medical texts in Spanish: the MEDDOCAN track,
corpus, guidelines, methods and evaluation of results.
In Iberian Languages Evaluation Forum (IberLEF
2019), volume 2421, pages 618–638. CEUR Workshop
Proceedings.
Marrero, M., Sánchez-Cuadrado, S., Urbano, J., Morato, J.,
and Moreiro, J.-A. (2010). Sistemas de recuperación
de información adaptados al dominio biomédico.
Información biomédica, 19(3):246–254.
Marrero, M., Urbano, J., Sánchez-Cuadrado, S., Morato,
J., and Gómez-Berbís, J. M. (2013). Named Entity
Recognition: Fallacies, challenges and opportunities.
Computer Standards & Interfaces, 35(5):482–489.
Névéol, A., Dalianis, H., Velupillai, S., Savova, G., and
Zweigenbaum, P. (2018). Clinical natural language
Text Mining of Medical Documents in Spanish: Semantic Annotation and Detection of Recommendations