Authors:
Hanna Berg; Aron Henriksson; Uno Fors and Hercules Dalianis
Affiliation:
Department of Computer and Systems Sciences, Stockholm University, Sweden
Keyword(s):
De-identification, Privacy, Electronic Health Records, Clinical Text, Natural Language Processing.
Abstract:
Privacy is challenged by both advances in AI-related technologies and recently introduced legal regulations. The problem of privacy has been extensively studied within the privacy community, but this research has largely focused on methods for protecting and assessing the privacy of structured data. Research aiming to protect the integrity of patients in clinical text has primarily referred to US law and relied on automatically recognising predetermined identifiers, both direct and indirect. This article discusses the various challenges concerning the re-use of unstructured clinical data, in particular in the form of clinical text. It focuses on ambiguous and vague terminology, how different legislation affects the requirements for de-identification, differences between methods for unstructured and structured data, the impact of approaches based on named entity recognition and on replacing sensitive data with surrogates, and the lack of measures for usability and re-identification risk.