How to Collect Consent for an Anonymous Medical Database

Emmanuel Benoist and Jan Sliwa

RISIS, Bern University of Applied Sciences - TI, Quellgasse 21, CH-2501 Biel, Switzerland

Keywords:

Medical Databases, Anonymous Consent, Privacy, PKI Infrastructure.

Abstract:

The goal of some medical databases is not to support the actual treatment of individual patients, but to provide

the platform for medical research. Health data collected in such databases have to be anonymized: they should
be analyzed only statistically and should not allow the patient’s identity to be retrieved. This anonymization
protects the patients’ privacy, and in many countries it is mandatory. In

many cases, not only one person treats a patient for a given illness. The documentation of a case requires the

collaboration of different physicians that share information. This sharing of information requires the patient

to authorise the access to the data stored by one physician by another one. We need therefore to implement a

system for collecting the consent of an anonymous person. We present a novel solution to allow the practitioner

to collect the consent of the patient in order to access the data recorded for that person. This solution is based

on existing infrastructure, such as X.509 certificates (present in e-ID or e-Health cards). Patients need not
acquire any new hardware or remember any new secret. We produce a fingerprint of the patient’s private key
that can be used to re-identify the patient without knowing the patient’s identity (for
instance the certificate) or even the patient’s public key.

1 INTRODUCTION

The goal of this paper is to present a practical solution

for efficient, secure and privacy-preserving sharing
of anonymous medical information stored in a registry

used for medical evaluative research. The aspects of

constructing and managing registries are presented in

(Gliklich and Dreyer, 2010).

Important work is currently being done in the

area of the Electronic Health Records, where infor-

mation is combined from many distributed and au-

tonomous sources. Often such information is het-

erogeneous, in various formats, including unstruc-

tured notes. Many networks assuming mutual trust

and permitting the exchange of medical data are currently in use or are being implemented. We can mention here e-toile (Geneva, Switzerland; www.e-toile-ge.ch/etoile.html), Clalit Health Services (Israel; www.clalit-global.co.il/en/) or GCS EMOSIST-FC (Franche-Comté, France; www.ch-dole.fr/contenu.php?idR=1). Such decentralized systems do not create a new central database, but rather let data be stored locally where they have been produced and provide the methods for remote access. Data are used for treatment only and cannot be used for statistics, since they are mainly heterogeneous (each data provider having its own format).

The setting we examine in this paper is different.

We consider the case of a database (registry) used to

collect data for medical evaluative research. Unlike Electronic Health Records, a medical registry contains a limited set of data that is coherent across all patients, as its main goal is to allow meaningful statistics to be computed. Medical data are collected in

a centralized database where they can be compared

and analyzed. Unlike for the EHR, data in a registry

can be anonymous, since they are not used for treat-

ment of the patient but only for statistical purposes.

As personal information is necessary to retrieve a pa-

tient in order to add supplementary information (e.g.

a follow-up record), it is also stored, but separately,

so that connecting medical cases with personal data is

impossible. We will describe the architecture used to

handle both parts of data in a privacy protecting man-

ner.

Benoist E. and Sliwa J.
How to Collect Consent for an Anonymous Medical Database.
DOI: 10.5220/0004902404050412
In Proceedings of the International Conference on Health Informatics (HEALTHINF-2014), pages 405-412
ISBN: 978-989-758-010-9
© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Configuration.

The physician who treats the patient can access all data records he/she has created. The same patient may also be treated by another physician who has no direct access to the records created by other doctors. This physician may however need this information in order to apply the correct treatment or to further document the case. As we assume that this happens in the presence

of the patient during a consultation, the patient may

express the consent to make it available to the doc-

tor. The item used to control the access is the patient’s

health related smartcard. The data records concerning

a given patient are stored in the database along with

the “ﬁngerprint” generated from the data on his/her

smartcard. This ﬁngerprint marks the data ownership

and protects them from an unauthorized access. The

same patient’s card may be used to unblock the ac-

cess. The patient expresses the consent to use his/her

data by allowing the physician to use his/her smart-

card. No other items are necessary. This solution is

practical and provides simultaneously an adequate se-

curity level. This protocol is the main subject of this

paper, along with the presentation of the necessary en-

vironment and data structures.

As the smartcards play the essential role in our protocol, we assume their widespread use by the patients and by the health professionals. Currently, such cards are being deployed in many countries: the European Health Insurance Card with the related project NETC@RDS, Carte Vitale in France, Versichertenkarte in Switzerland, and many others.

The advantages of the proposed scheme are:

• the patient data are stored anonymously and the

patient nevertheless retains control over it

• the patient needs neither to acquire any new token

nor to remember any new secret

We first show the basic scenario that we want to handle. We then present the published work related to the considered case and show what distinguishes our approach from theirs. We discuss the various risks faced by such an application. Then we present the details of the proposed protocol and comment on the advantages and the problems of the scheme. Finally, we discuss related problems not treated in this paper and suggest directions for future research.

Websites of the card projects named above: ec.europa.eu/social/main.jsp?catId=559&langId=en (European Health Insurance Card), www.netcards-project.com/web/frontpage (NETC@RDS), www.sesam-vitale.fr/index.asp (Carte Vitale), www.bag.admin.ch/themen/krankenversicherung/07060/ (Versichertenkarte).

2 BASIC SCENARIO

The environment we consider in this paper is a med-

ical database (Fig. 1). Its goal is evaluative research,

i.e. assessing the efﬁciency of various therapy meth-

ods and devices. The database is centralized and con-

tains homogenous data, because only in this way valid

statistical comparisons can be performed. Medical

data are anonymous; the identity of the patients is irrelevant. As the patient’s case consists of a sequence of events scattered over many years, these events have to be internally connected in the database. If the patient appears for a subsequent consultation, his/her case has to be retrieved based on his/her identity. Therefore a pseudonymisation server (later called module) is used. On this server the patients as well as the physicians are registered. The same patient may be registered by many physicians; every physician-patient relation receives a distinct ID. This makes it possible to delimit the data directly accessible to a specific physician. On the other hand, the patient him-/herself has a unique long term pseudonym that makes it possible to connect anonymously all events in his/her medical history for statistical analysis. When a physician logged into the system requests access to a patient using his/her personal data (name, etc.), the pseudonymisation server responds with two numbers. One is the

internal ID that denotes the items in the central medi-

cal database created by this physician to which he/she

has a direct access. The other one is the patient’s long

term pseudonym that is not seen by the physician but

can be later used to access other items created by other

physicians. This access depends on the patient’s con-

sent, as implemented in the protocol presented in this

paper.

In practice, the need for several physicians to treat the same patient may arise when he/she moves to another city or needs consultations with several physicians: a general practitioner (family doctor) and various specialists. The patient may be treated by an oncologist and a surgeon, and his/her tissues may be analyzed in a laboratory. At

a certain moment, a doctor has to collect all this in-

formation in order to assess the case and to deﬁne fur-

ther therapy. The patient, trusting the doctor, can al-

low him/her to access the relevant data created by all

specialists.

A registry (with the corresponding pseudonymisation server, or module) is organized around a specific medical problem, such as cancer or orthopedic prostheses. The central database can support many distinct modules. We do not intend to connect data from different modules. Doing so could be of value for medical research, but would pose new security and privacy related problems.

Each module uses a speciﬁc function to construct

the long term pseudonym. The principle is always

the same: the module combines a selection of par-

tial identiﬁers, adds salt (speciﬁc for this module) and

computes a hash on this information. The selection of

the partial identiﬁers and the way they are combined

may vary from module to module. For instance, one
module can use the social security number, birth year and
gender. For another module we can use

last name at birth, ﬁrst name at birth, birth date, city

and country of birth. Each time a patient visits the

same module, the same pseudonym will be computed.

In some very rare cases two patients will receive the

same pseudonym, but only if they have exactly the

same identiﬁers.
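The construction described above can be sketched as follows. This is only a minimal illustration, not the function used by any actual module: the choice of SHA-256, the normalization rules, the length-prefixed encoding and the names `normalize` and `long_term_pseudonym` are all assumptions made for the example.

```python
import hashlib
import unicodedata


def normalize(value: str) -> str:
    """Bring an identifier into a canonical form (assumed rules: NFKC, trimmed, case-folded)."""
    return unicodedata.normalize("NFKC", value).strip().casefold()


def long_term_pseudonym(partial_identifiers, module_salt: bytes) -> str:
    """Hash the module salt followed by the selected partial identifiers."""
    h = hashlib.sha256()
    h.update(module_salt)
    for field in partial_identifiers:
        data = normalize(field).encode("utf-8")
        # Length prefix: ("ab", "c") and ("a", "bc") must not hash to the same value.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


# Hypothetical module using last name at birth, first name at birth,
# birth date, city and country of birth.
salt = b"secret-salt-of-this-module"
first_visit = long_term_pseudonym(["Muster", "Anna", "1970-01-31", "Biel", "CH"], salt)
later_visit = long_term_pseudonym([" muster", "ANNA", "1970-01-31", "Biel", "ch"], salt)
other_module = long_term_pseudonym(["Muster", "Anna", "1970-01-31", "Biel", "CH"], b"other-salt")
```

With these assumptions the same patient always maps to the same pseudonym within one module, while two modules with different salts produce unrelated values. A keyed construction such as HMAC would be a natural alternative to plain salted hashing.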

Using a (secret) salt prevents the administrators of the central database from discovering the patients’ real identity.

Until now, we have discussed the long term

pseudonym that is used to connect anonymously the

patient’s data records. Another item, stored along

with the medical data and controlling the access to

them is the ﬁngerprint. The ﬁngerprint is created us-

ing the data stored on the patient’s health smartcard. If

the patient presents the same smartcard to the physician, the same fingerprint can be generated, which is understood as consent to access the data, given by the patient present in person. The following sections present the handling of the fingerprint.

3 RELATED WORK

As medical information is more and more stored elec-

tronically, studying various aspects of this process

has created a vast research area. Our interest is di-

rected towards privacy and security of medical data,

especially in the interaction of the systems used for

health support (Hospital Information Systems, Elec-

tronic Health Records) and the systems used for med-

ical research (clinical trials repositories, medical reg-

istries).

We will present here recent literature regarding

this subject.

In our article, we do not enter in detail into the

way the pseudonym is created to link all the cases

of the same patient in the database. This subject has

already been covered by other publications. For in-

stance (Elger et al., 2010) present the aspects of the

reuse of health data for clinical research, especially

the anonymization of data and the construction of

pseudonyms. (Wilson, 2005) addresses also the prob-

lem of pseudonymization, suggesting the use of the

PKI smartcards. We opted for the computation of a

hash based on information that remains stable (identity
at birth for instance) and a salt specific for each study.

Our system is much simpler and does not require any

new token or information from the patient.

It has to be stressed that retrieving and connecting

data is a different problem from controlling the access

to them. Data items have to be marked with the long

term pseudonyms if they have to be treated as a set,
which is a necessity in the case of medical research.

Consent validation is another problem and we use dif-

ferent means (a ﬁngerprint based on the smartcard) to

achieve this goal.

(Kwon, 2011) proposes to use X.509 certiﬁcates

to provide anonymized session identiﬁers that could

be deanonymized under certain circumstances. This is very close to our aim, but we concentrate here on the consent. The anonymity and the way the pseudonym is created are outside the scope of our article.

Another related problem is the one stressed by

(Camenisch and Lysyanskaya, 2001). They focus on

anonymous credentials delivered by a central author-

ity to anonymous users. We try to solve the inverse

problem, where an anonymous user (for us a patient)

gives a credential to a central system, while remaining

anonymous.

HowtoCollectConsentforanAnonymousMedicalDatabase

407

4 RISK ASSESSMENT

In this section we will present the different risks for

the application and how we mitigate them.

The first risk to consider is an attack from an outsider. Protection follows the OWASP guidelines and mainly targets the OWASP Top 10 flaws. The details of the protection are not

included in the scope of this article.

An outsider cannot become a legal user of our sys-

tem, since the module administrators verify the iden-

tity of the users registered in their module. Since the

modules are in most of the cases operated by medical

societies, the user is typically a member of such a so-

ciety. This fact should also limit the motivation of the

users (physicians) for misusing their access rights and

disclosing conﬁdential data. The price for misbehav-

ing would be a rejection from the community and an

end of the professional career. It does not make a data

theft impossible but raises substantially the bar for it.

In any case, they cannot browse freely in the database

- they have only access to the data they entered them-

selves or to which they have obtained explicit consent

from the patient.

Administrators of a module have access only to the data of the module (i.e. the identity of the patients). This information is important and can already be stigmatizing, as with being registered in an HIV

have access to the central database, so they do not

know the medical details of the case.

Administrators of the central database have only

access to anonymized data. They cannot infer the

identity of the patients from the hash they have. Even

a dictionary attack is not possible, since they do not

have access to the salt used in the module for comput-

ing the hash.
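The role of the secret salt can be illustrated with a small sketch. It assumes, purely for the example, that a pseudonym is the SHA-256 hash of the salt followed by an identity string. The identity strings below are made up, and `pseudonym` and `dictionary_attack` are hypothetical helper names.

```python
import hashlib


def pseudonym(identity: str, salt: bytes = b"") -> str:
    """Hash of salt + identity, standing in for a module's pseudonym function."""
    return hashlib.sha256(salt + identity.encode("utf-8")).hexdigest()


def dictionary_attack(target: str, candidates, guessed_salt: bytes = b""):
    """Return the candidate identity whose hash equals target, or None."""
    for candidate in candidates:
        if pseudonym(candidate, guessed_salt) == target:
            return candidate
    return None


# Made-up identities an administrator of the central database might enumerate.
candidates = ["1980|F|756.1234.5678.90", "1975|M|756.9876.5432.10"]

unsalted = pseudonym(candidates[0])
salted = pseudonym(candidates[0], salt=b"module-secret-salt")

found_unsalted = dictionary_attack(unsalted, candidates)  # enumeration succeeds
found_salted = dictionary_attack(salted, candidates)      # fails without the salt
```

Without the salt, enumerating plausible identities recovers the patient. With a salt known only to the module, the same enumeration finds nothing.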

We propose a system that makes it possible to collect the consent of a patient to share his/her medical data between different physicians without disclosing the identity of the patient to the system.

The system must rely on a preexisting public key

infrastructure. We cannot access the certificates of the patients in this PKI, since the certificates contain the identity of the patients. We will therefore produce a fingerprint of this certificate that does not reveal it.

The consent to access is considered as given, if the

patient’s certiﬁcate produces the same ﬁngerprint as

was previously stored with the data. In this process,

the identity of the patient remains unknown for the

server.

OWASP: www.owasp.org

In a public key infrastructure, the changing and revocation of keys is always of crucial importance. (Ferguson et al., 2010) is a good reference here and also discusses other practical aspects of key management. Our system does not have the possibility to

handle revocation lists or expiration dates. We pro-

pose therefore a way to update the ﬁngerprint of the

key when the key is changed, e.g. when the card is

lost or renewed. In this process, the physician takes

the role of the trustee. Even if our system does not

trust physicians in general (they should only access
their own data), we will rely on them for renewing the

ﬁngerprint of the key of their patients. Since this step

is central in our system, we will require the physician

to sign any modiﬁcation in the ﬁngerprint with his/her

health professional’s card. This will allow the admin-

istrators to monitor any misuse of the system and to

react accordingly to protect the data they have in cus-

tody.

5 PROPOSED PROTOCOL

The protocol is separated in two parts. The ﬁrst part

concerns the visit at the physician when the medical

data are collected and stored on the server. The sec-

ond part refers to another visit when the patient al-

lows another health practitioner to access the previ-

ously stored data.

As the medical data are stored, a “ﬁngerprint”

controlling the access is created and stored along with

the data. This ﬁngerprint is a shared secret, based on

the pair of the private keys (of the server and of the pa-

tient) in the Public Key Infrastructure (PKI) scheme.

It does not require the server to know the patient’s cer-

tiﬁcate, and not even to know the patient’s public key.

It can be later activated by the security keys stored

on the patient’s smartcard. It is stored on the server

and the patient just uses his standard card and needs

not to remember or store any new information. The

doctor (or another health professional) plays also an

important role in the process, for example verifying

the identity of a physically present patient. There-

fore he/she is included in our scheme, together with

his/her Health Professional’s Card that can be used to

sign and certify his/her actions.

In the remainder of this article, we make no deeper

analysis of the communication between the physician

and the client on one side and the module on the other

side. The creation of the internal ID and the long term

pseudonym is also out of the scope of this article.

5.1 Enrollment (Fig. 2)

Figure 2: Enrollment.

The patient visits a physician and the physician generates a record that must be inserted in the central database. The physician has the patient’s smartcard and

inserts it into a reader (client device). The commu-

nication between the physician’s computer and the

central server uses a secure channel. Since both the

physician and the central server know each other, this

channel is not required to offer any anonymity. We use an HTTPS (i.e. TLS) channel to secure the communication, which can then neither be intercepted nor modified by a third party.

In our protocol, we will use $S$ to denote the server containing data and $P$ to denote the patient (patient’s smartcard). $S$ has a private-public key pair $(Pub_S, Priv_S)$ and a certificate $Cert_S$ containing $Pub_S$ and signed by a trusted certificate authority. $P$ also has a private-public key pair $(Pub_P, Priv_P)$. The patient should remain anonymous on the server; therefore neither the patient’s certificate nor $Pub_P$ (the public key of $P$) can be known by the server, since either could be used as a unique identifier.

We do not discuss the PKI infrastructure design

for this protocol, we assume it simply exists and is

adequately deployed.

Protocol:

1. $S$ and $P$ create a shared message $K$ using the Diffie-Hellman protocol, so that $S$ and $P$ both know $K$.

2. $P$ signs the message $K$ and produces $K'$, the concatenation of $K$ and $sign_{Priv_P}(K)$:

$K' = K + sign_{Priv_P}(K)$

3. $P$ first encrypts $K'$ using $Pub_P$:

$K'' = enc_{Pub_P}(K')$

4. $P$ then encrypts $K''$ using the public key of $S$:

$K''' = enc_{Pub_S}(K'')$

so that the message is encrypted with both keys.

5. $P$ sends $K'''$ to $S$.

6. $S$ decrypts $K'''$ and gets $K''$:

$K'' = decrypt_{Priv_S}(K''')$

7. $S$ stores the pair $(K, K'')$ together with the user’s data.
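Step 1 of the enrollment relies on a Diffie-Hellman exchange. The sketch below shows only the arithmetic of that step. The group parameters (the Mersenne prime 2**127 - 1 with base 3) are toy values chosen for readability and are an assumption of this example. They are far too small for real use, where a standardized group or an elliptic-curve variant would be chosen instead. The function names are hypothetical.

```python
import secrets

# Toy parameters (assumption for illustration only, not secure).
P_MOD = 2**127 - 1   # a Mersenne prime
G = 3


def dh_keypair():
    """Return (private exponent, public value) for one party."""
    private = secrets.randbelow(P_MOD - 3) + 2   # uniform in [2, P_MOD - 2]
    public = pow(G, private, P_MOD)
    return private, public


def dh_shared(own_private: int, peer_public: int) -> int:
    """Combine the own private exponent with the peer's public value."""
    return pow(peer_public, own_private, P_MOD)


# The server S and the patient's card P each contribute one key pair ...
s_private, s_public = dh_keypair()
p_private, p_public = dh_keypair()

# ... exchange only the public values, and both derive the same K.
k_at_server = dh_shared(s_private, p_public)
k_at_card = dh_shared(p_private, s_public)
```

Both sides end up with the same value $K$ although only the public values cross the channel.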

5.2 Re-Identiﬁcation (Fig. 3)

Figure 3: Re-Identification.

The patient visits another physician who participates in this project (i.e. has access to the server and its research database) for a consultation or a treatment. The health practitioner indicates the need to access already stored data. The patient accepts the necessity of retrieving the data and gives consent to do so. The server $S$ is confronted with a patient $P'$ claiming to be $P$ and, in order to accept the consent, has to verify his/her rights without revealing his/her real identity. We speak of the re-identification of the anonymous patient.

$S$ starts the second part of the protocol:

1. $S$ sends $K''$ to $P'$.

2. If $P'$ is not $P$, the message cannot be decrypted, since $P'$ does not know $Priv_P$. If $P' = P$, the private key is known and the message can be decrypted:

$K' = decrypt_{Priv_P}(K'')$

3. $P$ separates the message into two parts, the message itself and its signature:

$K + sig = K'$

4. $P$ verifies the signature of the message to assert that the message has not been modified:

$verif_{Pub_P}(sig, K)$

5. If the signature is valid, $P$ sends $K$ back to $S$, encrypted with the public key of $S$:

$M = enc_{Pub_S}(K)$

6. If the value received from $P$ is the same as the value stored for the user, $S$ accepts the re-identification of $P$:

test if $K = decrypt_{Priv_S}(M)$

5.3 Re-deployment of the Security Keys

The server’s key pair $(K, K'')$ can only be used once, otherwise a replay attack would easily succeed.

Therefore at the end of the re-identiﬁcation process,

the two partners will generate a new pair using the

same protocol as in the enrollment.
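The message flow of Sections 5.1 to 5.3 can be traced with a purely symbolic model. The sketch below performs no real cryptography: encryption and signing are represented by labelled wrappers that only the holder of the matching key label can open or produce, and a random value stands in for the Diffie-Hellman result. The classes `Enc`, `Signed` and `Holder` and the functions `enroll` and `reidentify` are hypothetical names introduced for this illustration.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Enc:
    """Symbolic ciphertext: payload readable only by the holder of key_id."""
    key_id: str
    payload: object


@dataclass(frozen=True)
class Signed:
    """Symbolic K' = K + sign(K); 'signer' stands in for the signature."""
    k: bytes
    signer: str


class Holder:
    """A party (server or smartcard) owning one key pair, named by a label."""

    def __init__(self, key_id: str):
        self.key_id = key_id

    def decrypt(self, box: Enc):
        if box.key_id != self.key_id:
            raise PermissionError("private key does not match")
        return box.payload


def enroll(server: Holder, card: Holder):
    k = secrets.token_bytes(16)      # step 1: stand-in for the Diffie-Hellman result K
    k1 = Signed(k, card.key_id)      # step 2: K' = K + sign_{Priv_P}(K)
    k2 = Enc(card.key_id, k1)        # step 3: K'' = enc_{Pub_P}(K')
    k3 = Enc(server.key_id, k2)      # step 4: K''' = enc_{Pub_S}(K'')
    return k, server.decrypt(k3)     # steps 5-7: S recovers K'' and stores (K, K'')


def reidentify(server: Holder, card: Holder, stored):
    """Return a fresh (K, K'') pair on success, None on failure."""
    k, k2 = stored
    try:
        k1 = card.decrypt(k2)        # step 2: only the enrolling card can open K''
    except PermissionError:
        return None
    if k1.signer != card.key_id:     # step 4: the card checks its own signature
        return None
    m = Enc(server.key_id, k1.k)     # step 5: M = enc_{Pub_S}(K)
    if server.decrypt(m) != k:       # step 6: S compares with the stored K
        return None
    return enroll(server, card)      # Section 5.3: the pair is used once, then replaced


server = Holder("S")
patient_card = Holder("P")
other_card = Holder("P'")

record = enroll(server, patient_card)
rejected = reidentify(server, other_card, record)
renewed = reidentify(server, patient_card, record)
```

In this model the server stores only $K$ and the opaque $K''$. A different card cannot open $K''$, and a successful re-identification immediately replaces the stored pair.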

5.4 Changing the Keys (Fig. 4)

In our protocol we have assumed that the patient’s

smartcard is of critical importance. It is used to cre-

ate the access keys and to verify them in order to ac-

cess the data. In real life such dependence is risky as

the card can be lost, exchanged or upgraded. In such

a case the access to the data would be irreversibly

lost. Therefore we propose a backdoor procedure to
overcome this limitation. Naturally, it is a trade-off

between security and usability.

Figure 4: Signing by the doctor.

If the patient for any reason receives a new card, the health institutions cannot expect to be informed about it. A hospital will just be confronted with the situation that the patient no longer owns the card that was used to protect the data. The patient’s identity should however be reliably verified to a reasonable degree using other documents, such as a national identity card. In this case the access should be refreshed by creating a new fingerprint based on the patient’s new card.

The doctor is the sole person verifying the identity of the patient, so a doctor could use this feature to gain access to records he/she is not entitled to see. In order to prevent abuse, the new fingerprint has to be signed by the doctor with his/her Health Professional’s Card. This card also contains a private key and a certificate with the public key.

Handling the case of card exchange, as discussed above, requires an extension of the enrollment procedure. In addition to $K$ and $K''$, the server will store the identity of the doctor and the signature $\sigma_{Priv_M}(K'')$ created when the doctor signs the message $K''$ with his/her private key $Priv_M$. As a new access key (the pair $(K, K'')$) is created during the re-identification, all accesses will be signed by the doctors involved. This means that our database will contain the certificates of all participating doctors.

In this process, the doctor plays a role similar to
a certification authority in the PKI architecture, as the

trust in the doctor’s integrity asserts the trust in the

stored access key (ﬁngerprint).

Since the doctor receives more power, the plausi-

bility of the change of the key has to be veriﬁed. For

instance, an alarm should be raised if a physician ac-

cesses a case without entering new data.
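A plausibility check of this kind could be run over an access log. The following sketch assumes a hypothetical log format in which every entry names the physician, the case and one of three made-up action labels. It flags every physician-case pair with an access or a key change but no new record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    physician: str
    case: str
    action: str  # "key_change", "access" or "new_record" (hypothetical labels)


def suspicious_accesses(log):
    """Return sorted (physician, case) pairs that touched a case without documenting it."""
    touched = set()
    documented = set()
    for event in log:
        pair = (event.physician, event.case)
        if event.action in ("access", "key_change"):
            touched.add(pair)
        elif event.action == "new_record":
            documented.add(pair)
    return sorted(touched - documented)


log = [
    Event("dr_a", "case1", "access"),
    Event("dr_a", "case1", "new_record"),
    Event("dr_b", "case1", "key_change"),
    Event("dr_b", "case1", "access"),
]
alarms = suspicious_accesses(log)
```

Here `dr_b` changed the key and read the case without adding data, so this pair would be reported to the administrators for review.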

5.5 Discussion

The only requirement is that the patient has an X.509-compatible card. Technically, it can be an e-Health card, a national identity card or any other valid card accepted by the physician. This card has to contain a private and a public key and a certificate signed by a Certificate Authority. Currently such cards are being deployed in many developed countries. We need no

supplementary features or software to be loaded on

the card. The entire algorithm is implemented on the

database server which is under control of the institu-

tion hosting the medical registry.

The proposed protocol has the following important security features:

• It is impossible for someone to register using someone else’s identity (re-identification is not possible, since the wrong signature will not be accepted by $P$).

• It is impossible to send a message to $P$ in order to let $P$ decrypt it, since $P$ only accepts messages that were signed by him- or herself.

The scheme has to ensure a reasonable protection

level but also has to be robust in practical situations.

We assume that refusing patients access to their own data is a real threat to their health. Therefore, although elevated security standards are important, they should not be fulfilled at any price. We have to consider not only the point of view of a computer security specialist, but also that of a medical practitioner.

HowtoCollectConsentforanAnonymousMedicalDatabase

411

6 FUTURE WORK

There are a number of problems related to our case that
may be studied in more depth.

In this paper we propose only a data access pro-

tocol - we do not discuss here how data are actually

stored. We assume as self-evident that semantic
compatibility of data formats has to be ensured. An

important question is if and how they are encrypted.

If data were readable (or easily decryptable for some

parties), it would be necessary to eliminate identify-

ing information from the data content. This is for ex-

ample the case in the DICOM headers of medical im-

ages, as presented in (Elger et al., 2010).

In the use case described here, the patient gives

consent to access his/her data in the presence of the

doctor. A set of records is retrieved and displayed,

the relevant information is accessed. As long as the

entire set concerns a speciﬁc disease, we may assume

that the doctor can be trusted and can see it all. If the

set covers various diseases, it may be useful to divide the records into groups, possibly with different confidentiality levels. If the patient is HIV-positive and visits an orthopedist, the patient needs to have the freedom to decide whether even the headers of the data records are visible.

The data accesses have to be logged in order to

prevent and detect the cases of data theft. Also the

failed accesses have to be logged. This is not so much because of malicious patients: a single failed attempt can happen, and it would be difficult for a patient to try to read many data sets using many forged cards. A malicious doctor, on the other hand, could do so quietly and undisturbed.

7 CONCLUSIONS

In this paper we have considered a realistic case of

retrieving valuable medical data stored in an anony-

mous registry with the consent of the patient con-

cerned. The protocol we propose is a trade-off be-

tween security and privacy protection on one hand and

usability on the other hand. If we devise a scheme to

be massively used by patients, we have to remember

that we deal with “common” people, many of them el-

derly, many of them having little experience in using

computers. Therefore we should exclude the following from our design:

• carrying special items

• remembering special secrets, like username/password

• upgrading standard items, like loading Java applications on the smartcard, especially by the patients themselves

Our protocol meets these requirements and pro-

vides a practicable solution to be used in the scope

of the existing infrastructure. We do not expect any-

thing special from the patients except that they have

the identity token they normally use.

We have described the use of this protocol in

the medical context, but it can be equally applied in

other situations. We can think about any collection

of anonymously stored data, where a data originator

wants to recall records related to him/her. Moreover,

he/she could trace its secondary use, if such informa-

tion were stored in the collection.

In general, such a scheme is useful when a large

amount of anonymous data is collected for an accept-

able goal, and the originator is allowed to retain the

relation with his/her data. The main advantage is a

reasonable privacy protection (and tracing the actions

if the strict rules are loosened) and the simplicity of

deployment.

REFERENCES

Camenisch, J. and Lysyanskaya, A. (2001). An efficient system for non-transferable anonymous credentials with optional anonymity revocation. In Advances in Cryptology - EUROCRYPT 2001, pages 93–118. Springer.

Elger, B. S., Iavindrasana, J., Lo Iacono, L., Müller, H., Roduit, N., Summers, P., and Wright, J. (2010). Strategies for health data exchange for secondary, cross-institutional clinical research. Computer Methods and Programs in Biomedicine, 99(3):230–251.

Ferguson, N., Schneier, B., and Kohno, T. (2010). Cryptography Engineering: Design Principles and Practical Applications. Wiley.

Gliklich, R. E. and Dreyer, N. A., editors (2010). Registries for Evaluating Patient Outcomes: A User’s Guide. Outcome Sciences, Inc., AHRQ Publication No. 10-EHC049.

Kwon, T. (2011). Privacy preservation with X.509 standard certificates. Information Sciences, 181(13):2906–2921.

Wilson, S. (2005). A novel application of PKI smartcards to anonymise health identifiers. In AusCERT Asia Pacific Information Technology Security Conference Refereed R&D Stream, page 64.

HEALTHINF2014-InternationalConferenceonHealthInformatics

412