Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms

Matthias Blohm, Claudia Dukino, Maximilien Kintz, Monika Kochanowski, Falko Koetter and Thomas Renner

University of Stuttgart IAT, Institute of Human Factors and Technology Management, Germany

Fraunhofer IAO, Fraunhofer Institute for Industrial Engineering IAO, Germany

monika.kochanowski@iao.fraunhofer.de, falko.koetter@iao.fraunhofer.de, thomas.renner@iao.fraunhofer.de

Keywords:

Natural Language Processing, Artificial Intelligence, Cloud Platform, GDPR, Compliance, Anonymization.

Abstract:

Natural language processing in combination with advances in artificial intelligence is on the rise. However, compliance constraints while handling personal data in many types of documents hinder various application scenarios. We describe the challenges of working with personal and particularly sensitive data in practice with three different use cases. We present the anonymization bootstrap challenge in creating a prototype in a cloud environment. Finally, we outline an architecture for privacy compliant AI cloud applications and an anonymization tool. With these preliminary results, we describe future work in bridging privacy and AI.

1 INTRODUCTION

Natural language processing (NLP) is on the rise. Researchers across the scientific landscape are investigating a wide range of real-world applications. In these application scenarios, however, the General Data Protection Regulation (GDPR) (European Union, 2016) is perceived as a major challenge for NLP. This is because, in contrast to tabular data, natural language text cannot be anonymized by aggregation, as shown in Figure 1. Furthermore, pseudonymization methods can cause information loss.

These issues are all the more crucial when cloud-based solutions are considered. Cloud-based text analysis platforms are a promising way to make automated text analysis widely available, to share knowledge across stakeholders and to reduce tagging workload. However, working with GDPR-relevant data in the cloud is particularly difficult. Thus, there is a growing need for ways to take advantage of cloud solutions while remaining GDPR-compliant.

A solution for automatically handling GDPR-relevant data, especially in natural language documents, is often missing. Therefore, anonymization and pseudonymization are done manually. A promising idea is to use artificial intelligence (AI) / machine learning (ML) for anonymizing natural language documents. However, training such a model requires both non-anonymized and anonymized documents. Several options are possible to get around this problem.

This paper is structured as follows. Section 2 describes related work on the topics of natural language processing, anonymization and pseudonymization as well as platforms. Section 3 describes three existing application scenarios: court decisions, healthcare and insurance fraud. Based on these application scenarios, a central research question is derived in Section 4. To answer this question, Section 5 outlines a solution architecture for GDPR-compliant, semi-automated document anonymization as well as an in-progress prototype. Finally, Section 6 summarizes the work and gives an outlook on research in progress.

2 RELATED WORK

We describe related work in three areas: (1) NLP in the GDPR context, (2) anonymization and pseudonymization by artificial intelligence, and (3) platform solutions for NLP.

(1) The possible slowdown of Europe's innovation progress, especially in the field of Text and Data Mining (TDM), due to restrictive data protection and privacy laws is currently an important issue in public discussions (European Commission, 2014). Since


Blohm, M., Dukino, C., Kintz, M., Kochanowski, M., Koetter, F. and Renner, T.

Towards a Privacy Compliant Cloud Architecture for Natural Language Processing Platforms.

DOI: 10.5220/0007746204540461

In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 454-461

ISBN: 978-989-758-372-8

[Figure 1 (diagram): Structured data can be turned into anonymized structured data (anonymization) or aggregated structured data (aggregation), and from either into anonymized aggregated structured data; each step is marked with information loss. Unstructured natural language data (text, speech) can be turned into structured data by data extraction or into anonymized unstructured natural language data by anonymization, both with information loss; aggregation is marked as not an option.]

Figure 1: Natural language data cannot be anonymized by aggregation before working with it, as it is done e.g. with tabular

data in sensitive contexts.

the introduction of the GDPR in Europe, its compatibility with the purposes of artificial intelligence has been discussed. On the one hand, new data privacy restrictions indeed complicate data acquisition for machine learning tasks. On the other hand, data protection laws may also encourage fairer and more transparent processing of personal data (Kamarinou et al., 2016). Popular examples of software trained by means of machine learning to automatically identify and protect sensitive personal data are Amazon Macie and Google DLP (Marko, 2017).

(2) The importance of novel anonymization and pseudonymization techniques is underlined by challenges such as the NLP Shared Tasks announced by i2b2, where one of the tasks was often to de-identify personal data in clinical reports (i2b2 Informatics for Integrating Biology & the Bedside, 2019). In the 2014 challenge, the University of Nottingham achieved the highest F1-score, 93.6% for correctly recognized entities, by combining machine learning and rule-based techniques (Yang and Garibaldi, 2015). A common way to tackle de-identification is to rely on named entity recognition (NER) for detecting sensitive information even in larger unstructured text documents (Vincze and Farkas, 2014). A promising approach could be to combine the results of several different entity recognizers with coreference resolution in order to find and replace a maximum of entities such as proper names, places or dates, while maintaining full meaning in the document context (Dias, 2016).

(3) In the business domain, countless scenarios are conceivable in which companies could benefit from using AI, for example for supporting classification and decision tasks such as automatic customer claim handling (Coussement and den Poel, 2008; Yang et al., 2018). Several providers like Aylien (AYLIEN, 2019) already offer AI platforms for natural language processing as a self-service. Here, customers can build and train individual models for text processing without any programming skills. However, sending sensitive data to cloud servers is still a critical issue when using those platforms. Therefore, some providers like Lexalytics (Lexalytics, 2019) also offer on-premise versions of their software, which can be installed and run locally on internal hardware.

Altogether, machine learning has been shown to be applicable for improving anonymization and pseudonymization, and many state-of-the-art approaches exist. However, processing text documents in a cloud environment requires a practicable way of training these algorithms in a multi-party environment without resorting to on-premise solutions. To the best of our knowledge, a complete solution for this task has not yet been described. We formulate the research question in Section 4.

3 APPLICATION SCENARIOS

We describe three application scenarios: court decisions, healthcare and fraud detection. They have in common that (1) personal data is always included, (2) particularly sensitive data is often included, and (3) there is high potential for machine learning on textual documents.

3.1 Court Decisions

In Germany, court decisions generally have to be

made available to the public upon request. However,

to protect the privacy of the parties involved, the ju-

dicial decisions must be anonymized prior to publi-

cation. Especially in criminal and family law, court

decisions often contain sensitive data, e.g. the biogra-

phy of the accused, or private details of family life.

Important court decisions are published by the courts of their own accord. Other court decisions can be requested for an administrative fee. While case law is generally not as important in Germany as in other jurisdictions like the USA, requests for court decisions are increasing.


Currently, court decisions are anonymized man-

ually by judges or clerks, resulting in a consider-

able time investment for these highly skilled workers,

which could be used elsewhere.

3.2 Healthcare

The healthcare sector is one of the most highly regulated sectors with respect to data protection, as most documents contain sensitive patient data.

The healthcare sector is under pressure by ris-

ing healthcare costs, an aging populace, a shortage

of physicians, as well as comprehensive documenta-

tion requirements (Meinel and Koppenhagen, 2015).

While IT is widely used in areas like diagnostics and

robotics, adoption of cloud applications is slow (Lux

et al., 2017). One reason for this is the challenge to

comply with data protection laws. On the other hand,

many scenarios could proﬁt from sharing anonymized

documents, ranging from standard services like trans-

lation of medical instructions into a patient’s native

language to cooperative diagnosis and medical re-

search.

One challenge in the healthcare area is that not only directly identifying data (e.g. name, address), but also indirectly identifying data (e.g. combination of symptoms, rare diseases) has to be removed. Determining what data is indirectly identifying requires expert knowledge. Whether and how such a determination could be performed automatically is an open research question.

3.3 Fraud Detection

Undetected insurance fraud costs insurers billions of

dollars every year (Power and Power, 2015). To coun-

teract these losses, insurance companies try to detect

fraud before payments are made. Conventional fraud

detection relies on manual work as well as IT solu-

tions, which perform a rule-based analysis on a claim.

These rules are created and maintained by domain ex-

perts and focus on structured data that is known about

the claim. Unstructured documents and images are

typically investigated manually.

As a decision problem, fraud detection could possibly be improved by applying ML. Depending on the type of insurance, insurance companies claim fraud rates as high as 50 percent (smartphone insurance). However, it can be assumed that not all fraudulent claims are detected as such. For example, a claim may be abandoned by a claimant if additional questions are asked, making it unclear why no payment took place.

Insurance companies could improve fraud detec-

tion by sharing anonymized claim data in order to

build a communal AI (Power and Power, 2015).

Data protection laws necessitate anonymization or

pseudonymization of this data. This concerns not only

personal data, but also image ﬁles (e.g. license plates

on damaged cars).

4 RESEARCH QUESTION

Cloud computing, big data and artificial intelligence make many new application scenarios possible. The current public dialogue about artificial intelligence and the digital transformation has made many organizations aware of these new possibilities (IDC, 2018). On the other hand, organizations have been sensitized to privacy concerns by the public dialogue about the GDPR.

This creates a perceived conflict between technical possibilities and legal requirements. In our work with organizations in research and industry projects, we found data protection concerns to be the greatest perceived challenge to overcome. In an ongoing Fraunhofer survey of over 200 German organizations, data protection was named the greatest challenge when using AI. As new machine learning algorithms are tailored for large amounts of data, questions of data protection need to be solved before building even an exploratory prototype. For tabular data, aggregating data to ensure privacy may be an option. For textual data, however, this is not possible. It is possible to extract structured data from text and then apply machine learning, but text processing relies on more information than just the structured contents of the text documents. The information loss incurred in this way is therefore not acceptable for most NLP machine learning scenarios, and it is necessary to work with the original documents and to anonymize or pseudonymize them.

Additionally, finding the personal and particularly sensitive data is a challenge. Table 1 shows how well various state-of-the-art methods work for identifying text with known and unknown formats and value sets. If the format is known, as for license plates, finding and anonymizing the entity is easier than if the format is unknown. Dates of birth, for example, can appear in various forms in a document. If the format is known, as for an e-mail address, it is easy

[Footnote on the Fraunhofer survey: At the time of review, this survey is still open for participation. In the camera-ready paper, we will update this sentence with the final results. The survey can be found at: https://www.befragung.iao.fraunhofer.de/index.php/568823]

ICEIS 2019 - 21st International Conference on Enterprise Information Systems


Table 1: Methods for anonymizing data in text documents.

                   | Format Known                         | Format Unknown
Valueset Known     | method: reference list               | method: AI / ML
                   | difficulty: very easy                | difficulty: medium
                   | examples: invoiceId...               | examples: birth date, priority,...
Valueset Unknown   | method: rules, regular expressions   | method: AI / ML
                   | difficulty: easy                     | difficulty: hard
                   | examples: e-mail, IBAN,...           | examples: political opinion,...

[Figure 2 (flow diagram), box labels: Interesting application scenario with textual data for artificial intelligence (e.g. machine learning). Personal data is contained; one party provides the data, the other processes it. Particularly sensitive data is contained or might be contained in a small fraction of documents. Pseudonymization or anonymization is done manually for machine learning. Legal solution: contract for processing personal data (simple) in a 1:1 scenario; great prototype. Additional: information for all persons who are affected (usually leads to no prototype). Complicated legal solution: contract for processing personal data in a 1:n scenario (rarely done in practice). Theoretical option. Nearly not possible. Bootstrap challenge.]

Figure 2: Alternatives and possible outcomes for prototype development with GDPR relevant data.

to identify and anonymize data, even if the value set is unknown. This is extremely useful for personal data like IBANs and social security numbers. However, for particularly sensitive data, like ethnic background, sexual orientation, or political affiliation, the format as well as the values are unknown. This makes finding this information very difficult and, consequently, makes it hard to anonymize. Rules and reference lists do not apply here, which makes machine learning algorithms the most promising approach for solving this problem, although they need large amounts of data to handle this kind of difficult question.
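The two easier cells of Table 1 (value set known: reference list; format known: rules and regular expressions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of our prototype: the patterns are deliberately simplified (the IBAN pattern checks only the general shape, not the checksum), and `KNOWN_INVOICE_IDS`, `find_entities` and `redact` are hypothetical names.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# General IBAN shape only: 2 letters, 2 check digits, 11-30 alphanumerics.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

# Hypothetical reference list for the "value set known" case.
KNOWN_INVOICE_IDS = {"INV-2019-0042", "INV-2019-0043"}


def find_entities(text):
    """Return a sorted list of (start, end, category) spans found in text."""
    spans = []
    # Format known, value set unknown: rules / regular expressions.
    for category, pattern in (("EMAIL", EMAIL_RE), ("IBAN", IBAN_RE)):
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), category))
    # Value set known: look up every entry of the reference list.
    for invoice_id in KNOWN_INVOICE_IDS:
        for match in re.finditer(re.escape(invoice_id), text):
            spans.append((match.start(), match.end(), "INVOICE_ID"))
    return sorted(spans)


def redact(text, spans):
    """Replace every span with a category placeholder, working right to left."""
    for start, end, category in sorted(spans, reverse=True):
        text = text[:start] + "[" + category + "]" + text[end:]
    return text
```

Neither technique helps for the "format unknown" column of Table 1, which is where the learned models discussed in this section come in.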

However, solving this with machine learning requires a prototype. Figure 2 shows possible approaches for implementing prototypes while remaining GDPR-compliant.

A solution for data processing is possible, as long

as consent was received by the data provider, no sen-

sitive data is contained, and a contract for data pro-

cessing taking into account GDPR is made.

Special care needs to be taken when sensitive data (e.g. health data, information about racial or ethnic background) might be contained in the documents to be used. While the GDPR allows an exception for using sensitive data in research, processing it requires explicit consent of all affected persons. As projects with real companies rely on a large, existing volume of data, this is not feasible, as all affected customers would need to explicitly give consent.

As an additional challenge, most organizations,

especially small and medium enterprises, lack the

skills and data volume to realize AI projects in-house,

so they are dependent on service providers or coop-

eration to pool data. In this case, a contract between

multiple parties would be necessary, complicating a

possible legal solution and making GDPR compliance

questionable.

As an alternative, the GDPR allows the

anonymization of documents. Once a dataset

has been anonymized and individuals are no longer

identiﬁable, GDPR no longer applies. The solution in

existing projects was manual tagging and anonymiza-

tion of large volumes of documents. This work needs

to be performed in-house, as outsourcing it would

present a compliance violation as well. While this

is possible if companies receive a research grant,


[Figure 3 (architecture diagram, labelled "Lethe Plattform"), components: Clients (data suppliers) with original data, privacy metadata, automated pseudonymization (trained model), pseudonymization workstation (GUI), automated anonymization (trained model) and anonymization verification. Server (federated anonymization service) with pseudonymized data pool and optimization of pseudonymization / anonymization (machine learning). Server (federated application) with anonymized data pool, anonymization verification and application (e.g. machine learning, web service). Pseudonymized data and metadata are exchanged in the bootstrap cycle.]

Figure 3: Architecture outline for federated pseudonymization and anonymization for NLP and AI. Private data remains on

data supplier’s systems. A trained model for pseudonymization and anonymization is iteratively developed in a bootstrap

cycle. Better and better automation is used to pseudonymize larger and larger amounts of private data, resulting in a ﬁnal

model used for anonymization. Anonymization is veriﬁed with the private data tagged during pseudonymization.

generally it is not economically feasible.

The alternatives shown in Figure 2 represent a challenge for applied research into AI and NLP. In preliminary talks with companies interested in research participation, we found the manual effort for anonymization at the scale AI requires to be a deal-breaker.

Thus, we formulated a preliminary research

question: To conduct our AI research, we need to

develop tools and methods to aid in anonymization

of documents. Possible approaches to anonymization

that come to mind are of course AI and NLP. This cre-

ates a bootstrap challenge, as to optimize and cus-

tomize anonymization methods for a certain class of

documents, access to these documents is necessary.

How can this bootstrap challenge be solved within

a cloud environment with software tools to lessen

manual effort for anonymization while maintaining

compliance with GDPR?

5 ARCHITECTURE AND

PROTOTYPE

To solve the bootstrap challenge, we outlined a

software solution for federated data pseudonymiza-

tion and anonymization, of which the architecture is

shown in Figure 3.

This solution uses iterative pseudonymization of

documents in order to train a domain-speciﬁc model

for anonymization. The reason to use pseudonymiza-

tion ﬁrst is to provide pseudo-non-anonymized doc-

uments to a service provider without giving the

provider real documents.

Pseudonymization is performed both automatically and manually. In a first step, a small sample of the available documents is pseudonymized with a generic pseudonymization algorithm, which pseudonymizes common data items like e-mail addresses, names and phone numbers. This small sample is manually reviewed on a pseudonymization workstation, which allows correcting and amending the automated pseudonymization, e.g. by tagging personal data that was not automatically pseudonymized and by de-tagging false positives.

The result of this pseudonymization is a set of

pseudonymized documents as well as a set of privacy

metadata (i.e. private data items and their positions in

documents). The pseudonymized documents as well

as reduced privacy metadata (indicating position, but

not private data) are sent to a federated anonymiza-

tion service. If multiple organizations want to pool

their data, this step is performed independently by ev-

ery data provider.
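The distinction between full and reduced privacy metadata can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: only e-mail addresses are handled, positions are taken to refer to the pseudonymized text, and `PrivacyItem`, `pseudonymize` and `reduce_metadata` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Simplified e-mail pattern, used here as the only "generic" detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class PrivacyItem:
    start: int      # position in the pseudonymized document (assumption)
    end: int
    category: str
    original: str   # the private value; stays with the data supplier


def pseudonymize(text):
    """Replace e-mail addresses with pseudonyms and record privacy metadata."""
    parts, items, cursor, length = [], [], 0, 0
    for counter, match in enumerate(EMAIL_RE.finditer(text), start=1):
        pseudonym = f"person{counter}@example.invalid"
        prefix = text[cursor:match.start()]
        parts.extend([prefix, pseudonym])
        start = length + len(prefix)
        items.append(PrivacyItem(start, start + len(pseudonym), "EMAIL", match.group()))
        length = start + len(pseudonym)
        cursor = match.end()
    parts.append(text[cursor:])
    return "".join(parts), items


def reduce_metadata(items):
    """Drop the private values, keeping only positions and categories."""
    return [(item.start, item.end, item.category) for item in items]
```

In this sketch, only the pseudonymized text and the output of `reduce_metadata` would leave the data supplier's systems; the full `PrivacyItem` list is kept locally for later verification.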

Based on the received pseudonymized data, the anonymization service can improve the generic model, either using machine learning or by manually extending the model. The methods for extracting different types of data have been discussed in Section 4 and are summarized in Table 1.

This process is repeated with the improved model. This time, a different, possibly larger sample of documents is used. If the improvement was successful, the manual pseudonymization effort should be reduced. With the additional data, the trained model can be further improved. Iteration in this bootstrap cycle continues


Figure 4: Screenshot of the workbench for entity recognition, automatic anonymization and manual correction (working name Textominado). The software automatically detects several kinds of entities, for example names, addresses and license plates, and either replaces them with pseudonyms or removes them, optionally preserving their length. The focus is on the interaction between automatic detection and anonymization and manual corrections, which can be used for machine learning in the next step.

until a model of acceptable quality is obtained.
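The control flow of the bootstrap cycle can be sketched with a toy simulation. It rests on strong simplifying assumptions: the "trained model" is replaced by a plain set of known private tokens, manual review is simulated by a callable, false positives are ignored, and the names `auto_detect`, `bootstrap` and `max_missed` are hypothetical.

```python
def auto_detect(model_terms, tokens):
    """Stand-in for the trained model: flag tokens already known as private."""
    return {tok for tok in tokens if tok in model_terms}


def bootstrap(batches, manual_review, max_missed=0):
    """Run the bootstrap cycle over successive document batches.

    batches: list of batches, each a list of documents (token lists).
    manual_review: callable returning the set of truly private tokens of a
        document; it simulates the human at the pseudonymization workstation.
    Returns the learned term set and the manual corrections needed per round.
    """
    model_terms, corrections_per_round = set(), []
    for batch in batches:
        new_terms, missed_in_round = set(), 0
        for tokens in batch:
            detected = auto_detect(model_terms, tokens)
            missed = manual_review(tokens) - detected   # tagged manually
            missed_in_round += len(missed)
            new_terms |= missed
        # "Training" between rounds: remember the manual corrections.
        model_terms |= new_terms
        corrections_per_round.append(missed_in_round)
        if missed_in_round <= max_missed:
            break                                       # acceptable quality
    return model_terms, corrections_per_round
```

If the cycle works as intended, the number of manual corrections per round should decrease, which is the stopping signal used in this sketch.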

With this model, anonymization can be per-

formed. Compared to pseudonymization, private data

is not substituted by equivalent data, but excised.

Anonymization is performed on the whole set of

original data, and is veriﬁed automatically using the

privacy metadata obtained during pseudonymization.

Depending on the sensitivity, random spot checks

could be performed as well.
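Excision and automatic verification can be illustrated with a short sketch. It assumes that the model's output is a list of non-overlapping character spans and that the privacy metadata collected during pseudonymization is available as a list of private strings; `anonymize` and `verify` are hypothetical helper names.

```python
def anonymize(text, spans):
    """Excise the given (start, end) spans from text instead of substituting them.

    Spans are assumed not to overlap; they are removed right to left so that
    earlier offsets stay valid.
    """
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + text[end:]
    return text


def verify(anonymized_text, private_values):
    """Return the known private values that still occur in the anonymized text."""
    return [value for value in private_values if value in anonymized_text]
```

An empty result from `verify` only shows that no already known private item survived; it says nothing about private data that was never tagged, which is why additional spot checks may be warranted.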

The anonymized data may then be passed to an

application provider for use in the actual business case

(e.g. court decision database, medical research, fraud

detection). The application provider can use the same

anonymization veriﬁcation component to exercise due

diligence after receiving data.

To aid in future data acquisition in our research

projects, we plan to fully implement this software so-

lution. As an immediate solution, we partially imple-

mented this approach. The platform Textominado al-

lows manual and extensible automatic pseudonymiza-

tion/anonymization to create data for machine learn-

ing that is compliant with the GDPR.

Textominado's flexible architecture consists of a Java Spring backend with loosely coupled modules, which allows service endpoints to be added and modified quickly. This way, we are able to integrate any kind of library that provides analysis functions, such as entity recognizers that help locate sensitive personal data in unstructured text documents. New endpoints are registered automatically in the frontend with their corresponding URL and can be applied directly to an input text together with any other tool provided this way. Furthermore, the UI, which we built using Node.js and React, supports easy manual tagging of additional crucial entities that may not have been detected by the analysis tools. Finally, with the built-in pseudonymization/anonymization feature, which is shown in Figure 4, we are able to create different kinds of meaningful pseudonyms for each category of tagged entity. We are also experimenting with coreference resolution in order to limit the loss of information by keeping the replacement of words consistent throughout the whole text.
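Consistent replacement can be sketched in a few lines of Python. This is not Textominado's code (its backend is written in Java) and it does not perform coreference resolution: it only maps identical surface strings of the same category to the same pseudonym. `PseudonymMapper` and `replace_entities` are hypothetical names, and entities are assumed to arrive as non-overlapping (start, end, category) spans.

```python
from collections import defaultdict


class PseudonymMapper:
    """Assign one stable pseudonym per (category, original value) pair."""

    def __init__(self):
        self._mapping = {}
        self._counters = defaultdict(int)

    def pseudonym_for(self, category, value):
        key = (category, value)
        if key not in self._mapping:
            self._counters[category] += 1
            self._mapping[key] = f"{category}_{self._counters[category]}"
        return self._mapping[key]


def replace_entities(text, entities, mapper):
    """Replace tagged (start, end, category) spans with consistent pseudonyms."""
    ordered = sorted(entities)
    # Assign pseudonyms in reading order so numbering follows first mention.
    names = [mapper.pseudonym_for(cat, text[s:e]) for s, e, cat in ordered]
    # Substitute from right to left so earlier offsets remain valid.
    for (s, e, _), name in reversed(list(zip(ordered, names))):
        text = text[:s] + name + text[e:]
    return text
```

Reusing one mapper across the documents of a case would keep a person's pseudonym stable between documents; mentions such as "she" or "Ms. Schmidt" would still need coreference resolution to be linked.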

With the introduction of Textominado we took a

ﬁrst step towards offering a powerful platform that fa-

cilitates customized creation of anonymized data for


machine learning purposes. The pseudonymization can be improved and extended by implementing custom libraries, but this is a manual process. Our goal is to automate the improvement of entity recognition and anonymization by including self-learning components that improve the automated detection based on the users' manual corrections of the analysis tools' results, thus realizing the bootstrapping cycle outlined above.

6 CONCLUSION AND OUTLOOK

In this work we have described the challenges of per-

forming applied research in NLP and AI with real un-

structured data while remaining compliant with the

GDPR and other data protection laws.

We have shown the need to anonymize and share data in three application areas: court decisions, healthcare and insurance fraud detection. From practical experience in research projects, we have outlined challenges and possible solutions for obtaining data to develop research prototypes. Based on these experiences, we have defined a bootstrap challenge: AI and NLP can be used to automate data anonymization for research, but anonymized data is needed to create AI and NLP anonymization solutions in the first place. The resulting research question is how to solve this bootstrap problem while lessening manual effort for anonymization.

We have outlined a possible solution architec-

ture, which incrementally improves domain-speciﬁc

pseudonymization in a bootstrap cycle, thus solving

the bootstrap challenge, and shown Textominado, a

prototype for pseudonymization and anonymization

of unstructured documents.

Since the contents discussed in this paper are still

ongoing research, no evaluation has been done for our

prototype yet. But talking to different companies and

public organizations revealed that there is indeed a big

need for practicable ways of anonymizing unstruc-

tured textual data. In future research, we plan to use

Textominado to acquire anonymized data from real-

world organizations for use in AI and NLP research

projects. In this process, we plan to extend Textomi-

nado in order to implement the outlined solution ar-

chitecture and investigate its feasibility.

ACKNOWLEDGEMENTS

This work was partly supported by the project SmartAIwork, which is funded by the Federal Ministry of Education and Research (BMBF) under the funding number 02L17B00ff. We would like to thank our contacts at the court and in the healthcare and insurance industry as well as the students working in student projects for their efforts.

REFERENCES

AYLIEN (2019). Text analysis platform — custom nlp

models. https://aylien.com/text-analysis-platform/.

Coussement, K. and den Poel, D. V. (2008). Improving

customer complaint management by automatic email

classiﬁcation using linguistic style features as predic-

tors. Decision Support Systems, 44(4):870 – 882.

Dias, F. M. C. (2016). Multilingual automated text anonymization. Master's thesis, Instituto Superior Técnico, Lisboa.

European Commission (2014). Text and data mining - report from the expert group.

European Union (2016). Regulation (EU) 2016/679 of the

European Parliament and of the Council of 27 April

2016 on the protection of natural persons with re-

gard to the processing of personal data and on the

free movement of such data, and repealing Directive

95/46/EC (General Data Protection Regulation).

i2b2 Informatics for Integrating Biology & the Bedside

(2019). 2016 cegs n-grid shared-tasks and workshop

on challenges in natural language processing for clin-

ical data. https://www.i2b2.org/NLP/.

IDC (2018). Multi-Client-Studie Künstliche Intelligenz und Machine Learning in Deutschland

2018. https://idc.de/de/research/multi-client-

projekte/kunstliche-intelligenz-und-machine-

learning-in-deutschland-die-nachste-stufe-der-

datenrevolution/kunstliche-intelligenz-und-machine-

learning-in-deutschland-projektergebnisse.

Kamarinou, D., Millard, C., and Singh, J. (2016). Machine

learning with personal data. In Queen Mary School

of Law Legal Studies Research Paper No. 247/2016.

SSRN.

Lexalytics (2019). Salience 6, lexalytics state of the art nat-

ural language processing engine on your own hard-

ware. https://www.lexalytics.com/salience/server.

Lux, T., Breil, B., Dörries, M., Gensorowsky, D., Greiner, W., Pfeiffer, D., Rebitschek, F. G., Gigerenzer, G., and Wagner, G. G. (2017). Healthcare - between privacy and state-of-the-art medical technology. Wirtschaftsdienst, 97(10).

Marko, K. (2017). Using machine in-

telligence to protect sensitive data.

https://diginomica.com/2017/08/24/using-machine-

intelligence-protect-sensitive-data/.

Meinel, C. and Koppenhagen, N. (2015). Thesen-

papier zum Schwerpunktthema Smart Data im

Gesundheitswesen (in German). https://www.digitale-

technologien.de/DT/Redaktion/DE/Downloads/Publi-

kation/Smart Data Thesenpapier SmartData Gesund-

heitswesen.html.


Power, D. J. and Power, M. L. (2015). Sharing and analyz-

ing data to reduce insurance fraud. In MWAIS 2015

Proceedings.

Vincze, V. and Farkas, R. (2014). De-identiﬁcation in nat-

ural language processing. In 2014 37th International

Convention on Information and Communication Tech-

nology, Electronics and Microelectronics (MIPRO),

pages 1300–1303.

Yang, H. and Garibaldi, J. M. (2015). Automatic detec-

tion of protected health information from clinic nar-

ratives. Journal of Biomedical Informatics, 58:S30 –

S38. Proceedings of the 2014 i2b2/UTHealth Shared-

Tasks and Workshop on Challenges in Natural Lan-

guage Processing for Clinical Data.

Yang, Y., Xu, D.-L., Yang, J.-B., and Chen, Y.-W.

(2018). An evidential reasoning-based decision sup-

port system for handling customer complaints in mo-

bile telecommunications. Knowledge-Based Systems,

162:202 – 210. Special Issue on intelligent decision-

making and consensus under uncertainty in inconsis-

tent and dynamic environments.
