antee of the viability of BD tools in a society in-
creasingly concerned about data protection (Brkan,
2019; Casanovas et al., 2017; Huth et al., 2019). The
challenge is to reconcile Personal Data Regulations
(PDR) and BD mechanisms, mitigating friction be-
tween companies and governments.
In this paper, we investigate an important tool
for the compliance of BD mechanisms with PDR:
anonymisation¹ techniques. These are important because, once anonymised, these data are exempt from the requirements of PDR, including the principle of "data minimisation" (Regulation, 2018).
To guide this work, we present the Background section, exploring the limits of the expectations placed upon this tool. The question is whether anonymisation used exclusively can meet the demands of two apparently opposing systems, that is, the demands presented by both PDR and BD. We justify the choice of this problem by pointing out the difficulties of conceptualising the term, and we then present an overview of academic work in the area. We strive to counterbalance the advantages and risks of using anonymisation as a form of compliance. We raise the hypothesis that, although anonymisation is an important tool to increase data protection, it needs to be used with the assistance of other mechanisms developed by compliance-oriented governance.
The main goal is to present anonymisation risks in order to promote better use of this tool for privacy protection and BD demands. In the Related Work section, we survey the main bibliographical references on the subject. Our research method combines a literature review with the study of a hypothetical case. We then present the results obtained so far, compare them, and offer a brief discussion of the areas of prominence and the limitations of this work.
We conclude that it is not possible to achieve full BD compliance with PDR and privacy protection exclusively through anonymisation tools (Brasher, 2018; Ryan and Brinkley, 2017; Casanovas et al., 2017; Ventura and Coeli, 2018; Domingo-Ferrer, 2019). To address this problem, we aim to conduct future research on frameworks that can promote good practices which, associated with anonymisation mechanisms, can secure data protection in BD environments.
¹The term has two spelling variants: "anonymisation", used in the European context, and "anonymization", used in the US context. We adopt the European variant in this article because the work uses the GDPR (Regulation, 2018) as its reference.
2 BACKGROUND
Many organizations have considered anonymisation
through BD to be the miraculous solution that will
solve all data protection and privacy issues. This be-
lief, which has been codified in European and Brazil-
ian regulations, undermines an efficient review of or-
ganisations’ data protection processes and policies
(Brasher, 2018; Dalla Favera and da Silva, 2016;
Ryan and Brinkley, 2017; Casanovas et al., 2017;
Popovich et al., 2017). Given this scenario, in this work we investigate the following research question:
RQ.1 Is anonymisation sufficient to conciliate Big
Data compliance with Personal Data Regulations
and data privacy at large?
In order to answer RQ.1, the concept of anonymisation, its mechanisms, and its legal treatment must be highlighted. The text preceding the articles of the Regulation, which pertains to the European Economic Area, addresses anonymisation in point 26 (Regulation, 2018). It states that the "principles of data protection should apply to any information concerning an identified or identifiable natural person". Therefore, the principles do not apply to anonymous data, namely, "to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable."
The LGPD contains a similar exclusion in its Article 12 (da República, 2018). Regulators conclude
that, once anonymous, information cannot violate pri-
vacy, because data can no longer be linked to an iden-
tified or identifiable person. However, this premise
implies some challenges. First, the data can be con-
sidered personal even though it is not possible to
know the name of the person to whom the data refers.
This is because the name is just one way to identify
a person, which makes it possible to re-identify data
when a personal, nameless profile is provided.
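The re-identification risk described above can be sketched as a toy linkage attack: a dataset stripped of names is joined with a public register on shared quasi-identifiers. This is only an illustrative sketch; all field names and values below are invented.

```python
# Hypothetical sketch: re-identifying "nameless" records by joining them
# with an external register on quasi-identifiers (ZIP, birth date, gender).
# All data below is invented for illustration.

anonymised_health = [
    {"zip": "10115", "birth": "1984-03-02", "gender": "F", "diagnosis": "asthma"},
    {"zip": "10117", "birth": "1990-07-19", "gender": "M", "diagnosis": "diabetes"},
]

public_register = [
    {"name": "Alice Example", "zip": "10115", "birth": "1984-03-02", "gender": "F"},
    {"name": "Bob Example", "zip": "10999", "birth": "1975-01-01", "gender": "M"},
]

def link(records, register):
    """Join the two datasets on the quasi-identifier triple."""
    index = {(p["zip"], p["birth"], p["gender"]): p["name"] for p in register}
    matches = []
    for r in records:
        key = (r["zip"], r["birth"], r["gender"])
        if key in index:
            # The "anonymous" record is re-attached to a named person.
            matches.append({"name": index[key], **r})
    return matches

print(link(anonymised_health, public_register))
# One health record is re-identified even though it contains no name.
```

The attack needs no special access: any public dataset sharing the same quasi-identifiers suffices, which is precisely why the absence of a name does not make data anonymous.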
Second, in a BD context, precisely because it
deals with massive data, connecting information be-
comes extremely easy, even when it comes to meta-
data or data fragments. Thus, some simple anonymisation techniques, such as masking, can be effective in closed and smaller databases, but not in BD (Pomares-Quimbaya et al., 2019).
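A minimal sketch of masking, again with invented data, illustrates this limitation: the direct identifiers are removed or obscured, but the remaining attributes survive intact and can still single a record out once combined with external data.

```python
# Hypothetical sketch of masking. The record and masking rules are
# invented for illustration and do not come from any real system.
import re

def mask_record(record):
    """Drop the name and mask all but the last two digits of the phone."""
    masked = dict(record)
    masked.pop("name", None)
    # Replace every digit that is followed by at least two more digits.
    masked["phone"] = re.sub(r"\d(?=\d{2})", "*", masked["phone"])
    return masked

record = {"name": "Carol Example", "phone": "5551234567",
          "zip": "10115", "birth": "1984-03-02"}
print(mask_record(record))
# {'phone': '********67', 'zip': '10115', 'birth': '1984-03-02'}
```

In a small, closed database the masked record may be safe; in a BD environment, the untouched `zip` and `birth` fields remain available for the kind of linkage attack sketched earlier.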
Besides, techniques such as inference are more easily applicable in BD contexts. Inference is a technique whereby information, although not explicit, can be deduced from the available data. In terms of propositional logic, we can say that there is inference when three propositions A, B and C respect the following equations:
A ⇒ B (1)
ICEIS 2020 - 22nd International Conference on Enterprise Information Systems