clear which open data model can be used to reduce
the risk of open data privacy violations. An open data
model is needed that helps in making decisions on
opening data and that provides insight into whether the
data may violate users’ privacy.
The objective of this paper is to propose a model
to analyse privacy violation risks of publishing open
data. To do so, a new set of so-called open data
attributes is proposed. Open data attributes reflect
the trade-offs between privacy risks and benefits
associated with the expected use scenarios of the data
to be opened. Further, these attributes are evaluated by
a decision engine to yield a privacy risk indicator (PRI)
and a privacy risk mitigation measure (PRMM). In
particular, this can help to determine whether to open
data or keep it closed.
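To make the flow from attributes to PRI and PRMM concrete, a minimal sketch is given here. It is not the model proposed in this paper: the attribute names, the scoring formula, the thresholds and the recommendation labels are all illustrative assumptions, chosen only to show how a decision engine could map attribute scores to an indicator and a measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenDataAttributes:
    """Hypothetical attribute scores in [0, 1]; names are assumptions."""
    benefit: float      # expected benefit of opening the data
    criticality: float  # sensitivity of the data to privacy harm
    trust: float        # trust in the expected users of the data


def privacy_risk_indicator(attrs: OpenDataAttributes) -> float:
    """Toy PRI in [0, 1] (assumed formula, not the paper's).

    Risk grows with criticality and shrinks with trust; a high
    expected benefit discounts the remaining risk by up to half.
    """
    risk = attrs.criticality * (1.0 - attrs.trust)
    discounted = risk * (1.0 - 0.5 * attrs.benefit)
    return max(0.0, min(1.0, discounted))


def mitigation_measure(pri: float) -> str:
    """Toy PRMM: map the PRI to a coarse recommendation.

    The thresholds 0.2 and 0.6 are arbitrary assumptions.
    """
    if pri < 0.2:
        return "open"
    if pri < 0.6:
        return "open with restrictions"
    return "keep closed"


if __name__ == "__main__":
    dataset = OpenDataAttributes(benefit=0.0, criticality=0.8, trust=0.5)
    pri = privacy_risk_indicator(dataset)
    print(pri, mitigation_measure(pri))
```

Keeping the indicator and the measure as separate functions mirrors the two outputs named above, so either one could be replaced without touching the other.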
This paper is organized as follows. Section 2
discusses related work while section 3 presents
privacy violation risks associated with open data,
followed by section 4 which introduces the proposed
model. The model helps to identify the risks and
highlights possible alternatives to reduce these risks.
Section 5 exemplifies the model by providing some
use cases and preliminary results. Section 6 discusses
the key findings and concludes the paper.
2 RELATED WORK
Public bodies are considered the biggest creators of
data in society; the data they produce is known as
public data. Public data may range from data on
procurement opportunities, weather, traffic, tourism,
energy consumption and crime statistics to data about
policies and businesses (Janssen and van den Hoven 2015).
Data can be classified into different levels of
confidentiality, including confidential, restricted,
internal use and public (ISO27001 2013). Public data
that is unrelated to data about citizens is outside the
scope of this work.
Anonymized data about citizens can be shared to
understand societal problems, such as crime or
diseases. An example is the sharing of patient data to
initiate collaboration among health providers, which is
expected to benefit both patients and researchers. The
expected benefits of this data sharing are an improved
understanding of specific diseases and, hence, better
treatments. It can also help practitioners to become
more efficient. For example, a general practitioner can
diagnose and prescribe medicines more quickly.
Nevertheless, this sharing of
patients’ information should be done according to
data protection policies and privacy regulations.
A variety of Data Protection Directives have been
created and implemented. Based on the Data
Protection Directive of 1995 (European Parliament
and the Council of the European Union 1995), a
comprehensive reform of data protection rules in the
European Union was proposed by the European
Commission (2012). The Organization for
Economic Co-operation and Development has also
developed Privacy Principles (OECD, 2008),
including principles such as “There should be limits
to the collection of personal data” and “Personal data
should not be disclosed, made available or otherwise
used for purposes other than those specified in
accordance with Paragraph 9 except: a) with the
consent of the data subject; or b) by the authority of
law.” In addition, the ISO/IEC 29100 standard has
defined 11 privacy principles (ISO/IEC-29100 2011).
Nowadays, a relatively new approach to privacy
protection called privacy-by-design has received the
attention of many organizations, such as the European
Network and Information Security Agency (ENISA).
Privacy-by-design suggests integrating privacy
requirements into the design specifications of
systems, business practices, and physical
infrastructures (Hustinx 2010). In the ideal situation,
data is collected in such a way that privacy cannot be
violated.
The Data Protection Directives are often defined
at a high level of abstraction and provide limited
guidance for translating them into practice.
Despite the developed Data Protection Directives and
other data protection policies, organizations still risk
privacy violations when publishing open data. In the
following sections, we elaborate on the main risks of
privacy violation associated with open data.
A number of information security standards have been
established to achieve effective information security
governance, among which are ISO (2013), COBIT5
and NIST (2016). Most work on privacy risk
assessment uses surveys or questionnaires to assess
how companies deal with personal data according to
regulatory frameworks and moral or ethical values.
When it comes to open data, such frameworks for
assessing privacy risks cannot be used, since the law
requires that the data to be published contain no
identifying information. Consequently, conventional
ways of assessing privacy risks cannot be applied, and
new ways are needed that weigh the benefits of sharing
the data against the expected privacy risks of leaking
personally identifiable information.
Opening More Data - A New Privacy Risk Scoring Model for Open Data