Pnueli, 1978; De and Le Métayer, 2016)). However,
none of these approaches is immediately suited for
dealing with GDPR compliance. In particular, the
characterization of data as personal has not yet been
incorporated into any of these efforts. This is partly
because the characterization itself is challenging,
as will be explained next.
4.1 The Problem of Determining Sensitivity of Personal Data
Imagine a system that utilizes a customer’s name
and taxation ID (where we assume the latter to be a
government-issued, unique identifier of each citizen,
such as the Danish CPR numbers (Pedersen et al.,
2006)). Both data items are clearly of a personal
nature, as both easily identify the person behind the
data. Intuitively, the taxation ID is even “more
personal” than the name, as it uniquely identifies
exactly one human being, whereas a name (as a string
of characters) can be shared by many individuals
and is therefore not unique.
Without a doubt, the taxation ID is also sensitive
personal information. Knowing the taxation ID
of an individual makes it possible to find out
additional information about that individual, such as
their financial situation (based on tax reports), their
probable age (the Danish CPR is based on the birthdate),
and their gender (also encoded in the Danish CPR). Of
these, the financial situation can clearly be misused
to harm an individual to the point of social ostracism;
it therefore clearly fulfils the definition of a sensitive
personal data item as defined in Article 9 GDPR.
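To make the point about derivable information concrete, the following minimal sketch shows how birthdate components and gender could be read from an identifier of this kind. It assumes the commonly documented CPR layout DDMMYY-SSSS, in which the final digit is odd for men and even for women; the function name parse_cpr and the sample number are hypothetical and made up for illustration, and the century of birth is deliberately left unresolved.

```python
def parse_cpr(cpr: str) -> dict:
    """Read the attributes encoded in a CPR-style number (illustrative only)."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected ten digits, optionally written as DDMMYY-SSSS")
    return {
        "day": int(digits[0:2]),
        "month": int(digits[2:4]),
        # Only the two-digit year is read; resolving the century is omitted.
        "year_two_digit": int(digits[4:6]),
        # Assumption: odd final digit means male, even means female.
        "gender": "male" if int(digits[9]) % 2 == 1 else "female",
    }


# Made-up sample value, not a real person's number.
info = parse_cpr("010190-1234")
print(info)
```

No lookup in any external register is needed for this step: the attributes are contained in the identifier itself.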
The sensitive nature of the name is less obvious,
and many lawyers and data protection activists
will argue that a name is not sensitive personal
information. First, it is typically not directly possible to
gather critical, harmful information from a person’s
name. Second, a name is publicly used to identify a
person in society and is therefore not considered a
“secret” in any context. Nevertheless, a name containing
“Muhammad” can be linked directly to a religious
background, with a high probability that the person (or
their parents) follows a certain religion. Therefore,
it may count as sensitive information as well.
This example makes another fact apparent:
determining sensitivity requires a judgment call
that evaluates multiple dimensions, such as the
usage of the data, inheritance from parent data types
(if applicable), context, and the level of anonymization and/or
generalization, among others. However, for the sake
of this example, we will treat a name as non-sensitive
personal data from now on.
4.2 The Problem of Determining the Status of Derived Data
If we now assume that a software system collects and
stores both these data items, name and taxation ID, for
every customer, we already have identified two possi-
ble sources of risks to a customer’s privacy (if the data
gets lost) and two objects of interest when implement-
ing a customer’s data access rights (cf. (European Par-
liament and Council, 2016, Article 15)). Thus, in-
formation concerning these data items and their pro-
cessing is of clear interest to anyone trying to validate
GDPR compliance for the system under consideration.
Now let us assume that the system uses both data
items in combination to generate a unique identifier
per customer, for use as a unique yet human-readable (“speaking”)
reference for subsequent processes.¹ Here, the system
utilizes both name and taxation ID as input to generate a
new data type: a customer ID. This new data item is
thus derived from the two inputs of name and taxation
ID, and thus may inherit the values and some of the
properties associated with these inputs. In particular, it
is obviously personal data itself. This is because
both inputs were personal data, the
resulting data item is used to identify human
individuals (here: customers), and the processing of
these inputs did not include any anonymization
technique. Both inputs are clearly visible in,
and extractable from, the newly generated customer ID.
Hence, it is easy to see that this property is inherited
by the “child” data item born from the
two previous ones.
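The inheritance of the "personal" property described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a method proposed in this paper: the DataItem type, the derive function, its anonymized flag, and all sample values are hypothetical. The rule encoded here is only that a derived item is personal if at least one input is personal and the processing applied no anonymization. Deciding whether a given processing step really anonymizes is itself the hard part and is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DataItem:
    name: str
    value: str
    personal: bool


def derive(name: str, value: str, inputs: Iterable[DataItem],
           anonymized: bool = False) -> DataItem:
    """Create a derived item that inherits the 'personal' property.

    Assumption: the derived item is personal if any input is personal
    and the processing did not anonymize the inputs.
    """
    personal = any(item.personal for item in inputs) and not anonymized
    return DataItem(name, value, personal)


# Made-up sample values for illustration.
customer_name = DataItem("name", "Jane Doe", personal=True)
tax_id = DataItem("taxation_id", "010190-1234", personal=True)

# Both inputs are concatenated verbatim, so both remain extractable.
customer_id = derive(
    "customer_id",
    f"{customer_name.value}:{tax_id.value}",
    [customer_name, tax_id],
)
print(customer_id.personal)
```

Because the inputs are embedded unchanged, the derived customer ID is flagged as personal, matching the argument in the text.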
A more complex question is whether the result-
ing customer ID also is a sensitive personal data item.
One of the inputs was sensitive, whereas the other was
not (see discussion above). Hence, the decision of
whether the resulting item is of sensitive personal na-
ture or not depends on the type of processing. Here,
a major challenge lies in the determination as to what
extent the sensitivity of the input data items is
transferred to the newly created data item. If the customer
ID contains the complete taxation ID verbatim, the
latter can easily be extracted from the former, and
the customer ID can therefore easily be linked to a
customer’s taxation data and financial situation,
rendering it a sensitive personal data item. If only part of
the taxation ID is used in the customer ID, or if some
other method of anonymization or pseudonymization
is applied (such as hashing of inputs or generaliza-
tion of data, see e.g. (Zhou et al., 2008; Pfitzmann
and Hansen, 2010)), the resulting customer ID may
¹ We omit the debate on whether this is a reasonable way
to create a customer identifier in the light of the GDPR here.
In short: it is not!