that an information is not made available or revealed
to unauthorised persons, entities or processes” (ISO,
1989). In the context of pub/sub middleware, confi-
dentiality concerns encompass (1) part or all of the
constraints of the subscriptions, (2) part or all of the
information in the publication that is used for routing
against subscriptions, and (3) the payload of the pub-
lications (Onica, E. et al., 2016).
Encryption can achieve full confidentiality, but at the
cost of a significant overhead. The wishes of the users
and the desired degree of confidentiality must therefore
be considered. Depending on the nature of the data sent by the users of a
pub/sub system, parts of the publications might be
sent in clear text under some assumptions. Particu-
larly, the main concern of the users may be to pre-
vent the brokers from identifying them. Following
the work of (Domingo-Ferrer, J. et al., 2019), we dis-
tinguish three categories of attributes: (1) identify-
ing attributes that individually disclose the identity of
a subject, (2) quasi-identifying attributes that do not
identify subjects when considered separately, but their
combination may, and (3) confidential attributes that
convey sensitive features of an individual (income, re-
ligion, health condition, etc.) and may be sent in clear
text as long as they cannot be associated with an identity.
Any attribute that does not fit any of these categories
is considered non-confidential and can be outsourced
as is.
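
The classification above can be sketched in a few lines of Python. This is only an illustration, not the implementation evaluated in this paper: the `Category` enum, the `SCHEMA` table, the `category_of` helper, and the attribute names are all hypothetical. The one rule taken from the text is the fallback: an attribute that fits none of the three categories is treated as non-confidential.

```python
from enum import Enum


class Category(Enum):
    IDENTIFYING = "identifying"              # discloses an identity on its own
    QUASI_IDENTIFYING = "quasi-identifying"  # identifies only in combination
    CONFIDENTIAL = "confidential"            # sensitive feature of an individual
    NON_CONFIDENTIAL = "non-confidential"    # fits none of the three categories


# Hypothetical, hand-written schema; a real deployment would define its own.
SCHEMA = {
    "name": Category.IDENTIFYING,
    "age": Category.QUASI_IDENTIFYING,
    "gender": Category.QUASI_IDENTIFYING,
    "occupation": Category.QUASI_IDENTIFYING,
    "income": Category.CONFIDENTIAL,
    "health_condition": Category.CONFIDENTIAL,
}


def category_of(attribute, schema=SCHEMA):
    """Return the category of an attribute, defaulting to non-confidential."""
    return schema.get(attribute, Category.NON_CONFIDENTIAL)
```

With this schema, `category_of("age")` yields `Category.QUASI_IDENTIFYING`, while an attribute absent from the table falls back to `Category.NON_CONFIDENTIAL`.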
The proposition of this paper is to use a masking
method, namely data splitting, with a cryptographic
scheme to balance performance and security. This
avoids encrypted matching whenever the security
requirements permit. In order to
assess the feasibility of our proposal, we implemented
our solution before proceeding to performance tests.
The paper is structured as follows. In Section 2,
we describe and then illustrate the security concerns
through a motivating scenario. Afterwards, we
discuss related work in Section 3. In Section 4, we
detail our contribution. In Sections 5 and 6, we anal-
yse the security of our system and provide the results
of some performance tests. Finally, we conclude the
paper in Section 7.
2 MOTIVATION
In Section 2.1, we illustrate the security needs through
a security-oriented lifeguard scenario. Then, in Sec-
tion 2.2, we highlight the security concerns caused by
data splitting.
2.1 Scenario
In our motivating scenario, we consider bathers on
beaches and lifeguards whose mission is to protect
bathers from drowning. All bathers and lifeguards
are equipped with RFID wristbands that include ge-
olocation sensors. The lifeguards need to collect the
geolocation information of bathers and personal data
to fulfill their role. To comply with the GDPR,
the types of data collected must be determined in
advance and clearly disclosed to the bathers to obtain their
consent.
Moreover, different physical or logical overlays
of brokers are present around the beach. They col-
lect information from the bathers and relay it to the
lifeguards. To avoid systematic encryption, which
would cause performance issues, the data are
instead split into non-sensitive chunks sent to different
overlays. For instance, let us consider the case where
the bather has to publish the following attributes:
{name, location, age, gender, occupation}. The name
is an identifying attribute. As a consequence, it has
to be encrypted to prevent immediate identification.
The location, as long as it is not combined with any
identifying information, is not considered sensitive.
While gender, age, and occupation are neither
confidential nor identifying attributes when taken
individually, they might identify someone when grouped
together. They are quasi-identifiers and need specific
processing. Therefore, we split them into two groups:
{age, gender}, {occupation}. Note that many com-
binations are possible. For example, we could split
them as {age}, {gender}, and {occupation}, which is
the most basic way to split the data, but also the one
that needs the highest number of distinct overlays of
brokers. These two groups of quasi-identifiers are
therefore sent to two different overlays of brokers in order to
avoid re-identification attacks. An additional overlay
of brokers may be used to publish the encrypted attributes
and the non-confidential ones together, since
grouping them does not enable re-identification.
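
The splitting described in this scenario can be sketched as follows, purely as an illustration. The function `split_publication`, its parameters, the sample values, and `placeholder_encrypt` are hypothetical names introduced here. `placeholder_encrypt` is a stand-in that performs no real encryption, and the cryptographic scheme actually used is not shown. The sketch returns one chunk per overlay: one per group of quasi-identifiers, plus a last chunk carrying the encrypted identifying attributes together with the remaining attributes in clear text.

```python
from typing import Callable, Dict, List, Sequence, Set


def split_publication(
    publication: Dict[str, object],
    identifying: Set[str],
    quasi_groups: Sequence[Set[str]],
    encrypt: Callable[[object], object],
) -> List[Dict[str, object]]:
    """Split a publication into chunks, one chunk per overlay of brokers."""
    # All attributes that belong to some quasi-identifier group.
    grouped = set().union(*quasi_groups)

    # One clear-text chunk per quasi-identifier group.
    chunks = [
        {key: publication[key] for key in sorted(group) if key in publication}
        for group in quasi_groups
    ]

    # Last chunk: identifying attributes are encrypted, the rest stay in clear.
    rest = {}
    for key, value in publication.items():
        if key in grouped:
            continue
        rest[key] = encrypt(value) if key in identifying else value
    chunks.append(rest)
    return chunks


def placeholder_encrypt(value):
    # Stand-in only: NOT real encryption, it just hides the value.
    return "<encrypted>"


# Illustrative publication; all values are made up.
bather = {
    "name": "Alice",
    "location": (43.6, 1.44),
    "age": 34,
    "gender": "F",
    "occupation": "nurse",
}
chunks = split_publication(
    bather,
    identifying={"name"},
    quasi_groups=[{"age", "gender"}, {"occupation"}],
    encrypt=placeholder_encrypt,
)
```

Here `chunks` holds three dictionaries, one for each overlay: `{age, gender}`, `{occupation}`, and the encrypted name together with the location. Choosing the finer split `{age}`, `{gender}`, `{occupation}` would only change the `quasi_groups` argument and would yield four chunks, hence four overlays.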
This scenario stresses the need to process the data
properly: first, to avoid sending groups of attributes
that would allow re-identification, and second, to limit the
number of overlays. However, the second issue
is a matter of performance rather than security,
and we do not address it in this paper.
2.2 Security Threats and Requirements
We consider the semi-trusted model where the confi-
dentiality of subscriptions, publications, and payloads
(see Section 1) is at risk, as well as their privacy, if
any of these three items contains identifying or confi-
SECRYPT 2020 - 17th International Conference on Security and Cryptography