Authors:
Andrei Kazlouski 1,2; Thomas Marchioro 1,2 and Evangelos Markatos 1,2
Affiliations:
1: Computer Science Department, University of Crete, Greece
2: Institute of Computer Science, Foundation for Research and Technology Hellas, Greece
Keyword(s):
Privacy, De-anonymization, Data Inference, Wearable Devices.
Abstract:
Recently, there has been a significant surge of lifelogging experiments, where the activity of a few participants is monitored for a number of days through fitness trackers. Data from such experiments can be aggregated into datasets and released to the research community. To protect the privacy of the participants, fitness datasets are typically anonymized by removing personal identifiers such as names, e-mail addresses, etc. However, although seemingly correct, such straightforward approaches are not sufficient. In this paper we demonstrate how an adversary can still de-anonymize individuals in lifelogging datasets. We show that users’ privacy can be compromised by two approaches: (i) through the inference of physical parameters such as gender, height, and weight; and/or (ii) via the daily routine of participants. Both methods rely solely on fitness data such as steps, burned calories, and covered distance to obtain insights on the users in the dataset. We train several inference models, and leverage them to de-anonymize users in public lifelogging datasets. Combining our two approaches, we achieve a 93.5% re-identification rate of participants. Furthermore, we reach a 100% success rate for people with highly distinct physical attributes (e.g., very tall, overweight, etc.).
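The first attack described in the abstract can be illustrated with a minimal, purely hypothetical sketch: estimate physical attributes from aggregate fitness features, then link each anonymized record to the closest known profile. The heuristic formulas, data, and names below are illustrative assumptions, not the paper's trained models.

```python
# Hypothetical sketch of attack (i): infer physical attributes from aggregate
# fitness data (steps, calories, distance), then re-identify participants by
# nearest-neighbor matching against known profiles. The "model" here is a pair
# of rough rules of thumb standing in for the paper's trained inference models.

def infer_attributes(avg_steps, avg_calories, avg_distance_km):
    """Toy estimate of (height_cm, weight_kg) from daily fitness aggregates."""
    stride_m = (avg_distance_km * 1000.0) / avg_steps   # average stride length
    height_cm = stride_m / 0.414 * 100.0                # stride ~ 0.414 * height heuristic
    weight_kg = avg_calories / (avg_distance_km * 1.036)  # walking-calorie rule of thumb
    return height_cm, weight_kg

def reidentify(anon_records, profiles):
    """Match each anonymized record to the closest candidate profile
    by squared Euclidean distance in (height, weight) space."""
    matches = {}
    for rec_id, feats in anon_records.items():
        est = infer_attributes(*feats)
        best = min(
            profiles,
            key=lambda name: sum((a - b) ** 2 for a, b in zip(est, profiles[name])),
        )
        matches[rec_id] = best
    return matches

# Toy data: anonymized daily averages (steps, kcal, km) and known (height, weight) profiles.
anon = {"user_A": (9000, 320, 6.3), "user_B": (11000, 520, 9.5)}
profiles = {"alice": (169, 55), "bob": (186, 83)}
print(reidentify(anon, profiles))  # → {'user_A': 'alice', 'user_B': 'bob'}
```

The sketch shows why stripping names alone fails: as long as released aggregates correlate with stable physical attributes, an adversary with side knowledge of candidate profiles can link records back to individuals.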