Parallel Privacy-preserving Record Linkage using LSH-based Blocking
Martin Franke, Ziad Sehili, Erhard Rahm
2018
Abstract
Privacy-preserving record linkage (PPRL) aims at integrating person-related data without revealing sensitive information. For this purpose, PPRL schemes typically use encoded attribute values and a trusted party that conducts the linkage. To scale PPRL to large datasets with millions of records, we propose parallel PPRL (P3RL) approaches that build on current distributed dataflow frameworks such as Apache Flink or Spark. The proposed P3RL approaches also include blocking for further performance improvements, in particular LSH (locality-sensitive hashing), which supports a flexible configuration and can be applied to encoded records. An extensive evaluation for different datasets and cluster sizes shows that the proposed LSH-based P3RL approaches achieve both high quality and high scalability. Furthermore, they clearly outperform approaches using phonetic blocking.
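As a rough illustration of how LSH-based blocking can operate on encoded records, the following single-process sketch assumes that each record is encoded as a fixed-length bit vector and that the LSH family samples bit positions (often called Hamming LSH). Neither assumption is stated in the abstract, and the function names, parameters (`num_keys`, `key_length`), and toy data are hypothetical. The sketch does not use Flink or Spark and is not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hash_functions(vector_length, num_keys, key_length, seed=0):
    """Draw num_keys random samples of bit positions; each sample defines one LSH key."""
    rng = random.Random(seed)
    return [rng.sample(range(vector_length), key_length) for _ in range(num_keys)]


def blocking_keys(bits, hash_functions):
    """Build one blocking key per hash function from the bit values at its positions."""
    return [
        (index, "".join(str(bits[pos]) for pos in positions))
        for index, positions in enumerate(hash_functions)
    ]


def candidate_pairs(records, hash_functions):
    """Group record ids by blocking key and emit pairs sharing at least one key."""
    blocks = defaultdict(list)
    for record_id, bits in records.items():
        for key in blocking_keys(bits, hash_functions):
            blocks[key].append(record_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy bit vectors standing in for encoded records (hypothetical data).
records = {
    "a1": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0],
    "b1": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0],  # identical to a1
    "b2": [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1],  # bitwise complement of a1
}
hash_functions = make_hash_functions(vector_length=16, num_keys=4, key_length=4)
pairs = candidate_pairs(records, hash_functions)
```

In this sketch, `num_keys` and `key_length` are the configuration knobs: more keys raise the chance that similar records share a block, while longer keys make blocks smaller and more selective. Only record pairs that share a block would then be passed on to the more expensive similarity comparison.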
Paper Citation
in Harvard Style
Franke M., Sehili Z. and Rahm E. (2018). Parallel Privacy-preserving Record Linkage using LSH-based Blocking. In Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-296-7, pages 195-203. DOI: 10.5220/0006682701950203
in Bibtex Style
@conference{iotbds18,
author={Martin Franke and Ziad Sehili and Erhard Rahm},
title={Parallel Privacy-preserving Record Linkage using LSH-based Blocking},
booktitle={Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2018},
pages={195-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006682701950203},
isbn={978-989-758-296-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Parallel Privacy-preserving Record Linkage using LSH-based Blocking
SN - 978-989-758-296-7
AU - Franke M.
AU - Sehili Z.
AU - Rahm E.
PY - 2018
SP - 195
EP - 203
DO - 10.5220/0006682701950203