Parallel Privacy-preserving Record Linkage using LSH-based Blocking

Martin Franke, Ziad Sehili, Erhard Rahm

2018

Abstract

Privacy-preserving record linkage (PPRL) aims at integrating person-related data without revealing sensitive information. For this purpose, PPRL schemes typically use encoded attribute values and a trusted party to conduct the linkage. To make PPRL scale to large datasets with millions of records, we propose parallel PPRL (P3RL) approaches that build on current distributed dataflow frameworks such as Apache Flink or Apache Spark. The proposed P3RL approaches also use blocking to further improve performance, in particular LSH (locality-sensitive hashing), which supports flexible configuration and can be applied to encoded records. An extensive evaluation on different datasets and cluster sizes shows that the proposed LSH-based P3RL approaches achieve both high linkage quality and high scalability. Furthermore, they clearly outperform approaches using phonetic blocking.
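The abstract does not specify the encoding or the LSH family. A common choice in PPRL is to encode attribute values as Bloom filters of character q-grams. A matching LSH family for bit vectors is Hamming LSH (bit sampling), where each blocking key is the values of a few randomly chosen bit positions. The sketch below assumes both choices to show how LSH blocking can run directly on encoded records: records that agree on at least one key land in a shared block, and only cross-source pairs within blocks are compared with Dice similarity. All names and values here are hypothetical illustrations, not the authors' P3RL implementation: `bloom_encode`, `dice`, `lsh_link`, `BF_LEN`, `NUM_HASH`, `num_keys`, `key_len`, `threshold`, and the shared secret. The sketch runs sequentially. In a dataflow framework, grouping by block key and comparing within blocks would be the step distributed across workers.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

BF_LEN = 1000   # Bloom filter length in bits (assumed value)
NUM_HASH = 20   # hash functions per q-gram (assumed value)


def qgrams(value, q=2):
    """Padded, lower-cased character q-grams of a string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, secret="shared-secret"):
    """Encode a string as the set of 1-bit positions of a keyed Bloom filter.

    The keyed SHA-256 hashing is an illustrative assumption, not the paper's scheme.
    """
    bits = set()
    for gram in qgrams(value):
        for k in range(NUM_HASH):
            digest = hashlib.sha256(f"{secret}|{k}|{gram}".encode()).digest()
            bits.add(int.from_bytes(digest[:8], "big") % BF_LEN)
    return frozenset(bits)


def dice(a, b):
    """Dice similarity of two Bloom filters given as sets of 1-bit positions."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def lsh_link(records, num_keys=10, key_len=8, threshold=0.8, seed=42):
    """Hamming-LSH blocking, then Dice comparison of cross-source candidate pairs.

    `records` maps an id prefixed with its source ("A..." or "B...") to an encoding.
    """
    rng = random.Random(seed)
    # Each LSH key samples key_len random bit positions (bit sampling).
    key_positions = [rng.sample(range(BF_LEN), key_len) for _ in range(num_keys)]

    blocks = defaultdict(list)
    for rid, bits in records.items():
        for i, positions in enumerate(key_positions):
            # The key index is part of the block key, so different keys never mix.
            blocks[(i, tuple(p in bits for p in positions))].append(rid)

    # Keep only cross-source pairs; a pair sharing several blocks is kept once.
    candidates = {
        (a, b)
        for ids in blocks.values()
        for a, b in combinations(sorted(ids), 2)
        if a[0] != b[0]
    }
    matches = {(a, b) for a, b in candidates
               if dice(records[a], records[b]) >= threshold}
    return candidates, matches


# Usage: two small sources A and B.
records = {
    "A1": bloom_encode("John Smith"),
    "A2": bloom_encode("Maria Garcia"),
    "B1": bloom_encode("John Smith"),
    "B2": bloom_encode("Jon Smith"),
}
candidates, matches = lsh_link(records)
```

Identical encodings agree on every key, so they always become candidates. Raising `num_keys` increases the chance that similar but non-identical records share a block (higher recall). Raising `key_len` makes blocks smaller and produces fewer candidate pairs.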



Paper Citation


in Harvard Style

Franke M., Sehili Z. and Rahm E. (2018). Parallel Privacy-preserving Record Linkage using LSH-based Blocking. In Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS, ISBN 978-989-758-296-7, pages 195-203. DOI: 10.5220/0006682701950203


in Bibtex Style

@conference{iotbds18,
author={Martin Franke and Ziad Sehili and Erhard Rahm},
title={Parallel Privacy-preserving Record Linkage using LSH-based Blocking},
booktitle={Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS},
year={2018},
pages={195-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006682701950203},
isbn={978-989-758-296-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 3rd International Conference on Internet of Things, Big Data and Security - Volume 1: IoTBDS
TI - Parallel Privacy-preserving Record Linkage using LSH-based Blocking
SN - 978-989-758-296-7
AU - Franke M.
AU - Sehili Z.
AU - Rahm E.
PY - 2018
SP - 195
EP - 203
DO - 10.5220/0006682701950203