Automated Cryptanalysis of Bloom Filter Encryptions of Health Records

Martin Kroll, Simone Steinmetzer

2015

Abstract

Privacy-preserving record linkage with Bloom filters has become increasingly popular in medical applications, since Bloom filters allow for probabilistic linkage of sensitive personal data. However, since evidence indicates that Bloom filters do not provide sufficient security where strong guarantees are required, several improvements have been suggested in the literature. One of these proposes storing several identifiers in a single Bloom filter. In this paper we present an automated cryptanalysis of this Bloom filter variant. The three steps of this procedure constitute our main contributions: (1) a new method for the detection of Bloom filter encryptions of bigrams (so-called atoms), (2) the use of an optimization algorithm for the assignment of atoms to bigrams, and (3) the reconstruction of the original attribute values by linkage against bigram sets obtained from lists of frequent attribute values in the underlying population. In sum, we present the first convincing attack on Bloom filter encryptions of records built from more than one identifier.
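To make the attacked encoding concrete, the following is a minimal Python sketch of how several identifiers might be stored in one Bloom filter. It is an illustration under stated assumptions, not the scheme evaluated in the paper: the filter length `m`, the number of hash functions `k`, the padding character, the double-hashing construction from SHA-1 and SHA-256, and all function names are hypothetical choices. Each bigram sets a fixed group of bit positions (its atom), and the record's filter is the union of the atoms of all bigrams of all identifiers.

```python
import hashlib


def bigrams(value):
    """Return the set of bigrams of a padded, upper-cased string."""
    padded = f"_{value.upper()}_"  # padding character is an assumption
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def atom(bigram, m=1000, k=15):
    """Bit positions set by a single bigram (its atom).

    Assumed construction: double hashing, position_i = (h1 + i * h2) mod m.
    """
    data = bigram.encode("utf-8")
    h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
    h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return {(h1 + i * h2) % m for i in range(k)}


def encode_record(identifiers, m=1000, k=15):
    """Union the atoms of all bigrams of all identifiers into one filter.

    The filter is represented as the set of positions whose bit is 1.
    """
    positions = set()
    for value in identifiers:
        for bg in bigrams(value):
            positions |= atom(bg, m, k)
    return positions


def dice(a, b):
    """Dice similarity of two non-empty filters given as position sets."""
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    rec_a = encode_record(["Smith", "John"])
    rec_b = encode_record(["Smyth", "John"])
    print(len(rec_a), len(rec_b), round(dice(rec_a, rec_b), 3))
```

In this sketch, similar records share most of their bigrams and therefore most of their set bits, which is what permits probabilistic linkage. The same property means that every filter containing a given bigram contains that bigram's whole atom, which is the kind of regularity the first step of the attack looks for.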



Paper Citation


in Harvard Style

Kroll M. and Steinmetzer S. (2015). Automated Cryptanalysis of Bloom Filter Encryptions of Health Records. In Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2015) ISBN 978-989-758-068-0, pages 5-13. DOI: 10.5220/0005176000050013


in Bibtex Style

@conference{healthinf15,
author={Martin Kroll and Simone Steinmetzer},
title={Automated Cryptanalysis of Bloom Filter Encryptions of Health Records},
booktitle={Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2015)},
year={2015},
pages={5-13},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005176000050013},
isbn={978-989-758-068-0},
}

