Quantifying the Benefits of File Size Information for Forensic Hash Matching

Johan Garcia

Abstract

Hashing is a widely used technique in digital forensic practice. Using file size information in addition to hashes can make hash matching more efficient: if no file in the hash set has the same size as the file being examined, there is no need to calculate that file's hash value. Based on an examination of 36 million file sizes from five different data sets, this paper quantifies the obtainable improvements. For the evaluated data sets, the file reduction, i.e., the fraction of files that can be skipped without hash calculation, ranged from 0.009 to 0.525. The byte reduction, i.e., the fraction of bytes that can be skipped, ranged from 0.514 to 0.992. Simulation results showed that in many cases these reductions could decrease the time needed for hash scanning by 50% or more.
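The size prefilter and the two reduction metrics can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hash set is assumed to be available as (size, hex digest) pairs, SHA-1 is an assumed choice of hash function (the abstract does not name one), and all function names are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Reduction:
    file_reduction: float  # fraction of files that can be skipped without hashing
    byte_reduction: float  # fraction of bytes that can be skipped without hashing


def build_size_index(hash_set: Iterable[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Group known (size, hex digest) pairs by file size for O(1) size lookups."""
    index: Dict[int, Set[str]] = {}
    for size, digest in hash_set:
        index.setdefault(size, set()).add(digest.lower())
    return index


def compute_reduction(file_sizes: List[int], known_sizes: Set[int]) -> Reduction:
    """Fractions of files and bytes whose size matches no entry in the hash set."""
    total_files = len(file_sizes)
    total_bytes = sum(file_sizes)
    skipped = [s for s in file_sizes if s not in known_sizes]
    file_red = len(skipped) / total_files if total_files else 0.0
    byte_red = sum(skipped) / total_bytes if total_bytes else 0.0
    return Reduction(file_red, byte_red)


def sha1_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks (SHA-1 is an assumed choice)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def scan(paths: List[str], index: Dict[int, Set[str]]) -> List[str]:
    """Return paths whose hash is in the hash set, hashing only size candidates."""
    matches = []
    for p in paths:
        candidates = index.get(os.path.getsize(p))
        if not candidates:
            continue  # no known file has this size, so the hash calculation is skipped
        if sha1_of(p) in candidates:
            matches.append(p)
    return matches
```

For example, `compute_reduction([100, 200, 300, 400], {100, 300})` skips the 200- and 300-byte... more precisely the 200- and 400-byte files, giving a file reduction of 0.5 and a byte reduction of 0.6. Keying the index by size means a single dictionary lookup decides whether a file needs hashing at all.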



Paper Citation


in Harvard Style

Garcia J. (2012). Quantifying the Benefits of File Size Information for Forensic Hash Matching. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2012) ISBN 978-989-8565-24-2, pages 333-338. DOI: 10.5220/0004077303330338


in Bibtex Style

@conference{secrypt12,
author={Johan Garcia},
title={Quantifying the Benefits of File Size Information for Forensic Hash Matching},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2012)},
year={2012},
pages={333-338},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004077303330338},
isbn={978-989-8565-24-2},
}

