Figure 5: Throughput of blk-sha256, higher is faster.
hash. Zero optimization not only speeds up hash
computation by up to 4 orders of magnitude, it also
lowers the carbon footprint of the computation by 4
orders of magnitude at the same time.
Figure 6 shows blkhash efficiency on an AWS
c7i.metal-24xl instance using BLAKE3 for the outer
and inner hash functions. Hashing unallocated areas
(hole) shows a throughput of 0.0002 cycles per byte,
2400 times lower than BLAKE3. Hashing blocks
full of zeros (zero) shows a constant throughput of 0.1
cycles per byte for any number of threads, 4.8 times
lower than single-threaded BLAKE3. Hashing blocks
full of non-zero bytes (data) shows a constant
throughput of 0.54 cycles per byte for any number of
threads, 1.12 times higher than single-threaded BLAKE3.
Figure 6: Throughput of blk-blake3 in cycles per byte,
lower is better.
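The ratios quoted for Figure 6 can be cross-checked with a few lines of arithmetic. The following sketch is not part of the benchmark suite; it only derives the single-threaded BLAKE3 cost implied by each reported figure. It assumes that the hole ratio is also measured against single-threaded BLAKE3, which the text does not state explicitly, and the names `reported` and `implied_baseline` are illustrative.

```python
# Figures quoted in the text, in cycles per byte (lower is better).
# Each entry: (blkhash cost, reported ratio, direction relative to BLAKE3).
reported = {
    "hole": (0.0002, 2400.0, "lower"),
    "zero": (0.1, 4.8, "lower"),
    "data": (0.54, 1.12, "higher"),
}


def implied_baseline(cost, ratio, direction):
    """Return the BLAKE3 cost (cycles/byte) implied by one reported ratio."""
    if direction == "lower":
        # blkhash cost is `ratio` times lower than BLAKE3.
        return cost * ratio
    # blkhash cost is `ratio` times higher than BLAKE3.
    return cost / ratio


baselines = {
    name: implied_baseline(*values) for name, values in reported.items()
}

for name, baseline in baselines.items():
    print(f"{name}: implied BLAKE3 cost = {baseline:.3f} cycles/byte")
```

Under these assumptions the three figures agree on a baseline of roughly 0.48 cycles per byte, so the reported ratios are mutually consistent.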
6 CONCLUSIONS
As the world increasingly shifts into the cloud, the
demand for secure hash functions of high throughput
becomes more pronounced than ever before. The
traditional approach to verifying file integrity through
hash computation has long been plagued by
inefficiencies when dealing with large files. The perception
that the cost of computing the hash value is negligible
compared to the time it takes to copy a file no longer
holds in modern computing. Optimizing performance on
sparse disk images must account for hash computation
in addition to the copy operation.
The introduction of blkhash marks a new direction
in the realm of hash function design. We address
these challenges head-on by minimizing the
computational overhead associated with empty or unallocated
areas within the file, and by exploiting multi-core
processors through parallelization of the computation.
An important feature of blkhash is its modular design,
which allows it to utilize any existing hash function
as a building block. Whether it be well-established
standards like SHA256 or modern alternatives like
BLAKE3, blkhash seamlessly integrates these hash
functions into its framework. This modular approach
not only enhances the flexibility and versatility of
blkhash, but also leverages the proven security
properties of established hash algorithms. We provide a
reference implementation along with a suite of
benchmarks. Our results reveal that blkhash achieves
acceleration levels of up to four orders of magnitude,
positioning it as a game-changer for use cases requiring
rapid verification of large virtual disk images.
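The two ideas summarized above, skipping the work for zero blocks and plugging in any existing hash function, can be illustrated with a minimal sketch. This is not the blkhash construction or its reference implementation: the function name `block_hash_sketch`, the block size, the appended length field, and the single-threaded loop are all assumptions made for illustration, and parallelization is omitted. Python's `hashlib` does not provide BLAKE3, so `blake2b` stands in as the alternative algorithm.

```python
import hashlib


def block_hash_sketch(data: bytes, block_size: int = 64 * 1024,
                      algorithm: str = "sha256") -> str:
    """Hash `data` block by block, reusing a precomputed zero-block digest.

    Illustrative only: the layout of the outer hash input is an assumption,
    not the construction defined by blkhash.
    """
    zero_block = bytes(block_size)
    # Inner digest of an all-zero block, computed once and reused.
    zero_digest = hashlib.new(algorithm, zero_block).digest()

    outer = hashlib.new(algorithm)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block == zero_block:
            # Zero optimization: skip the inner hash computation.
            outer.update(zero_digest)
        else:
            outer.update(hashlib.new(algorithm, block).digest())

    # Bind the total length so images of different sizes hash differently.
    outer.update(len(data).to_bytes(8, "little"))
    return outer.hexdigest()


# A mostly empty 8 KiB "image" with one 1 KiB block of data.
image = bytes(4 * 1024) + b"x" * 1024 + bytes(3 * 1024)
digest_sha256 = block_hash_sketch(image, block_size=1024)
digest_blake2b = block_hash_sketch(image, block_size=1024,
                                   algorithm="blake2b")
```

Because the zero-block digest is identical to the digest the inner hash would have produced, the shortcut changes the cost of the computation but not its result. Swapping the `algorithm` argument changes the building block without touching the surrounding logic.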
REFERENCES
Amazon Web Services (2023a). Amazon EC2 C7g instances.
https://aws.amazon.com/ec2/instance-types/c7g/.
Amazon Web Services (2023b). Amazon EC2 C7i instances.
https://aws.amazon.com/ec2/instance-types/c7i/.
Bellard, F. and the QEMU Project developers (2003).
QEMU disk image utility.
https://www.qemu.org/docs/master/tools/qemu-img.html.
Dworkin, M. (2015). SHA-3 standard: Permutation-based
hash and extendable-output functions.
Hansen, T. and Eastlake 3rd, D. E. (2006). US Secure Hash
Algorithms (SHA and HMAC-SHA). RFC 4634.
Merkle, R. C. (1988). A Digital Signature Based on
a Conventional Encryption Function. In Advances
in Cryptology–CRYPTO ’87: Proceedings, pages
369–378. Springer-Verlag.
O’Connor, J., Aumasson, J.-P., Neves, S., and
Wilcox-O’Hearn, Z. (2019). BLAKE3 - one function,
fast everywhere.
https://github.com/BLAKE3-team/BLAKE3-specs.
Soffer, N. (2021). blkhash - block based hash optimized
for disk images.
https://gitlab.com/nirs/blkhash/-/tree/paper.
Soffer, N. (2022). blkhash-bench - blkhash benchmark
tool.
https://gitlab.com/nirs/blkhash/-/blob/paper/test/blkhash-bench.c.
The OpenSSL Project (2003). OpenSSL: The open source
toolkit for SSL/TLS. www.openssl.org.