Cache Side-Channel Attacks Against Black-Box Image Processing
Software
Ssuhung Yeh
1a
and Yuji Sekiya
2b
1
Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
2
Security Informatics Education and Research Center, Graduate School of Information Science and Technology,
The University of Tokyo, Tokyo, Japan
Keywords: Side-Channel Analysis, Cache Side-Channel Attack, Machine Learning, Deep Learning, Convolutional
Neural Network.
Abstract: Cache side-channel attacks are a persisting threat to modern computers for their ability to steal secret
information in memory and hard-to-detect characteristics. While researchers have studied these attacks for a
long time, there has been relatively little focus on attacks against media software. One reason is the inherent
noisiness of cache side-channels, making it challenging to extract meaningful information from it. However,
recent advancements in machine learning have changed the landscape, making side-channel analysis more
accessible. In this paper, we proposed a new side-channel analysis framework that is capable of extracting
high-level information from complex applications. With this framework, we attacked image processing
programs, reconstructed images that the victim opened with cache side-channel attacks, and achieved
significantly improved results compared to the previous work.
1 INTRODUCTION
Side-channel attacks involve leveraging additional
information about a computer to infiltrate its internal
states. This supplementary data encompasses factors
such as electromagnetic emissions, power usage
patterns, and execution timing. Subsequently, this
information can be meticulously analyzed to extract
sensitive data, such as cryptographic keys. With the
surge in the popularity of public cloud services, the
threat posed by side-channel attacks has intensified.
Numerous studies (Irazoqui et al., 2014; Moghimi,
2023) have demonstrated the practical feasibility of
cross-VM side-channel attacks, heightening concerns.
While side-channel attacks have been under
scrutiny for an extended period, the practice of
conducting side-channel analysis (SCA) on complex
software has traditionally been regarded as a
formidable endeavor, if not outright impossible.
Nonetheless, the substantial advancements in
machine learning and deep neural networks in recent
years have paved the way for the extraction of high-
level information from collected traces. This
a
https://orcid.org/0009-0005-1442-4159
b
https://orcid.org/0009-0006-6287-9606
development has amplified the potency of side-
channel attacks, rendering them more formidable than
ever before.
In this context, a pivotal question arises: With the
aid of advanced machine learning techniques, what is
the upper limit of information attainable through side-
channel analysis? We think that investigating this
question at the present juncture is important, given the
aforementioned reasons.
In this paper, our primary focus is on investigating
cache side-channel attacks targeting black-box image
processing applications. Specifically, these
applications including a JPEG decoding program and
a WebP decoding program, both of which are
designed to convert JPEG or WebP images into
bitmap files. Concurrently, an attacker initiates a
cache side-channel attack on these programs and
captures memory access traces. The objective is to
gauge the attacker's ability to effectively reconstruct
the original input images from these traces using
neural networks. Our contributions can be
summarized as follows:
578
Yeh, S. and Sekiya, Y.
Cache Side-Channel Attacks Against Black-Box Image Processing Software.
DOI: 10.5220/0012264400003584
In Proceedings of the 19th International Conference on Web Information Systems and Technologies (WEBIST 2023), pages 578-584
ISBN: 978-989-758-672-9; ISSN: 2184-3252
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
Proposed a new side-channel analysis
framework that supports both Prime+Probe and
write-back channel attacks against image
processing software.
With the proposed framework, we use it to
attack libjpeg and libwebp programs and
reconstruct images successfully. Compared to
the previous work (Yuan et al., 2022), the
reconstructed images have much higher fidelity
even under stricter conditions.
To our knowledge, this work is the first one that
attacks libwebp with side-channel analysis,
and also the first one doing side-channel
analysis with a write-back channel attack.
In the rest of this paper, we will first survey
related works about cache side-channel attacks, side-
channel analysis with machine learning, and side-
channel attacks against media programs in Section 2.
Section 3 presents the dataset, the attack setup, and
the neural network model design. The reconstruction
result is presented and discussed in Section 4. Finally,
the conclusion and future work are provided in
Section 5.
Our code is available in our GitHub repository
(WEBIST-2023-Cache_Side_Channel, 2023).
2 BACKGROUND AND RELATED
WORKS
2.1 Cache Side-Channel Attacks
The cache is a hardware set between main memory and
CPUs to accelerate the memory access speed. Due to
its shared characteristic, the execution of one process
will influence the state of the cache, thus, influencing
the memory access time of other processes. In other
words, an attacker process can infer other processes'
internal states by observing whether each memory
access is cache hit or cache miss.
There are many variations of cache side-channel
attack techniques, that differ in the threat model and
amount of information the attacker can get. Here, we
introduce two of them that are related to this work, the
Prime+Probe attack and the write-back channel attack.
2.1.1 Prime+Probe Attack
The Prime+Probe attack (Tromer et al., 2010) exploit
that modern caches mostly apply set-associative
design, that the data is stored in which cache line is
decided by certain bits in the middle of the memory
address, called index bits.
The attack can be separated into two steps. In the
prime stage, the attacker fills the cache with his data
by accessing memory addresses mapping to all cache
lines, creating the eviction set. Then the attacker waits
for a while for the victim to execute. During the
victim’s execution, he will replace some data in the
cache with his data. The attacker enters the probe
stage the next time the attacker process is scheduled
to execute. This time, he accesses the whole eviction
set again and measures the time it takes to retrieve the
data. If it takes longer to access a memory address,
this infers the victim accessed this cache line before
so that the attacker’s data is evicted from the cache.
Overall, as long as a program has access to a
cycle-level high-precision clock, and can create an
eviction set, it can launch a Prime+Probe attack and
infer which cache lines or memory addresses are
accessed by the victim. Previous works have shown
that Prime+Probe attack can be used to exploit
encryption keys (Tromer et al., 2010; Liu et al., 2015)
and perform website fingerprinting (Oren et al., 2015).
2.1.2 Write-Back Channel Attack (WB
Attack)
Besides knowing cache hit or miss, memory
accessing time can also be exploited to infer whether
the cache line is dirty or not (Cui et al., 2022). If the
cache write policy is write-back, when a process
updates a value in memory, the update will only be
done in the cache under the hood. In this scenario, the
dirty bit of that cache line will be set, so that the
hardware knows to write the data back to the main
memory when eviction happens.
The write-back channel attack (WB attack)
exploits the fact that when the dirty bit is set for a
cache line, it takes a longer time to write the data back,
thus gaining further information about whether the
victim process writes data to the cache line. The WB
attack is an upgraded version of the Prime+Probe
attack. Under the same condition, the WB attack can
infer which cache lines are read or written by the
victim. The previous work has described the
possibility of creating side-channel attacks with the
write-back channel.
2.2 Side-Channel Analysis with
Machine Learning
Though side-channel attacks are easy to launch, the
collected data need to be analyzed to get sensitive
data. The whole process is usually called side-channel
analysis. The analyzing task is usually far from easy
for two reasons. First, the collected data is noisy and
Cache Side-Channel Attacks Against Black-Box Image Processing Software
579
huge in size. Second, the relationship between the
secret to extract and collected data remains unclear.
These two reasons make side-channel analysis a very
difficult and labor-intensive task. However, with the
progression in machine learning techniques, this task
has become feasible in practice.
First, researchers have shown that neural
networks can be used to denoise traces (Wu and Picek,
2020; Kwon et al., 2021) collected with side-channel
attacks. After that, more studies have proved that
deep neural networks can even be exploited to
perform end-to-end side-channel analysis attacks,
including website fingerprinting with cache side-
channel (Cook et al., 2022) and keystroke logging
with electromagnetic side-channel (Zhan et al., 2022).
2.3 Side-Channel Analysis of Media
Program
Side-channel analysis of media software hasn’t been
studied a lot relatively. Compared to breaking
encryption implementations, the data to steal in media
software is larger in size, and the diversity in software
implementations is greater. On the other hand,
website fingerprinting and keystroke logging with
side-channel analysis with side-channel analysis can
be degraded to classification problems, while attacks
on media software can’t.
As started by Xu, Cui, and Peinado (2015), they
successfully extracted the outlines of JPEG images
through side-channel analysis. Followed by Balmau,
et al. (2017), they reconstructed JPEG images with
colors. However, these two works launched attacks in
different scenarios. They assumed the OS was
compromised and treated the victim program as a
white-box, which are both strong assumptions.
Image reconstruction with non-privileged, black-
box side-channel analysis (Yuan, Wang et al., 2021;
Yuan, Pang et al., 2022) then succeeded. Yuan et al.
simplified the process of reconstructing images from
memory address traces to a regression problem. They
represented images with latent vectors that contained
high-level information, extracting latent vectors from
traces, and then reconstructed images from them. This
made side-channel analysis of media software
possible.
3 METHOLOGY
3.1 Threat Model
The threat model includes assumptions listed as
follows:
The attacker can execute native code on the
victim’s machine.
The attack doesn’t need any knowledge of the
victim program, only treat it as a black box.
The attacker has the same machine and victim
program, or he can input any data into the victim
program and observe the trace to produce
training data.
The cache being attacked uses the least recently
used (LRU) for the cache replacement policy
and write back for the cache writing policy.
For evaluation, we assume the target cache to be
the L1 data cache, but the attack framework doesn’t
limit to any level of cache.
The framework is evaluated on two image
processing programs, which are a JPEG decoding
program and a WebP decoding program. The JPEG
decoding program we used is the example JPEG
decoding program tjexample.c in the popular
libjpeg-turbo library (ver. 2.1.92) (libjpeg-
turbo, 2010). This program takes a JPEG image as the
input, decodes it, and outputs the bitmap file. As for
the WebP decoding program, the example program
dwebp.c in the libwebp library (ver. 1.3.1)
(libwebp, 2011) is used.
The complete attack consists of two phases. In the
training phase, the attacker produces traces
corresponding to the reference images and trains the
neural network. Then in the second phase, the
attacker can reconstruct unknown images from traces
collected from the victim.
3.2 Dataset
Experiments are conducted with two image datasets,
which are JPEG and WebP datasets. The images
come from Large-scale CelebFaces Attributes
(CelebA) Dataset (Liu et al., 2015), Align & Cropped
version, and are then resized to 128x128 pixels. The
images are in JPEG format originally, so for the
WebP dataset, manual transformation is required.
Ordered by image ID, the first 80,000 images are used
for training, and the last 19,921 images are used for
testing. Every image in the dataset belongs to one of
10177 identities. This identity is used to optimize the
training of the neural network.
3.3 Attack Setup
In this study, Intel Pin (Luk et al., 2005) is used to
collect memory access traces. The reason for
choosing Intel Pin is that it is a dynamic
instrumentation tool, in other words, there is no need
DMMLACS 2023 - 3rd International Special Session on Data Mining and Machine Learning Applications for Cyber Security
580
Figure 1: Model Overview.
to recompile the victim program, which corresponds
to the black-box assumption about the victim program.
After traces are collected, post-processing is
performed. The first step is to extract index bits
according to the cache configuration. In our case, the
attack target is the L1 data cache with 64 sets, and 64
bytes per cache line, thus, 7-th to 12-th bits of
memory addresses are extracted. Next, pad traces to
the maximum length of all traces. Finally, encode the
cache line index with the binary encoding method. In
other words, for each memory access, a vector with
64 elements is created. For the Prime+Probe attack,
only the element corresponding to the accessed cache
line index is 1, otherwise 0. As for the WB attack, -1
is used to represent a write, and 1 for a read.
3.4 Model Design
The overview of the neural network is shown in
Figure 1. The primary reconstruction job is done by a
reconstructor network, which is essentially a
variational autoencoder (VAE). It is composed of a
trace encoder and an image decoder. The idea is that
the trace encoder is expected to extract high-level
information (skin color, face direction, …) about the
image, and the image decoder can create the image
according to it. As the previously proposed
framework (Yuan et al., 2022) has done, a neural
network is chained after. It is used to answer if an
image is real or not and classify its identity, called a
classifier. Though this part is not mandatory, they can
provide extra information about how the
reconstruction images look and propagate loss back
to train the reconstructor better.
The training process of the whole neural network
is analogous to a generative adversarial network
(GAN) (Goodfellow et al., 2014). First, fix the
reconstructor, and train the classifier with real and
fake images reconstructed by the reconstructor. Then,
fix the classifier and train the reconstructor with the
assistance of the classifier afterward. Hopefully, the
two neural networks will grow together and provide a
better reconstruction and classification result.
For the detail of models inside each part, the trace
encoder is a 1-dimensional convolutional neural
network (1D CNN), and the image decoder is a 2-
dimensional convolutional neural network (2D CNN).
When training the classifier, the loss function is
defined as follows:

,̂

fake
,̂

real
,̂
(1)
is defined as the original image, and is the
reconstructed image. The first term is the cross-
entropy loss between the real identity of the image
and the predicted identity ̂
based on the original
image . The second and the third terms are the
binary cross entropy. It calculates the distance
between the trueness of a fake image
fake
and the
prediction of trueness ̂
based on the reconstructed
image, and the trueness of a real image
real
and the
prediction ̂
based on the reference image.
When training the reconstructor, the loss function
is defined as follows:
,


, 
pre
, 
(2)
Cache Side-Channel Attacks Against Black-Box Image Processing Software
581
Figure 2: Qualitative Reconstruction Result.
Table 1: Quantitative Results of Experiments.
(Yuan et al., 2022) Ours
Attack Target
libjpeg libwebp
Max Trace Length 290745 897567
Attack Prime+Probe Prime+Probe WB Prime+Probe
Avg. SSIM score 0.09337 0.23059 0.32506 0.20095
rec
and
pre
are reconstruction loss and
prediction loss, while is a parameter used to balance
these two terms.
rec
is defined as follows:
rec
α ,  1 α ,
(3)
The first part is the structural similarity loss
(SSIM) (Wang et al., 2004). The SSIM score is a
common method to quantify the perceptional
similarity between two images by splitting the images
into blocks and comparing the luminance, contrast,
and structure of each block. It is a value between -1
and 1, and a higher score means a higher similarity.
The SSIM loss is defined as the opposite of the SSIM
score, which means a higher score infers a lower
similarity. Since the SSIM loss doesn’t consider the
difference of color, a mean square error (MSE) term
is added, and parameter α is the weight between these
two terms. The definition of
pre
is as follows:
pre

,̂

fake
,̂
(4)
4 EVALUATIONS
The qualitative reconstruction results are shown in
Figure 2. For more reconstructed images, please refer
to the Appendix. For the quantitative results, the
average SSIM score (Wang et al., 2004) is used to
quantify the reconstruction result. The score is
calculated by averaging the SSIM scores between the
reference images in the testing split and reconstructed
images. The settings of different experiments and the
average SSIM scores are presented in Table 1.
For the rest of this section, we will discuss the
results in more detail and compare them between
experiments.
4.1 Comparison with the Previous
Work
We compare our experiment results with the result
reconstructed with the framework proposed in the
previous work (Yuan et al., 2022). For the previous
DMMLACS 2023 - 3rd International Special Session on Data Mining and Machine Learning Applications for Cyber Security
582
work, they used the 7-th to 32-th bits in the memory
address in the traces when the side-channel is set to
cache line index, which contains more information
than a cache side-channel attacker can learn
theoretically. Our framework only uses the 7-th to 12-
th bits as the input, which corresponds to the number
of cache sets. Despite the reduced information in the
traces, a superior reconstruction result is achieved.
We will describe the reconstruction result of the
previous work as that there is some correspondence
between reference images and reconstructed images,
however, they are not visually similar. Images
reconstructed with our framework are much similar to
original images, in aspects of skin colors, hairstyles,
facial expression, and so on.
The reason that our framework performs better
can be explained in two aspects. First, in the previous
work (Yuan et al., 2022), their model interprets the
accessed memory addresses or cache line indexes as
a value. However, the values only represent a location
in the memory of the cache, not a magnitude of
something. We encoded the value using the binary
encoding method, which is believed to be the correct
way to interpret those values. Second, they use 2D
CNN for the model of the trace encoder. This forces
the model to consider elements scattered in traces
together and look for patterns inside them. On the
other hand, 1D CNN is used in our model, thus the
model will only consider the relation between
adjacent elements in traces.
4.2 Comparison Between the
Prime+Probe and the WB Attack
The qualitative and quantitative results show that the
neural network can reconstruct images with higher
fidelity when launching a write-back channel attack.
This result corresponds with our expectation, as there
is additional information about read/write in the
traces. However, according to our observation, the
result is largely dependent on the encoding of read
and write behavior. We haven’t spent much time
comparing different encoding methods.
4.3 Attack on libwebp
Comparing the results of attacking libjpeg and
libwebp with Prime+Probe attack, the fidelity of
images is at about the same level. Though we do
expect a better result considering the length of traces
is about 3 times longer, the outcome is negative. Our
interpretation is that the example program in
libwebp is more complicated and supports
transformation between more formats, thus, lots of
parts in the traces may not be relevant to the input
image, and they may cause. Regardlessly, we
showcased the potential of our framework to attack
more complex software.
5 CONCLUSIONS
This paper underscores the heightened significance of
cache side-channel analysis, revealing its greater
severity than previously acknowledged. We introduce
a novel cache side-channel analysis framework that
enables the precise reconstruction of images with
remarkable fidelity through cache side-channel
attacks, all without requiring any prior knowledge of
the targeted victim program. Importantly, our
illustration of image processing program exploitation
serves as an exemplary case, echoing prior findings
(Yuan et al., 2022) that the same attack framework
can be adapted to target diverse software types, such
as audio processing and text processing programs, by
simply modifying the image decoder model. This
compelling evidence underscores the imperative to
recognize the non-negligible threat posed by cache
side-channel analysis.
As the upper limits of information leakage
achievable through cache side-channel attacks are
explored, the next objective is to empirically assess
their practical viability by implementing real-world
Prime+Probe and write-back channel attacks. This
aspect of our research remains a subject for future
exploration. Additionally, the broader challenge of
reconstructing general images remains open for
further investigation. While our framework does not
assume any specific image type, it is worth noting, as
indicated in other research on image-to-image VAE
(Van Den Oord et al., 2017), that even with advanced
model design, the latent vector's dimension required
for the reconstruction of general images exceeds 128
significantly.
REFERENCES
Cook, J., Drean, J., Behrens, J., and Yan, M. (2022). There's
always a bigger fish: a clarifying analysis of a machine-
learning-assisted side-channel attack. In Proceedings of
the 49th Annual International Symposium on Computer
Architecture (pp. 204-217).
Cui, Y., Yang, C., and Cheng, X. (2022). Abusing cache
line dirty states to leak information in commercial
processors. In 2022 IEEE International Symposium on
High-Performance Computer Architecture (HPCA)
(pp. 82-97). IEEE.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-
Cache Side-Channel Attacks Against Black-Box Image Processing Software
583
Farley, D., Ozair, S., ... and Bengio, Y. (2014).
Generative Adversarial Nets. Advances in neural
information processing systems, 27.
Hähnel, M., Cui, W., and Peinado, M. (2017). High-
Resolution Side Channels for Untrusted Operating
Systems. In 2017 USENIX Annual Technical Conference
(USENIX ATC 17) (pp. 299-312).
Irazoqui, G., Inci, M. S., Eisenbarth, T., and Sunar, B. (2014).
Wait a minute! A fast, Cross-VM attack on AES. In
Research in Attacks, Intrusions and Defenses: 17th
International Symposium, RAID 2014, Gothenburg,
Sweden, September 17-19, 2014. Proceedings 17 (pp.
299-319). Springer International Publishing.
WEBIST-2023-Cache_Side_Channel (2023). https://github.
com/ssuhung/WEBIST-2023-Cache_Side_Channel
Kwon, D., Kim, H., and Hong, S. (2021). Non-profiled deep
learning-based side-channel preprocessing with
autoencoders. IEEE Access, 9, 57692-57703.
libjpeg-turbo (2010). https://github.com/libjpeg-turbo/
libjpeg-turbo. [Online; accessed 10-April-2023].
libwebp (2011). https://github.com/webmproject/libwebp.
[Online; accessed 20-August-2023].
Liu, F., Yarom, Y., Ge, Q., Heiser, G., and Lee, R. B. (2015,
May). Last-level cache side-channel attacks are practical.
In 2015 IEEE symposium on security and privacy (pp.
605-622). IEEE.
Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep
learning face attributes in the wild. In Proceedings of the
IEEE international conference on computer vision (pp.
3730-3738).
Luk, C. K., Cohn, R., Muth, R., Patil, H., Klauser, A.,
Lowney, G., Wallace, S., Reddi, V. J., and Hazelwood,
K. (2005). Pin: building customized program analysis
tools with dynamic instrumentation. ACM SIGPLAN
Notices, 40(6), 190-200.
Moghimi, D. (2023). Downfall: Exploiting Speculative Data
Gathering. In 32nd USENIX Security Symposium
(USENIX Security 23) (pp. 7179-7193).
Oren, Y., Kemerlis, V. P., Sethumadhavan, S., and
Keromytis, A. D. (2015). The spy in the sandbox:
Practical cache attacks in JavaScript and their
implications. In Proceedings of the 22nd ACM SIGSAC
Conference on Computer and Communications Security
(pp. 1406-1418).
Tromer, E., Osvik, D. A., and Shamir, A. (2010). Efficient
cache attacks on AES, and countermeasures. Journal of
Cryptology, 23(1), 37–71.
Van Den Oord, A., and Vinyals, O. (2017). Neural discrete
representation learning. Advances in neural information
processing systems, 30.
Wu, L., and Picek, S. (2020). Remove some noise: On pre-
processing of side-channel measurements with
autoencoders. IACR Transactions on Cryptographic
Hardware and Embedded Systems, 389-415.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.
(2004). Image quality assessment: from error visibility to
structural similarity. IEEE transactions on image
processing, 13(4), 600-612.
Xu, Y., Cui, W., and Peinado, M. (2015). Controlled-channel
attacks: Deterministic side channels for untrusted
operating systems. In 2015 IEEE Symposium on Security
and Privacy
(pp. 640-656). IEEE.
Yuan, Y., Pang, Q., and Wang, S. (2022). Automated side
channel analysis of media software with manifold
learning. In 31st USENIX Security Symposium (USENIX
Security 22) (pp. 4419-4436).
Yuan, Y., Wang, S., and Zhang, J. (2021). Private image
reconstruction from system side channels using
generative models. In Ninth International Conference on
Learning Representations.
Zhan, Z., Zhang, Z., Liang, S., Yao, F., and Koutsoukos, X.
(2022). Graphics peeping unit: Exploiting EM side-
channel information of GPUs to eavesdrop on your
neighbors. In 2022 IEEE Symposium on Security and
Privacy (SP) (pp. 1440-1457). IEEE.
APPENDIX
More reconstruction results are presented here.
DMMLACS 2023 - 3rd International Special Session on Data Mining and Machine Learning Applications for Cyber Security
584