LibSteal: Model Extraction Attack Towards Deep Learning Compilers
by Reversing DNN Binary Library

Jinquan Zhang(1), Pei Wang(2) and Dinghao Wu(1)
(1) College of Information Science and Technology, The Pennsylvania State University, University Park, U.S.A.
(2) Individual Researcher, U.S.A.
Keywords:
Model Extraction Attack, Neural Network Architecture, Deep Learning Compiler, Reverse Engineering.
Abstract:
The need for Deep Learning (DL) based services has rapidly increased in the past years. As part of this trend,
the privatization of Deep Neural Network (DNN) models has become increasingly popular: model authors give
customers or service providers direct access to their models and let them deploy the models on devices
or infrastructure outside the authors' control. Meanwhile, the emergence of DL compilers makes it
possible to compile a DNN model into a lightweight binary for faster inference, which is attractive to many
stakeholders. However, distilling the essence of a model into a binary that can be freely examined by untrusted
parties creates a chance to leak essential information. With only the DNN binary library, it is possible to extract
the neural network architecture using reverse engineering. In this paper, we present LibSteal, a framework that
leaks DNN architecture information by reversing the binary library generated by a DL compiler, recovering an
architecture similar or even equivalent to the original. The evaluation shows that LibSteal can efficiently steal the
architecture information of victim DNN models. After training the extracted models with the same hyper-
parameters, we achieve accuracy comparable to that of the original models.
1 INTRODUCTION
Machine learning models, especially deep neural net-
works (DNNs), have been widely deployed to tackle
challenging problems in computer vision (He et al.,
2016; Krizhevsky et al., 2012; Simonyan and Zis-
serman, 2014), speech recognition (Graves et al.,
2013; Hinton et al., 2012), natural language process-
ing (Collobert and Weston, 2008), and autonomous
driving (Kato et al., 2015). Compared to other ma-
chine learning technologies, the outstanding perfor-
mance of DNNs on recognition and prediction tasks
(LeCun et al., 2015; Schmidhuber, 2015) has seen its
commercial adoption with impacts across the field. It
also increases the demand for Deep Learning (DL)
based services and the need to deploy deep learning
models on edge devices like mobile phones (Wu et al.,
2019; Liang et al., 2018). For example, to help users
who are blind or have low vision, some DNN mod-
els need to be deployed on the phone so that users
can identify nearby objects more conveniently. Also,
major AI providers offer so-called service privatization,
selling their high-quality DNN models to other
companies and organizations for a license fee.
However, over the past decades, the explosion of
both DNN frameworks (Chollet et al., 2015; Abadi
et al., 2016; Markham and Jia, 2017; Chen et al.,
2015; Paszke et al., 2017) and hardware backends
(e.g., CPUs, GPUs, and FPGAs) increases the diffi-
culty of deploying DL based services to target plat-
forms. Such deployment requires significant manual
effort due to the vast gap between DNN frameworks
and the hardware backends. The deep learning com-
piler kills two birds with one stone and has drawn the
attention of many stakeholders. Several DL compilers
have been proposed by both industrial and academic
actors recently, such as TVM (Chen et al., 2018),
Tensor Comprehension (Vasilache et al., 2018), Glow
(Rotem et al., 2018), nGraph (Cyphers et al., 2018),
and XLA (Leary and Wang, 2017). The DL com-
pilers take the models from different DL frameworks
as input and compile them into a lightweight binary
with faster inference efficiency for the target hard-
ware platform. However, the DNN programs gener-
ated from DL compilers make it possible for an at-
tacker to leak the internal workings of DNN models by re-
versing or decompiling the stand-alone programs (Liu
et al., 2022; Wu et al., 2022; Chen et al., 2022). Un-
like the existing work targeting the whole DNN ex-
ecutables, in this paper we narrow down the threat
model to the case where we only have access to the DNN binary
library. We propose a framework named LibSteal to
leak the network architecture information of the target
DNN model using only the DNN binary library.
The architecture information includes the layer
types, attributes, dimensions, and connectivity of the
layers. In order to obtain this information, we design
our framework in three parts: a binary analyzer, a layer
identification module, and a search engine. The binary analyzer
slices the program into layer functions and extracts
their I/O dimensions. We also find the nested loops
of each control flow graph (CFG) at this step to iden-
tify the most significant features of each layer, because
the computation of DNN layers mostly consists of
matrix operations. According to our observation, each
layer has a unique computation pattern, and these
computation patterns remain the same across DNN
models. Therefore, in the layer identification part, we
first iterate over possible layer attributes to generate layer
functions with the same I/O dimensions and use these
functions to build the layer repository. Then we com-
pare the similarity between the layer functions of the
victim model and the generated layer functions in the
layer repository to obtain the layer types and attributes. As
for the search engine, we first build a directed graph
based on I/O dimensions. After that, we search for the
possible connections between the layers heuristically.
We implement the prototype of LibSteal based on
Uroboros (Wang et al., 2015) and adopt the idea pro-
posed by Asm2Vec (Ding et al., 2019) to accomplish
the similarity comparison between layer functions. To
demonstrate the practicality and the effectiveness of
our attack framework, we evaluate it against bina-
ries compiled from four widely-used DNN models,
MNIST (LeCun, 1998), VGG (Simonyan and Zisser-
man, 2014), ResNet (He et al., 2016), and MobileNet
(Howard et al., 2017) using TVM (Chen et al., 2018).
The experimental results show that our framework
can effectively extract the neural network architecture
information. The reconstructed models have similar
or even equivalent network architecture to the origi-
nal. We then re-trained the extracted models and they
all achieved accuracy comparable to that of the origi-
nal models.
In summary, we make the following contributions:
• We narrow down the threat model from the DNN
executable to the DNN binary library. With this lim-
ited input, we are able to leak essential informa-
tion about the DNN model architecture.
• We design and implement the framework LibSteal
to achieve our goal, which consists of three parts
and combines various techniques into a coherent
pipeline.
Figure 1: The overview of deep learning compilers' commonly adopted design architecture. (Compiler frontend: DL frameworks are lowered to a computational graph representation (high-level IR), on which computation graph optimizations are applied. Compiler backend: low-level IR, hardware-specific optimizations, and code generation, optionally via third-party toolchains such as LLVM, CUDA, or OpenCL, targeting CPUs, GPUs, and, for some compilers, ASICs.)
• We evaluate our framework on four widely-
used model binaries using TVM as the DL
compiler. The results indicate that our frame-
work can handle the MNIST, VGG, ResNet, and Mo-
bileNet DNN models. With the stolen informa-
tion, the reconstructed models have network
architectures similar or even equivalent to the origi-
nal and can achieve inference accuracy compara-
ble to that of the original.
2 BACKGROUND
2.1 Deep Learning Compiler
The main purpose of a deep learning compiler (Leary
and Wang, 2017; Chen et al., 2018; Cyphers et al.,
2018; Vasilache et al., 2018) is to reduce the manual
effort required when deploying DL models to various
hardware backends. As shown in Figure 1, a DL
compiler takes models from different DNN frameworks,
like TensorFlow (Abadi et al., 2016), PyTorch (Paszke
et al., 2017), MXNet (Chen et al., 2015), Caffe2
(Markham and Jia, 2017), and Keras (Ketkar, 2017),
as input and generates optimized code for the dif-
ferent target hardware platforms. Most DL compil-
ers support CPUs and GPUs, and some also support
application-specific integrated circuits (ASICs) like
the TPU (Jouppi et al., 2017).
The key component of a DL compiler is the
multi-level intermediate representation (IR) (Li et al.,
2020), an abstraction of the program that
is mainly used to apply optimizations at differ-
ent levels. The compilation process can be divided
into a frontend and a backend. At the compiler fron-
tend, the DL models are translated into the high-
level IR, also known as the computational graph repre-
sentation. The computational graph is independent
of the target hardware platform, and several optimi-
zations of different granularities are applied to it at
this level. At the compiler backend, the
high-level IR is further lowered to low-level
IR for hardware-specific optimizations and code gen-
Figure 2: Threat Model. The DL compiler compiles the DNN model into a deployable package (.json, .so, .params) and deploys it on the edge device. The attacker can only access the .so part (i.e., the shared library).
eration on the target hardware platform. The DL com-
piler also supports the existing infrastructures such as
LLVM (Lattner and Adve, 2004) to utilize the third-
party toolchain for further code generation and opti-
mizations. The generated package consists of three
parts: a shared library file with all the layer functions,
a JSON file with the connectivity of the network ar-
chitecture, and a data file with the parameter weights.
Our target in this paper is the shared library file.
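To make the deployable package concrete, the following minimal sketch shows how a Keras model might be compiled with TVM's Relay API into the three artifacts described above. This is an illustrative example, not part of our framework: the triple-return form of relay.build matches older TVM releases (newer versions return a graph-executor factory module instead), and keras_model is assumed to be a pre-trained Keras model.

import tvm
from tvm import relay

# Assumption: `keras_model` is a pre-trained Keras model; Relay's Keras
# frontend expects an explicit input shape in NCHW layout.
shape_dict = {"input_1": (1, 3, 32, 32)}
mod, params = relay.frontend.from_keras(keras_model, shape_dict)

# Compile for a CPU target through the LLVM backend.
with tvm.transform.PassContext(opt_level=3):
    graph_json, lib, params = relay.build(mod, target="llvm", params=params)

# The three parts of the deployable package:
lib.export_library("deploy.so")             # layer functions (our attack target)
with open("deploy.json", "w") as f:         # connection topology
    f.write(graph_json)
with open("deploy.params", "wb") as f:      # parameter weights
    f.write(relay.save_param_dict(params))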
2.2 Deep Neural Networks
A Deep Neural Network (DNN) is a sub-area of Deep
Learning: an Artificial Neural Network (ANN) with
multiple layers between the input and output. A DNN
can be represented as a function $y = f(x)$, where the
input $x \in \mathbb{R}^n$ and the output $y \in \mathbb{R}^m$. In order to get
the function, massive computation and a large amount
of data are required for the training. A DNN model
has the following essential characteristics: (1) Neural
network architecture consists of layer types, layer
dimensions, connection topology between layers, and
layer attributes. (2) Hyper-parameters are the con-
figuration data for the training process. They control
the efficiency of the training process and affect the
final performance of the model. (3) Parameters are
updated during the training process. The performance
of the model significantly relies on them. In our paper,
we aim to steal the network architecture information
of the DNN model through our framework.
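As a concrete illustration of these characteristics, consider the following toy Keras model (hypothetical, not one of our victim models): the layer types and attributes are visible in the constructor calls, the layer dimensions follow from the input shape, and the connection topology is the sequential order.

from tensorflow import keras
from tensorflow.keras import layers

# Layer types (Conv2D, MaxPooling2D, Dense), attributes (filters,
# kernel_size, pool_size), and dimensions all follow from the
# constructor arguments and the input shape.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(filters=64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.summary()  # prints per-layer output dimensions and parameter counts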
2.3 Binary Reverse Engineering
Binary reverse engineering is the process of analyz-
ing a binary program to identify the program’s com-
ponents and functionalities. It is imperative to under-
stand the inner logic of the program for further op-
erations, especially when the source code is unavail-
able. There are plenty of mature reverse engineer-
ing frameworks from both academia (Brumley et al.,
2011; Wang and Shoshitaishvili, 2017; Wang et al.,
2015; Song et al., 2008) and industry (Eagle, 2011).
These frameworks are designed to recover as much
information as possible on the target program for dif-
ferent purposes, which can either be maintaining the
legacy code, understanding the behavior of the mal-
ware, or exploring the vulnerabilities in the program.
In this paper, we implement our framework based on
Uroboros (Wang et al., 2015) to analyze the DNN
binaries.
3 THREAT MODEL
This paper uses TVM (Chen et al., 2018) as the tar-
get DL Compiler. Having a concrete target is mostly
to ease the technical discussions. The essence of our
work applies to most major DL compilers on the mar-
ket. As shown in Figure 2, the deployable package
generated by TVM consists of three parts: a JSON-
formatted specification file, a shared library (.so), and
parameter weights (.params). Among them, both the
JSON file and the shared library are important for in-
ferring information about the neural network archi-
tecture. The JSON file contains the connection topol-
ogy between the layers of the DNN models, and the
shared library includes all the unique layer functions.
In this paper, we assume that the attacker is motivated
to leak the DNN model architecture information for
malicious usage and can only access the shared li-
brary. This scenario is reasonable and has practi-
cal impact because, as far as we know, the JSON file
is an implementation detail specific to TVM that other
DL compilers do not share; the TVM
community even plans to remove this text file in the future.
We also assume that the victim DNN binary libraries
are not obfuscated because, to the best of our knowl-
edge, DL compilers themselves do not apply any soft-
ware defensive mechanism to the generated binaries.
4 ATTACK DESIGN
4.1 Overview
As shown in Figure 3, our attack framework consists
of three parts: the binary analyzer, the layer identifica-
tion module, and the search engine. First, we feed the
DNN library to the binary analyzer. The analyzer dis-
assembles the binary and slices it into different layer
functions. We leverage the information from the data
section to extract the layer dimensions for each layer
function. Moreover, we apply nested loop analysis
Figure 3: LibSteal Workflow Overview. (1) The binary analyzer takes the DNN binary library as input and disassembles it to get the sliced layer functions; with further analysis, we also extract the layer dimensions and find the nested loops at this step. (2) At the layer identification step, we leverage the immutable computation pattern to compare the similarity of unknown victim layer functions with customized candidate layer functions and identify their layer types and attributes. (3) Based on layer types, attributes, and dimensions, the search engine finds the connectivity between layers and rebuilds the network architecture of the model.
to identify the nested loops in each layer function.
The details are discussed in Section 4.2.
The next step is to identify the layer type and at-
tributes of the layer functions. We use the layer di-
mensions to generate all possible layer functions with
the same I/O dimension and store them in the can-
didate layer repository. Since the computation pat-
tern of each kind of layer remains the same across
different models, the layers with the same layer di-
mensions, type, and attributes will lead to very simi-
lar layer functions. Therefore, in order to obtain layer
types and attributes, we use the collected layer candidates
to train a representation learning model so that we
can check the similarity between victim layer func-
tions and candidate layer functions. We present the
details in Section 4.3.
In the last step, we use the search engine to recover
the model architecture topology connection using the
information collected in the above two steps. We first
build a directed graph based on the layer dimensions.
Then, in order to make the search process more effi-
cient, we apply some heuristic pruning based on the
layer types. Along the built graph, we search for a
result containing all layers and obtain the connection
topology between the layers.
4.2 Binary Analysis
The binary analyzer uses Uroboros (Wang et al.,
2015) as the binary reverse engineering framework
to disassemble and analyze the victim DNN library,
which helps us slice the program into separate layer
functions and recover the precise control flow graph
of each function.
4.2.1 Layer Dimensions
This section shows how we infer the layer dimen-
sions. Figure 4 shows a part of a layer function com-
piled from VGG16, which is abstracted from the ac-
tual result, where we only replace the recovered sym-
section .text
    ...
LAYER_FUNC:
    ...
1   cmp dword [rax], 0x1
2   jne LABEL
3   cmp dword [rax + 0x8], 0x40
4   jne (label)
5   cmp dword [rax + 0x10], 0x20
6   jne (label)
7   cmp dword [rax + 0x18], 0x20
8   jne (label)
    ...
LABEL:
9   mov rax, qword [LABEL_GOT]
10  lea rdi, [STR]
11  call qword [rax]
12  pop rcx
13  ret
    ...
section .rodata
14  STR: "Assert fail" ...
    ...
section .got
15  LABEL_GOT: qword __TVMAPISetLastError

Figure 4: Example of a layer function from the VGG16 DNN model.
bol with a more readable one. We find that every
layer function checks the constraints on the data
dimensions before the computation so that the mem-
ory will not be corrupted at runtime. Therefore,
we locate the basic block invoking the error report
function. In line 9, the code loads the address of
TVMAPISetLastError, which sets the last error mes-
sage before returning, into the register rax. The function is
indirectly called in line 11 with the error message set
to STR in line 10. According to the message carried
by STR, this is the exception caused by a mismatch
between the expected and actual data dimensions. When we
trace back to line 2 in Figure 4, we find the compari-
son between a memory-loaded number and a constant
in line 1 and figure out that one of the data
dimension numbers equals 1. Following the same
routine, we get a set of numbers. The numbers can
Figure 5: The comparison between layer function CFGs with the same I/O dimensions: (a) victim layer, (b) MaxPooling2D layer, (c) AveragePooling2D layer, (d) Conv2D layer. All four layer functions have the same input dimension (1,64,32,32) and the same output dimension (1,64,16,16).
be separated into groups based on their memory ad-
dresses. For example, the four numbers in Figure 4 are in
one group. According to our observation, each group
represents one data dimension of the layer function.
Therefore, the data dimension extracted from Figure
4 is (1, 64, 32, 32). Typically, one layer function only
has one input dimension and one output dimension.
However, some layers require multiple inputs, like
Add layers, whose input can be a list of tensors with
the same shape. Fortunately, all constraints of dimen-
sion data are checked in order.
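A simplified sketch of this extraction step is given below. It scans objdump-style disassembly for the cmp instructions that guard the dimension checks and groups the compared constants by their base register and offset. This is only an illustrative approximation of what our binary analyzer does on top of Uroboros; the regular expression and the grouping rule are assumptions that would need adjusting for real-world output.

import re
from collections import defaultdict

# Matches guards such as: cmp DWORD PTR [rax+0x8],0x40
CMP_RE = re.compile(
    r"cmp\s+DWORD PTR \[(?P<base>\w+)(?:\+(?P<off>0x[0-9a-f]+))?\],"
    r"(?P<val>0x[0-9a-f]+)")

def extract_dimensions(disasm_lines):
    """Group dimension constants by base register; consecutive 8-byte
    offsets from the same base form one dimension tuple."""
    groups = defaultdict(dict)
    for line in disasm_lines:
        m = CMP_RE.search(line)
        if m:
            off = int(m.group("off") or "0", 16)
            groups[m.group("base")][off] = int(m.group("val"), 16)
    return [tuple(v for _, v in sorted(g.items())) for g in groups.values()]

# Toy disassembly mirroring Figure 4: recovers (1, 64, 32, 32).
example = [
    "cmp    DWORD PTR [rax],0x1",
    "cmp    DWORD PTR [rax+0x8],0x40",
    "cmp    DWORD PTR [rax+0x10],0x20",
    "cmp    DWORD PTR [rax+0x18],0x20",
]
print(extract_dimensions(example))  # [(1, 64, 32, 32)]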
4.2.2 Nested Loop Analysis
The main goal of nested loop analysis is to find
each layer function’s most significant computation
features. After the DNN model is compiled into
the binary, the layer functions contain computation
code and trivial instructions like push/pop and data
load/store, which will influence the similarity check
in the next step. Moreover, as shown in Figure 5, dif-
ferent types of layer functions have different numbers
of nested loops, which is reasonable because the com-
putation of pooling layers is relatively more straight-
forward than the Conv2D layer. This feature can be
used to validate the result of the layer identification
process to increase accuracy.
In order to find the nested loops in each layer
function, we apply Algorithm 1. In this algorithm,
we denote the CFG as G = (V, E), where V is the set
of basic blocks and E is the set of directed flows
between basic blocks. In the first step (Lines 1-13),
we traverse the whole CFG beginning from the entry
basic block (Lines 2-3) and timestamp each basic
block when it is popped from the stack (Lines 10-11).
In the second step (Lines 14-27), we first create a
reversed CFG, G′ = (V′, E′), from the original CFG G
(Lines 14-16); the directions of the edges of G′ are
opposite to those of G. After that, we search V′
to find the basic block with the latest timestamp (Lines
Algorithm 1: Nested Loop Analysis Algorithm.
Input: G = (V, E) {control flow graph}
1:  stack ← ∅
2:  stack.push(G.entryBlock)
3:  visitedBlocks ← {G.entryBlock}
4:  while not stack.empty() do
5:      u ← stack.top()
6:      if ∃v ∈ u.successors and v ∉ visitedBlocks then
7:          stack.push(v)
8:          visitedBlocks.add(v)
9:      else
10:         stack.pop()
11:         timestamp(u)
12:     end if
13: end while
14: V′ ← V
15: E′ ← reversed(E) {the directions of all edges are opposite}
16: G′ ← (V′, E′)
17: nestedLoops ← ∅
18: while not V′.empty() do
19:     for u ∈ V′ do
20:         v ← u if ∀w ∈ V′ : u.timestamp ≥ w.timestamp
21:     end for
22:     blockSet ← traverse(G′, v)
23:     V′ ← V′ \ blockSet
24:     if blockSet.size() > 1 or v.isSelfLoop() then
25:         nestedLoops.add(blockSet)
26:     end if
27: end while
18-20). Starting from this basic block, we traverse
the reversed CFG G′ (Line 22). All the basic
blocks reached at this point are members of
a nested loop. We then eliminate these basic blocks
from V′ (Line 23) and continue until all basic
blocks have been visited. For the record, we also consider
a self-loop as a nested loop.
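The snippet below is a compact Python rendition of Algorithm 1, assuming the CFG is given as an adjacency dictionary. The two passes amount to a Kosaraju-style decomposition: DFS finishing timestamps on the original graph, then traversals of the reversed graph starting from the block with the latest remaining timestamp; each region with more than one block, or with a self-loop, is reported as a nested loop. It is a sketch of the idea, not our Uroboros-based implementation.

def nested_loops(cfg, entry):
    """cfg: {block: [successor blocks]}; returns a list of loop block sets."""
    # Pass 1: iterative DFS from the entry, timestamping blocks on pop
    # (Lines 1-13 of Algorithm 1).
    finish, visited, stack = {}, {entry}, [entry]
    clock = 0
    while stack:
        u = stack[-1]
        nxt = next((v for v in cfg.get(u, []) if v not in visited), None)
        if nxt is not None:
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()
            finish[u] = clock
            clock += 1

    # Pass 2: build the reversed CFG (Lines 14-16).
    rev = {u: [] for u in finish}
    for u, succs in cfg.items():
        for v in succs:
            if u in finish and v in finish:
                rev[v].append(u)

    # Repeatedly traverse from the block with the latest remaining
    # timestamp (Lines 17-27).
    remaining, loops = set(finish), []
    while remaining:
        start = max(remaining, key=lambda u: finish[u])
        region, work = set(), [start]
        while work:
            u = work.pop()
            if u in region or u not in remaining:
                continue
            region.add(u)
            work.extend(rev[u])
        remaining -= region
        # Self-loops also count as nested loops.
        if len(region) > 1 or start in cfg.get(start, []):
            loops.append(region)
    return loops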
4.3 Layer Identification
4.3.1 Layer Generator
The foundation of layer identification is that two
layers, even if they come from different models, are
compiled into very similar layer functions as long
as they share the same layer type, I/O dimensions, and
layer attributes. For many layer types, the relationship
between the output and input dimensions is determined by
their attributes. We take the Conv2D layer as an ex-
ample. Assume the input dimension is (N, C, H, W),
where N is the sample number, C is the channel num-
ber, and H, W is the resolution of the data. Then we
have
have
H
out
=
H + 2 · P D · (K 1) 1
S
+ 1
W
out
=
W + 2 · P D · (K 1) 1
S
+ 1
where P is the value of padding, D is the value
of dilation, K is the value of kernel size, and S
is the value of stride. Therefore, for each pair of
I/O layer dimensions, we generate possible candi-
date layer functions to form a layer repository. As shown
in Figure 5, (a) is the computation part of the layer
function discussed in Figure 4. After applying
the method presented in Section 4.2, we get its
I/O dimensions as (1,64,32,32) and (1,64,16,16).
There are several possible layers with this kind of
I/O dimensions. We choose three representative
layers to illustrate our approach: MaxPooling2D
with pool_size=2 (b), AveragePooling2D with
pool_size=2 (c), and Conv2D with filters=64,
kernel_size=2, strides=2, padding=same (d).
As we can see, the CFG of the victim layer function
shown in (a) is the same as that of the MaxPooling2D
layer function with pool_size=2 (b).
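As an illustrative sketch of the layer generator, the snippet below enumerates Conv2D attribute combinations consistent with a given pair of I/O dimensions using the output-size formula above. The attribute ranges are hypothetical; in the actual pipeline, each surviving candidate would be compiled into a layer function and stored in the repository.

from itertools import product

def conv2d_out(size, k, s, p, d):
    # out = floor((size + 2P - D*(K-1) - 1) / S) + 1
    return (size + 2 * p - d * (k - 1) - 1) // s + 1

def candidate_conv2d_attrs(in_dim, out_dim,
                           kernels=(1, 2, 3, 5, 7), strides=(1, 2),
                           paddings=(0, 1, 2, 3), dilations=(1, 2)):
    """Yield (K, S, P, D) combinations mapping in_dim to out_dim (NCHW)."""
    (_, _, h_in, w_in), (_, c_out, h_out, w_out) = in_dim, out_dim
    for k, s, p, d in product(kernels, strides, paddings, dilations):
        if (conv2d_out(h_in, k, s, p, d) == h_out and
                conv2d_out(w_in, k, s, p, d) == w_out):
            yield {"filters": c_out, "kernel_size": k,
                   "strides": s, "padding": p, "dilation": d}

# For the dimensions recovered in Section 4.2, one match is the
# Conv2D candidate (d) above: kernel_size=2, strides=2, padding=0.
for attrs in candidate_conv2d_attrs((1, 64, 32, 32), (1, 64, 16, 16)):
    print(attrs)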
4.3.2 Layer Function Representation Learning
Our representation learning model is built on the
idea proposed by Asm2Vec (Ding et al., 2019).
After training the representation learning model, we
first use the nested loop analysis approach from Sec-
tion 4.2.2 to lift the computation patterns of the can-
didate layer functions. We then use the trained model
to produce a vector for each candidate layer function.
The vector of the victim function is likewise gener-
ated from the nested loops of the function. Finally,
we calculate the similarity scores and make the infer-
ence decision for the victim function.
As for the example shown in Figure 5, the simi-
larity between the victim layer and MaxPooling2D is
0.9356, while its similarity with AveragePooling2D is
0.522 and with Conv2D is 0.272. Therefore, we deter-
mine that the layer function from VGG16 is MaxPool-
ing2D with pool_size=2. This result is reasonable
because the computation patterns of MaxPooling2D and
AveragePooling2D are similar: they follow the same
routine to slide through the data, but the key
operations differ.
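A minimal sketch of this inference step is shown below, assuming the trained embedding model exposes a vectorize function (hypothetical name) that maps a function's lifted nested-loop code to a vector: each candidate is scored by cosine similarity, and the best match determines the layer type and attributes.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_layer(victim_vec, repo):
    """repo: {(layer_type, attrs): embedding vector}; returns the
    best-matching candidate together with its similarity score."""
    scores = {cand: cosine(victim_vec, vec) for cand, vec in repo.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical usage, with `vectorize` being the trained Asm2Vec-style
# encoder applied to the lifted nested loops:
#   repo = {("MaxPooling2D", "pool_size=2"): vectorize(cand_code), ...}
#   layer, score = identify_layer(vectorize(victim_code), repo)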
4.4 Model Reconstruction
After gathering valuable information, including layer
dimensions, types, and attributes, we can reconstruct
the model architecture.
At first, we build a directed graph $G_{nn} = (V_{nn}, E_{nn})$,
where each layer is represented by a vertex $v$, and
an edge $e$ pointing from $v_1$ to $v_2$ means that the output
dimension of $v_1$ matches the input dimension of $v_2$.
Also, we divide the layers into supportive layers (e.g.,
ReLU, BN) and functional layers (e.g., Conv2D, Pool-
ing2D). As mentioned before, the shared library
only contains unique functions, while the victim model
will obviously use supportive layers multi-
ple times. Therefore, to reduce the search space, we
assume that each functional layer can only be used
once and that all functional layers must appear in the
final reconstructed model; as for the supportive layers,
we do not limit their usage. Moreover, once a functional
layer is found to link to multiple supportive layers,
we directly add them to the path. For example, if a
Conv2D layer is found to be connected to a ReLU layer
and a BN layer, we combine them into a Conv2D-
BN-ReLU structure. Additionally, when we meet
a layer with multiple inputs, we trace back the data
flow graph to find the other input and link it back to the
former path. Finally, along the directed graph $G_{nn}$, we
search from the input layer to the output layer and recon-
struct the DNN model, as sketched below.
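The sketch below illustrates the search under the assumptions just stated: layers become graph nodes, an edge connects one layer to another when the first layer's output dimension equals the second's input dimension, and a depth-first search uses each functional layer exactly once while supportive layers may repeat. It is a simplification of our heuristic engine; among other things, it omits the multi-input back-tracing and the Conv2D-BN-ReLU grouping.

def build_edges(layers):
    """layers: {name: {"in": dims, "out": dims, "functional": bool}};
    add an edge u -> v when u's output dims match v's input dims."""
    return {u: [v for v in layers
                if u != v and layers[u]["out"] == layers[v]["in"]]
            for u in layers}

def search(layers, edges, cur, path=(), used=frozenset(), max_len=128):
    """DFS for a path that uses every functional layer exactly once;
    supportive layers may repeat (bounded by max_len)."""
    if layers[cur]["functional"]:
        used = used | {cur}
    path = path + (cur,)
    functional = {n for n in layers if layers[n]["functional"]}
    if used == functional:
        return path                     # all functional layers placed
    if len(path) >= max_len:
        return None
    for nxt in edges[cur]:
        if layers[nxt]["functional"] and nxt in used:
            continue                    # each functional layer used only once
        found = search(layers, edges, nxt, path, used, max_len)
        if found:
            return found
    return None

# Usage: search(layers, build_edges(layers), input_layer_name)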
5 EVALUATION
5.1 Experiment Setup
Environment: All the experiments are run on an
Ubuntu 18.04 LTS server with an NVIDIA TITAN XP
GPU and a dual-core 2.20 GHz Intel(R) Xeon(R)
Silver 4114 CPU. We pick CUDA 10.3 as the GPU
programming interface. For the training of the repre-
sentation learning model, we choose embedding di-
mension d = 200, 25 negative samples, 3 random
walks, and a learning rate of 0.025.
Victim Model: We evaluate our attack on four
widely-used DNN models: MNIST, VGG16,
ResNet20, and MobileNet. The detailed information
Table 1: Victim Models Information.

Model      Dataset    Input Shape   # of Parameters   # of Layers   # of Layer Types
MNIST      MNIST      (28,28,1)          34,826            11              7
VGG16      CIFAR-10   (32,32,3)     150,001,418            60              7
ResNet20   CIFAR-10   (32,32,3)         274,442            72              8
MobileNet  CIFAR-10   (32,32,3)       3,239,114            91              9
Figure 6: Architecture Comparison. This figure partially shows the differences in network architecture between the original models and the extracted models: (a) MNIST, (b) VGG16, (c) ResNet20, (d) MobileNet.
of all victim models is shown in Table 1. All the pre-
trained models are designed in the Keras framework
(Team, 2022a; Team, 2022b) with TensorFlow as the
backend and compiled by TVM (Chen et al., 2018)
to generate the binaries. We use LLVM (Lattner and
Adve, 2004) as the host platform. As mentioned in
Section 3, we assume that we only have access to the
DNN binary library during the attack.
5.2 Architectural Completeness
This section compares the architectures of the orig-
inal models and the extracted models. Figure 6 vi-
sualizes part of the architecture of original models
and extracted models for MNIST, VGG16, ResNet20,
and MobileNet. Each block in the figure represents a
DNN layer. As shown in the figure, the basic architec-
ture remains the same in extracted models. However,
several layers are missing. First, the dropout layer is
Table 2: Statistics of the comparison between original and extracted models.

            Test accuracy (%)       Layer number                         Layer type number
            Original  Extracted     Original  Extracted  Percentage (%)  Original  Extracted
MNIST         99.17     99.04         11        10          90.91           7         6
VGG16         93.16     90.59         60        38          63.33           7         6
ResNet20      91.65     83.92         72        49          68.06           8         8
MobileNet     83.16     75.00         91        66          72.53           9         8
used to avoid over-fitting during the training phase by
setting randomly selected input units to 0, so it is
inactive during inference.
Regarding the other missing layers, as mentioned
in Section 4.3, the same type of layer with the same
I/O dimensions and attributes produces the same
layer function. This rule of thumb allows us to iden-
tify the layer type and its related attributes, but it also
means that a layer function can be reused in the
DNN executable. As shown in Figure 6, the recovery
of MobileNet is the least satisfying. The reason a
larger chunk is missing from this model is that the
original DNN model is essentially a composition of
the same pattern repeated four times. Since our
attack targets the DNN library, it is difficult to
infer the number of repetitions solely from the code,
as the same code can be executed an arbitrary number
of times during inference. On the other hand, Figure 6(a) shows
that as long as each layer of the DNN model holds a
unique layer function, we are able to fully recover the
whole network architecture.
5.3 Accuracy of Extracted Models
We compare further statistics of the original DNN
models and the extracted ones, as shown in Table 2.
In order to evaluate the functionality of the extracted
models, we re-trained them and compared their accu-
racy with that of the original models. As for the training
settings, we use the same hyper-parameters as the
original models for a more precise comparison. For
the record, we re-train the models only to demonstrate
the fidelity of the extracted model architectures; this
does not mean that our framework requires the training
dataset or the hyper-parameters. As shown in Table 2,
with the missing layers, it is not surprising that the
accuracy of VGG16, ResNet20, and MobileNet is worse
than that of the original models. However, we can see
that the accuracy drop of ResNet20 and MobileNet
is more significant than that of VGG16, although we
recover a higher percentage of layers for ResNet20
and MobileNet. We conjecture that the importance of
layers varies inside the neural network architecture:
although VGG16 is missing more layers, its key
skeleton remains. As for the layer types, so far, the
only layer we are not able to recover is the Dropout
layer, which disappears during the inference process.
6 DISCUSSION
The main limitation of our attack framework lies in
recovering the connections between layers. The shared
library compiled from DNN models only contains the
distinct layer functions, so we need to know the exact
number of times each layer function is used. One solu-
tion is to enlarge the search space and use meta-learning
to get closer to the original network architecture. For
example, we could remove the usage limit on func-
tional layers, allow the search engine to explore all
possible combinations, and then train and test each
explored architecture's accuracy, keeping the best re-
sult. However, the time cost would be prohibitive. On
the other hand, if we could access the JSON file or even
the parameter file, we could recover the precise DNN
models, which would make our work even more meaningful.
7 RELATED WORK
The basic logic of the model extraction attack is
leveraging the information gathered from a different
source to leak the vital features of the machine learn-
ing model. The most commonly used source is the
side channel (Hua et al., 2018; Yan et al., 2020; Wei
et al., 2018; Xiang et al., 2020; Duddu et al., 2018;
Hunt et al., 2020; Batina et al., 2019). For example,
by exploiting the memory and timing side-channel,
(Hua et al., 2018) presented a model extraction at-
tack to infer the network architecture and identify the
value of parameters of the convolutional neural net-
work (CNN) running on a hardware accelerator. Bus
traffic is also an important information source (Zhu
et al., 2021; Hu et al., 2019). (Zhu et al., 2021) iden-
tified a new attack surface based on encrypted PCIe
traffic and fully extracted the DNN model with the
exact model characteristics, and achieved the same
inference accuracy as the target model. Some ex-
citing work also relied on the query-prediction pairs
from the target model (Tramèr et al., 2016; Oh et al.,
2019; Orekondy et al., 2019; Kariyappa et al., 2021).
(Tramèr et al., 2016) relied on the information car-
ried by the output from the ML prediction APIs to
generate a similar or the same model and successfully
applied the attack against online services.
8 CONCLUSIONS
The rise of DL compilers and the trend of model pri-
vatization introduce a new threat to the ML community.
Several novel attack frameworks have been proposed;
however, all of them assume full access to
the DNN binaries. In this paper, we narrow down the
threat model and demonstrate that, with only the DNN
binary library, we can leak the network architecture
information of DNN models. We propose a frame-
work, namely LibSteal, that uses only the DNN library
to obtain the layer types, attributes, dimensions, and even
compatible topology connections. We implemented a
prototype of LibSteal and evaluated it on four DNN
models compiled by TVM. The evaluation results
indicate that our framework can reconstruct a model
architecture similar or even equivalent to
the original one, achieving comparable accuracy after
training on the public datasets with only the library
files.
ACKNOWLEDGEMENTS
This research was supported in part by the National
Science Foundation (NSF) grant CNS-1652790 and
the Office of Naval Research (ONR) grant N00014-
17-1-2894.
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean,
J., Devin, M., Ghemawat, S., Irving, G., Isard, M.,
et al. (2016). Tensorflow: A system for large-scale
machine learning. In OSDI 16, pages 265–283.
Batina, L., Bhasin, S., Jap, D., and Picek, S. (2019). CSI
NN: Reverse engineering of neural network archi-
tectures through electromagnetic side channel. In
USENIX Security 19, pages 515–532.
Brumley, D., Jager, I., Avgerinos, T., and Schwartz, E. J.
(2011). Bap: A binary analysis platform. In CAV
2011, pages 463–469. Springer.
Chen, S., Khanpour, H., Liu, C., and Yang, W. (2022).
Learning to reverse dnns from ai programs automat-
ically. arXiv preprint arXiv:2205.10364.
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M.,
Xiao, T., Xu, B., Zhang, C., and Zhang, Z. (2015).
Mxnet: A flexible and efficient machine learning li-
brary for heterogeneous distributed systems. arXiv
preprint arXiv:1512.01274.
Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen,
H., Cowan, M., Wang, L., Hu, Y., Ceze, L., et al.
(2018). TVM: An automated end-to-end optimizing
compiler for deep learning. In OSDI 18, pages 578–
594.
Chollet, F. et al. (2015). Keras. https://keras.io.
Collobert, R. and Weston, J. (2008). A unified architec-
ture for natural language processing: Deep neural net-
works with multitask learning. In ICML 2008 , pages
160–167.
Cyphers, S., Bansal, A. K., Bhiwandiwalla, A., Bobba,
J., Brookhart, M., Chakraborty, A., Constable, W.,
Convey, C., Cook, L., Kanawi, O., et al. (2018).
Intel ngraph: An intermediate representation, com-
piler, and executor for deep learning. arXiv preprint
arXiv:1801.08058.
Ding, S. H., Fung, B. C., and Charland, P. (2019). Asm2vec:
Boosting static representation robustness for binary
clone search against code obfuscation and compiler
optimization. In IEEE S&P 2019, pages 472–489.
IEEE.
Duddu, V., Samanta, D., Rao, D. V., and Balas, V. E. (2018).
Stealing neural networks via timing side channels.
arXiv preprint arXiv:1812.11720.
Eagle, C. (2011). The IDA Pro Book. No Starch Press.
Graves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech
recognition with deep recurrent neural networks. In
ICASSP 2013, pages 6645–6649. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In CVPR 2016,
pages 770–778.
Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-
r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P.,
Sainath, T. N., et al. (2012). Deep neural networks for
acoustic modeling in speech recognition: The shared
views of four research groups. IEEE Signal Processing
Magazine, 29(6):82–97.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Hu, X., Liang, L., Deng, L., Li, S., Xie, X., Ji, Y.,
Ding, Y., Liu, C., Sherwood, T., and Xie, Y. (2019).
Neural network model extraction attacks in edge de-
vices by hearing architectural hints. arXiv preprint
arXiv:1903.03916.
Hua, W., Zhang, Z., and Suh, G. E. (2018). Reverse engi-
neering convolutional neural networks through side-
channel information leaks. In DAC 2018, pages 1–6.
IEEE.
Hunt, T., Jia, Z., Miller, V., Szekely, A., Hu, Y., Rossbach,
C. J., and Witchel, E. (2020). Telekine: Secure com-
puting with cloud GPUs. In 17th USENIX Sympo-
sium on Networked Systems Design and Implementa-
tion (NSDI 20), pages 817–833.
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal,
G., Bajwa, R., Bates, S., Bhatia, S., Boden, N.,
Borchers, A., et al. (2017). In-datacenter performance
analysis of a tensor processing unit. In ISCA ’17,
pages 1–12.
Kariyappa, S., Prakash, A., and Qureshi, M. K. (2021).
Maze: Data-free model stealing attack using zeroth-
order gradient estimation. In CVPR 2021, pages
13814–13823.
Kato, S., Takeuchi, E., Ishiguro, Y., Ninomiya, Y., Takeda,
K., and Hamada, T. (2015). An open approach to au-
tonomous vehicles. IEEE Micro, 35(6):60–68.
Ketkar, N. (2017). Introduction to keras. In Deep learning
with Python, pages 97–111. Springer.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. Advances in neural information processing
systems, 25.
Lattner, C. and Adve, V. (2004). Llvm: A compilation
framework for lifelong program analysis & transfor-
mation. In CGO 2004., pages 75–86. IEEE.
Leary, C. and Wang, T. (2017). Xla: Tensorflow, compiled.
TensorFlow Dev Summit.
LeCun, Y. (1998). The MNIST database of handwritten digits.
http://yann.lecun.com/exdb/mnist/.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521(7553):436–444.
Li, M., Liu, Y., Liu, X., Sun, Q., You, X., Yang, H.,
Luan, Z., and Qian, D. (2020). The deep learning
compiler: A comprehensive survey. arXiv preprint
arXiv:2002.03794.
Liang, Y., Cai, Z., Yu, J., Han, Q., and Li, Y. (2018). Deep
learning based inference of private information using
embedded sensors in smart devices. IEEE Network,
32(4):8–14.
Liu, Z., Yuan, Y., Wang, S., Xie, X., and Ma, L. (2022).
Decompiling x86 deep neural network executables.
arXiv preprint arXiv:2210.01075.
Markham, A. and Jia, Y. (2017). Caffe2: Portable high-
performance deep learning framework from facebook.
NVIDIA Corporation.
Oh, S. J., Schiele, B., and Fritz, M. (2019). Towards
reverse-engineering black-box neural networks. In
Explainable AI: Interpreting, Explaining and Visual-
izing Deep Learning, pages 121–144. Springer.
Orekondy, T., Schiele, B., and Fritz, M. (2019). Knockoff
nets: Stealing functionality of black-box models. In
CVPR 2019, pages 4954–4963.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E.,
DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and
Lerer, A. (2017). Automatic differentiation in pytorch.
In NIPS-W.
Rotem, N., Fix, J., Abdulrasool, S., Catron, G., Deng, S.,
Dzhabarov, R., Gibson, N., Hegeman, J., Lele, M.,
Levenstein, R., et al. (2018). Glow: Graph lower-
ing compiler techniques for neural networks. arXiv
preprint arXiv:1805.00907.
Schmidhuber, J. (2015). Deep learning in neural networks:
An overview. Neural networks, 61:85–117.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I.,
Kang, M. G., Liang, Z., Newsome, J., Poosankam,
P., and Saxena, P. (2008). Bitblaze: A new approach
to computer security via binary analysis. In ICISSP
2008, pages 1–25. Springer.
Team, K. (2022a). Keras applications.
Team, K. (2022b). Keras examples.
Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., and Risten-
part, T. (2016). Stealing machine learning models via
prediction APIs. In USENIX Security 16, pages 601–
618.
Vasilache, N., Zinenko, O., Theodoridis, T., Goyal,
P., DeVito, Z., Moses, W. S., Verdoolaege, S.,
Adams, A., and Cohen, A. (2018). Tensor com-
prehensions: Framework-agnostic high-performance
machine learning abstractions. arXiv preprint
arXiv:1802.04730.
Wang, F. and Shoshitaishvili, Y. (2017). Angr-the next gen-
eration of binary analysis. In 2017 IEEE Cybersecu-
rity Development (SecDev), pages 8–9. IEEE.
Wang, S., Wang, P., and Wu, D. (2015). Reassembleable
disassembling. In USENIX Security 15, pages 627–
642.
Wei, L., Luo, B., Li, Y., Liu, Y., and Xu, Q. (2018). I know
what you see: Power side-channel attack on convo-
lutional neural network accelerators. In ACSAC ’18,
pages 393–406.
Wu, C.-J., Brooks, D., Chen, K., Chen, D., Choudhury, S.,
Dukhan, M., Hazelwood, K., Isaac, E., Jia, Y., Jia, B.,
et al. (2019). Machine learning at facebook: Under-
standing inference at the edge. In HPCA 2019, pages
331–344. IEEE.
Wu, R., Kim, T., Tian, D. J., Bianchi, A., and Xu, D. (2022).
DnD: A cross-architecture deep neural network de-
compiler. In USENIX Security 22, pages 2135–2152.
Xiang, Y., Chen, Z., Chen, Z., Fang, Z., Hao, H., Chen, J.,
Liu, Y., Wu, Z., Xuan, Q., and Yang, X. (2020). Open
dnn box by power side-channel attack. IEEE Trans-
actions on Circuits and Systems II: Express Briefs,
67(11):2717–2721.
Yan, M., Fletcher, C. W., and Torrellas, J. (2020). Cache
telepathy: Leveraging shared resource attacks to learn
DNN architectures. In USENIX Security 20, pages
2003–2020.
Zhu, Y., Cheng, Y., Zhou, H., and Lu, Y. (2021). Hermes
attack: Steal DNN models with lossless inference ac-
curacy. In USENIX Security 21.