The Use of the DWARF Debugging Format for the Identification of
Potentially Unwanted Applications (PUAs) in WebAssembly Binaries
Calebe Helpa¹, Tiago Heinrich²ᵃ, Marcus Botacin³ᵇ, Newton C. Will⁴ᶜ, Rafael R. Obelheiro⁵ᵈ and Carlos A. Maziero¹ᵉ
¹ Computer Science Department, Federal University of Paraná, Curitiba, 81530–015, Brazil
² Max Planck Institute for Informatics (MPI), Saarbrücken, 66123, Germany
³ Texas A&M University, College Station, TX, 77843, U.S.A.
⁴ Computer Science Department, Federal University of Technology, Paraná, Dois Vizinhos, 85660–000, Brazil
⁵ Computer Science Department, State University of Santa Catarina, Joinville, 89219–710, Brazil
Keywords: WebAssembly, Intrusion Detection, Security.
Abstract: Debugging formats are a well-known means of storing information about an application, helping developers find errors, bugs, or unexpected behavior during development. The Debugging With Attributed Record Format (DWARF) is an example of a generic format that can be used for a range of programming languages and formats, such as WebAssembly, a low-level binary format that provides a compilation target for high-level languages. Despite the widespread use of debugging formats, their potential for intrusion detection is still unknown. Our study evaluates data extracted from the DWARF format and its potential for an intrusion detection solution. In this context, we present a strategy for identifying Potentially Unwanted Applications (PUAs) in WebAssembly binaries through feature extraction and static analysis, using the DWARF format as the data source. Our results are promising, with an overall F1Score above 96% for the algorithms.
1 INTRODUCTION
WebAssembly is a low-level binary format that pro-
vides a compilation target for high-level languages
(Hoffman, 2019). It aims to support web applications,
offering fast processing support and decreasing mem-
ory usage to load web pages (Falliere, 2018). At the
same time, it works in different browsers (Romano
et al., 2022).
DWARF is a debugging information file format, supported by different compilers and debuggers to give developers source-level debugging (DWARF, 2023). It is supported for languages such as C, C++, and Fortran, and for compilation targets such as WebAssembly. The format is mainly used during debugging, where breakpoints can be set, operation data can be viewed, and code sections can be traced. The format represents information as a tree whose nodes represent data, types, and functions (Eager, 2012). While DWARF information is usually produced during compilation from source, it can also be generated from decompiled binaries if the source code is unavailable.
ᵃ https://orcid.org/0000-0002-8017-1293
ᵇ https://orcid.org/0000-0001-6870-1178
ᶜ https://orcid.org/0000-0003-2976-4533
ᵈ https://orcid.org/0000-0002-4014-6691
ᵉ https://orcid.org/0000-0003-2592-3664
Debugging formats are rarely used outside the development environment. Yet the information that this type of format exposes about applications can benefit evaluation strategies. Specifically, security solutions can use this information for threat investigation and detection.
Rogue software that may compromise the privacy of a system or weaken its security is termed a Potentially Unwanted Application (PUA) (Pickard and Miladinov, 2012). In most cases, the identification of PUAs involves evaluating binaries using static and/or dynamic analysis. The former relies on information extracted from the binaries, while the latter observes their behavior during execution.
Helpa, C., Heinrich, T., Botacin, M., Will, N., Obelheiro, R. and Maziero, C.
The Use of the DWARF Debugging Format for the Identification of Potentially Unwanted Applications (PUAs) in WebAssembly Binaries.
DOI: 10.5220/0012754500003767
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT 2024), pages 442-449
ISBN: 978-989-758-709-2; ISSN: 2184-7711
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
WebAssembly applications have the potential for malicious use because the binary format makes it difficult to identify the purpose of each program. Malicious users exploit the design of the WebAssembly format to carry out cryptojacking attacks and to obfuscate malicious code (Naseem et al., 2021). To mitigate such risks, recent studies focus on correcting design flaws, adding new features and improving existing ones, evaluating memory-related issues (Michael et al., 2023), and improving the design of the compiler (Bosamiya et al., 2022).
Our proposal consists of evaluating the potential of using information extracted from debugging formats for an intrusion detection solution. We use the DWARF format and identify a set of features that can represent the application and be applied in the identification of PUAs. Since WebAssembly is widely supported by major web browsers, and considering the security problems observed in the field, we focus on WebAssembly binaries as a case study. We extracted information from real applications and evaluated how Machine Learning (ML) models can be used for intrusion detection. ML strategies have shown promising results in the security area, allowing the classification of application behaviors (Ceschin et al., 2024).
Our contributions are:
• The evaluation of the DWARF debugging format for extracting information from WebAssembly binaries, for a subsequent intrusion detection process; and
• A strategy for PUA identification in WebAssembly binaries.
Our proposal provides a better understanding of how information retrieved from debug formats such as DWARF can be used for intrusion detection. We achieve promising results for PUA identification, with F1Score above 95% and accuracy above 96% for the algorithms tested with cross-validation.
The remainder of this paper is structured as fol-
lows: Section 2 presents the background; Section 3
discusses the proposal; Section 4 presents the evalu-
ation; Section 5 reviews related work; and Section 6
concludes the paper.
2 BACKGROUND
This section presents the background for under-
standing the work, including the DWARF format,
WebAssembly format, intrusion detection, and static
analysis.
2.1 DWARF Format
The Debugging With Attributed Record Format
(DWARF) is a debugging information file that allows
source-level debugging (DWARF, 2023). Debuggers
and compilers can use the format to represent the ap-
plications in a tree structure, in which types, variables,
and functions form the data sections of the DWARF
format (Eager, 2012).
In DWARF, a Debugging Information Entry (DIE) describes an entity in a program and consists of a tag and a set of attributes. DIEs are nested, so each DIE is either the parent or a child of another DIE. In this way, the structure of the program is preserved for an evaluation process. The attributes of a DIE carry information such as names, types, constants, and references to other DIEs.
Code 1 presents a function that returns an integer
that is printed as output. The output of this code in
DWARF file format is presented in Code 2.
#include <stdio.h>

int foo() {
    int x = 1;
    return x;
}

int main() {
    printf("Value: %d", foo());
    return 0;
}
Code 1: An example of code in C with a function that
returns a value to be printed.
As shown in Code 2, the entire structure of the C program is represented in the DWARF format. Everything from the function foo to the variables used throughout the code is presented in a tree-like structure, with each DIE representing the information in the current scope of the application. In this way, the DWARF output can be followed alongside the execution of the C code.
Besides being useful for debugging, this information appears to have potential for intrusion detection solutions, since it is capable of representing the key functionalities of a program. For other languages, a similar output is expected, with additions for language-specific features. WebAssembly applications can be converted to support the DWARF format even when only the application binary is available.
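To make the tree structure concrete, the sketch below rebuilds the parent/child relations of Code 2 from its flattened listing. It relies on two DWARF conventions: a DIE flagged as having children (the `*` in Code 2) opens a new nesting level, and a NULL entry closes it. The `Die` class, the `build_tree` and `show` functions, and the hand-transcribed `CODE2` list are illustrative assumptions, not part of any DWARF tool.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    tag: str
    name: str = ""
    children: list = field(default_factory=list)


# (tag, name, has_children) transcribed by hand from Code 2; None marks a NULL entry.
CODE2 = [
    ("compile_unit", "main.c", True),
    ("subprogram", "foo", True),
    ("variable", "x", False),
    None,                          # closes the children of foo
    ("subprogram", "main", False),
    ("base_type", "int", False),
    None,                          # closes the children of the compile unit
]


def build_tree(entries):
    """Rebuild the DIE tree from a flattened, NULL-terminated listing."""
    root = Die("root")
    stack = [root]                 # stack[-1] is the DIE currently receiving children
    for entry in entries:
        if entry is None:          # NULL entry: the current sibling chain ends
            stack.pop()
            continue
        tag, name, has_children = entry
        die = Die(tag, name)
        stack[-1].children.append(die)
        if has_children:
            stack.append(die)
    return root.children


def show(dies, depth=0):
    """Return one indented text line per DIE, in depth-first order."""
    lines = []
    for die in dies:
        lines.append("  " * depth + f"{die.tag} {die.name}")
        lines.extend(show(die.children, depth + 1))
    return lines


tree = build_tree(CODE2)
print("\n".join(show(tree)))
```

The printed outline places the variable x under foo, and foo, main and the base type int directly under the compile unit, which is the scoping that the text describes.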
2.2 WebAssembly
WebAssembly is a binary format targeted at the
Web. WebAssembly code can be generated from the
WebAssembly Text (WAT) textual format or by com-
pilers that allow the translation of codes from high-
level languages such as C, C++, Go, and Rust to WebAssembly (Hoffman, 2019).
<1>: TAG_compile_unit [1] *
    AT_producer( "Apple LLVM version 9.0.0 (clang-900.0.39.2)" )
    AT_language( DW_LANG_C99 )
    AT_name( "main.c" )
    AT_stmt_list( 0x00000000 )
    AT_comp_dir( "/Users/bar/Documents/" )
    AT_low_pc( 0x0000000000000000 )
    AT_high_pc( 0x00000041 )
<2>: TAG_subprogram [2] *
    AT_low_pc( 0x0000000000000000 )
    AT_high_pc( 0x00000010 )
    AT_frame_base( rbp )
    AT_name( "foo" )
    AT_decl_file( "main.c" )
    AT_decl_line( 7 )
    AT_type( {0x0000006b} ( int ) )
    AT_external( true )
<3>: TAG_variable [3]
    AT_location( fbreg 4 )
    AT_name( "x" )
    AT_decl_file( "main.c" )
    AT_decl_line( 8 )
    AT_type( {0x0000006b} ( int ) )
<4>: NULL
<5>: TAG_subprogram [4]
    AT_low_pc( 0x0000000000000010 )
    AT_high_pc( 0x00000031 )
    AT_frame_base( rbp )
    AT_name( "main" )
    AT_decl_file( "main.c" )
    AT_decl_line( 12 )
    AT_type( {0x0000006b} ( int ) )
    AT_external( true )
<6>: TAG_base_type [5]
    AT_name( "int" )
    AT_encoding( DW_ATE_signed )
    AT_byte_size( 0x04 )
<7>: NULL
Code 2: DWARF output example, based on the source code
presented in Code 1.
A module is the unit of a WebAssembly application; it contains function definitions, global variables, linear memories, and indirect call tables. Functions and variables, as well as other program elements, are identified by integer indices (Lehmann et al., 2020). A WebAssembly module usually has three parts: the Preamble, with module start information; the Standard sections, which contain all application information such as functions; and the Custom sections, which hold debugging information (Kim et al., 2022). Table 1 shows in detail the structure of a WebAssembly binary.
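The layout just described can be walked with a few lines of code. The sketch below checks the preamble (the magic bytes `\0asm` followed by a 4-byte little-endian version), then iterates over the sections, each introduced by a one-byte id and a LEB128-encoded size. It assumes that DWARF data, when present, sits in custom sections (id 0) whose names start with `.debug_`, which is the usual convention for WebAssembly. The function names and the hand-built `module` bytes are illustrative only.

```python
def read_leb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return (version, [(section id, custom-section name or None), ...])."""
    if data[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    version = int.from_bytes(data[4:8], "little")
    pos, sections = 8, []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_leb128(data, pos + 1)
        name = None
        if section_id == 0:                      # custom section: payload starts with its name
            name_len, name_pos = read_leb128(data, pos)
            name = data[name_pos:name_pos + name_len].decode("utf-8")
        sections.append((section_id, name))
        pos += size                              # skip the section payload
    return version, sections


def has_dwarf(data):
    """True if any custom section name starts with '.debug_'."""
    _, sections = list_sections(data)
    return any(name and name.startswith(".debug_") for _, name in sections)


# Hand-built module: preamble, an empty type section (id 1), and a custom
# section named ".debug_info" carrying a single placeholder byte.
name = b".debug_info"
custom = bytes([len(name)]) + name + b"\x00"
module = (b"\0asm" + (1).to_bytes(4, "little")
          + b"\x01\x01\x00"
          + bytes([0, len(custom)]) + custom)
print(list_sections(module), has_dwarf(module))
```

A check like `has_dwarf` is one way to decide whether a binary already carries debugging information or needs to be regenerated with it first.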
Only four primitive types are supported: i32, i64,
f32, and f64, representing a 32-bit or 64-bit integer or
a 32-bit or 64-bit floating point number, respectively.
WebAssembly uses a format of binary code in-
structions that can be debugged using converters that
make the machine code readable. Tools are avail-
able and make it more practical to analyze sec-
tions of WebAssembly code (Falliere, 2018). The
WebAssembly instruction format was designed with
a focus on ensuring the safety of its users. Its key
features for security are:
• Virtualized Environment: WebAssembly modules run in a virtual machine based on the stack model. All input/output interactions and access to operating system resources must be performed through functions incorporated by WebAssembly, which must be imported by the module. Therefore, WebAssembly is able to establish security policies for developers and to assure users that their environment and system resources are accessed by modules in a limited and controlled way (Rossberg, 2018).
• Linear Memory: The linear memory of WebAssembly modules is instantiated in managed buffers. This way, read and write operations are limited to certain areas of memory (Kim et al., 2022).
• Control Flow Integrity (CFI): Through a structured control flow generated during the compilation process, WebAssembly modules are protected against attacks such as shellcode injection or the abuse of unrestricted indirect jumps (Kim et al., 2022). However, the WebAssembly CFI is not as effective as the modern CFI schemes used to defend native binaries, with some calls vulnerable to malicious use (Lehmann et al., 2020).
2.3 Intrusion Detection
Intrusion Detection Systems (IDS) are security mech-
anisms that have the purpose of monitoring hosts, ap-
plications, and networks for signs of attacks and in-
trusions (Stallings et al., 2012). One way of perform-
ing intrusion detection on applications is by analyzing
application code using static and/or dynamic analysis
(Chandola et al., 2009; Liu et al., 2018; Castanhel
et al., 2021; Lemos et al., 2022).
Static analysis is performed by extracting features from the code without executing it, thus defining an abstract representation of the program's behavior (Kirchmayr et al., 2016). This technique has been widely used, especially in critical systems such as those used in aviation and air traffic control.
Dynamic analysis is a software analysis technique that allows evaluating the behavior of the program during its execution. Dynamic analysis is performed through tests and simulations, which makes it possible to identify programming errors, unexpected behaviors, and security flaws during the interaction between the code and the environment in which it is executed (Kirchmayr et al., 2016; Lemos et al., 2023).
Table 1: Representation of a WebAssembly binary, structurally describing the expected composition of a WebAssembly binary file.
Part              Contents
Preamble          Magic, Version
Standard Section  Type, Import, Function, Table, Memory, Global, Export, Start, Code, Element, Data
Custom Section    Any kind of data
3 PROPOSAL
This section presents our proposal. Section 3.1 describes the threat model. Section 3.2 presents the studied strategy. Section 3.3 discusses the characteristics extracted from binaries and used in the detection process.
3.1 Threat Model
In the threat model, we consider that the adversary exploits the WebAssembly format to deploy malicious content. We assume that DWARF data is always accessible, since it can be generated in a later step even for a WebAssembly binary that ships without DWARF information. Our PUA evaluation follows a static strategy, in which the information available in the DWARF format is accessed and used in a detection process.
3.2 Strategy
Our strategy is to evaluate how debug formats, such as
DWARF, can be used in an intrusion detection solu-
tion. The advantage of using debug formats is directly
associated with data access since the format will of-
fer access to information that would not be accessible
from other observation points.
Debug formats allow the intrusion detection strategy to have an almost complete understanding of the application and its execution process since all code structures will be taken into account by the debugging tools. Therefore, the use of this data for detection may prove useful in situations in which the source is unknown or code evaluation is possible. Achieving these goals involves two main steps:
• extracting key characteristics from binaries in DWARF format; and
• defining an evaluation strategy to detect PUAs based on these characteristics.
To evaluate the potential of the DWARF format, we selected WebAssembly binaries from well-known datasets (Lehmann and Pradel, 2022; Stiévenart et al., 2022), and extracted debug information from each binary in DWARF format using llvm-dwarfdump¹. The information extracted will be discussed in Section 3.3.
For WebAssembly binaries that lack debugging information, we apply the process shown in Figure 1. The binary is decompiled using wasm2wat², producing source in WAT format (1). The WAT source is compiled with debugging symbols using wasmtime³ (2), and the DWARF information is extracted from the new binary using llvm-dwarfdump (3).
[Figure 1 diagram. Labels: WebAssembly Module; Transformation; WebAssembly; DWARF; WebAssembly + DWARF; steps 1, 2, 3.]
Figure 1: Process of transforming a WebAssembly binary to generate a binary supporting the DWARF format.
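The three steps of Figure 1 can be chained from a script. The sketch below is only an outline under stated assumptions: it builds the command lines for step (1) with `wasm2wat -o` and for step (3) with `llvm-dwarfdump --debug-info`, and it leaves step (2) to a caller-supplied function, `recompile_with_debug`, because the exact recompilation command is not given in the text. All function names here are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def decompile_cmd(wasm_path, wat_path):
    """Step (1): wasm2wat (from WABT) writes the text form of the module."""
    return ["wasm2wat", str(wasm_path), "-o", str(wat_path)]


def dump_cmd(wasm_path):
    """Step (3): llvm-dwarfdump prints the contents of the .debug_info section."""
    return ["llvm-dwarfdump", "--debug-info", str(wasm_path)]


def extract_dwarf(wasm_path, workdir, recompile_with_debug):
    """Run steps (1)-(3) and return the textual DWARF dump.

    recompile_with_debug is a hypothetical hook for step (2): it receives the
    path of the WAT file and must return the path of a binary with DWARF data.
    """
    wat_path = Path(workdir) / (Path(wasm_path).stem + ".wat")
    subprocess.run(decompile_cmd(wasm_path, wat_path), check=True)
    debug_wasm = recompile_with_debug(wat_path)
    result = subprocess.run(dump_cmd(debug_wasm), check=True,
                            capture_output=True, text=True)
    return result.stdout


# Report which of the external tools are not installed on this machine.
missing = [tool for tool in ("wasm2wat", "llvm-dwarfdump")
           if shutil.which(tool) is None]
print("missing tools:", missing)
```

Keeping the command construction in small functions makes each step easy to inspect or swap without running the external tools.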
Our PUA identification strategy uses machine
learning algorithms. We selected multi-class algo-
rithms to better understand the impact of the infor-
mation extracted from the binaries in the models.
The choice of algorithms is based on previous related
work (Galante et al., 2019; Castanhel et al., 2020;
Lemos et al., 2023; Heinrich et al., 2024).
3.3 Binary Characteristics
After converting the WebAssembly binaries to sup-
port the DWARF format, information can be extracted
from them. The DWARF format provides informa-
tion from a set of tags and attributes present in a
WebAssembly binary (as presented in Section 3.2).
This information allows a later application analysis
process, for the identification of implementation er-
rors or, in the case of this work, for the identification
of PUAs that may reflect an intrusion.
¹ https://llvm.org/docs/CommandGuide/llvm-dwarfdump.html
² https://github.com/WebAssembly/wabt
³ https://wasmtime.dev/
The DWARF format offers different tags according to the format of the language being analyzed (as presented in Section 2.1). Instead of using all the tags available for WebAssembly, we chose to select the relevant attributes for classification. For this selection process, we collected information about all the tags available for the WebAssembly format and evaluated the importance of these features for the models.
We limited our selection to specific sections of the WebAssembly binary (as presented in Table 1). The data in the preamble and standard sections captures the key characteristics of the application and the functionality of the WebAssembly module, such as functions, variables, export data, memory allocation, and binary compilation.
After this process, we evaluated the tags that were relevant to our classification strategy (Section 3.2). This process generated a set of key information from the DWARF format, organized into six groups covering the extracted information⁴.
• General Information: information about the application, such as program size, the number of labels in the code, and the source code language. In addition to describing the application, we hypothesize that this information can help classifiers as it offers a view of the language used before the port to WebAssembly;
• Routines and Subprograms: the number of declared variables, the number of declarations of inline subroutines, the number of declarations of subprograms, and the number of parameters of these subprograms. This information is important to describe the operations that a binary can perform;
• Variables: the number of type declarations, declared integer types, declared unsigned integer types, declared word types, and declared file pointer types. As WebAssembly has a restricted set of native data types, these types must be defined in WebAssembly modules; consequently, this information also helps in describing the operations performed by a binary;
• Parameters: the information found in each DIE, such as the operations performed by a function;
• Memory Information: the frequency of declared members when using structures or classes. We hypothesize that information found in memory can assist in a detection process; and
• Shared Data: the number of attributes that determine whether the subroutines are part of an external program or produce externally accessible information, and the number of tags and attributes related to the use of function calls.
⁴ We also provide a list of tags used in the Appendix.
When extracting information from the DWARF format, different regions of the application can be accessed in each DIE. The selected groups allow access to 29 tags in the WebAssembly DWARF format. The number of tags that can be accessed through the DWARF format varies according to the language supported.
The tags present specific information about the application behavior or represent some type of operation performed by the program. An intrusion detection proposal can use either the raw information extracted or the frequency with which each tag appears, which may reveal a pattern.
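As a minimal illustration of frequency-based features, the sketch below counts how often each tag appears in a textual DWARF dump and turns the counts into a fixed-order vector. The regular expression accepts tag names with or without the `DW_` prefix, since Code 2 prints `TAG_...` while llvm-dwarfdump prints `DW_TAG_...`. The `SAMPLE` dump (including its offsets) and the list of selected tags are made up for the example; they are not the 29 tags used in this work.

```python
import re
from collections import Counter

# Matches "DW_TAG_subprogram" as well as "TAG_subprogram"; captures "subprogram".
TAG_RE = re.compile(r"\b(?:DW_)?TAG_(\w+)")


def count_tags(dump_text):
    """Count the occurrences of every DIE tag in a textual DWARF dump."""
    return Counter(TAG_RE.findall(dump_text))


def feature_vector(dump_text, selected_tags):
    """Return the tag frequencies in the order given by selected_tags."""
    counts = count_tags(dump_text)
    return [counts.get(tag, 0) for tag in selected_tags]


# Hand-written excerpt in the style of an llvm-dwarfdump listing (illustrative).
SAMPLE = """
0x0000000b: DW_TAG_compile_unit
0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_variable
0x00000052:   DW_TAG_subprogram
0x0000006b:   DW_TAG_base_type
"""

selected = ["subprogram", "variable", "base_type", "inlined_subroutine"]
print(feature_vector(SAMPLE, selected))
```

Tags that never occur in a binary contribute a zero, so every sample yields a vector of the same length regardless of which tags it contains.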
4 EVALUATION
This section presents the evaluation of our proposal. Section 4.1 describes the objective of the evaluation. Section 4.2 presents the dataset. Section 4.3 discusses the results.
4.1 Objective
Our purpose is to evaluate the usefulness of employ-
ing information extracted from the debugging format
for PUA identification. We use static information
from WebAssembly binaries to identify threats, inves-
tigating the feasibility of using information present in
WebAssembly binaries through the DWARF format.
For the experiments we selected four machine
learning algorithms: Multi-layer Perceptron (MLP),
Random Forest (RF), Support Vector Machines (SVM)
and XGBoost. These models aim to demonstrate how
the learning process behaves when considering the
characteristics selected for the benign and malicious
classification classes.
4.2 Dataset
To perform the evaluation, a dataset is needed. Instead of building a completely new dataset, we used already available data from two sources (Lehmann and Pradel, 2022; Stiévenart et al., 2022). The binaries were selected from the available samples so as to define both benign and malicious classes. To balance the two classes, we selected a similar number of malicious and benign samples. We selected samples to obtain a subset with the same characteristics as the entire dataset (our code is publicly available⁵).
After the selection process, the binaries were
transformed to support the DWARF format (as pre-
sented in Section 3.2). In total, 770 samples were se-
lected for the experiment. These samples are divided
into 400 benign samples and 370 malicious samples.
The purpose of the samples is to present recurring be-
haviors for benign applications and vulnerabilities or
implementation errors for malicious samples.
Before running the experiments, information needs
to be extracted from the DWARF format of each
WebAssembly application. Taking into account the
binary characteristics presented in Section 3.3, we ex-
tracted all tags in the DWARF format that were incor-
porated into the six defined groups.
The features extracted from the binaries are saved
to a file, which contains the appearance count of the
attributes and subsequent encodings of extracted vari-
ables. After this process, the information was used for
the model training and testing process.
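One possible shape for such a file is sketched below: one CSV row per binary, holding the tag counts, an integer code for a categorical variable (here the source language), and the class label. The column names, the integer encoding, and the two sample records are assumptions made for illustration; the text does not specify the actual file layout or encoding.

```python
import csv
import io


def encode_categories(values):
    """Map each distinct string to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return mapping


def write_features(samples, tag_names, out):
    """Write one row per sample: name, encoded language, tag counts, label."""
    lang_codes = encode_categories(s["language"] for s in samples)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample", "language", *tag_names, "label"])
    for s in samples:
        counts = [s["tags"].get(tag, 0) for tag in tag_names]
        writer.writerow([s["name"], lang_codes[s["language"]], *counts, s["label"]])
    return lang_codes


# Two hypothetical records; label 0 = benign, 1 = malicious.
samples = [
    {"name": "a.wasm", "language": "DW_LANG_C99",
     "tags": {"subprogram": 2, "variable": 1}, "label": 0},
    {"name": "b.wasm", "language": "DW_LANG_Rust",
     "tags": {"subprogram": 5}, "label": 1},
]

buffer = io.StringIO()
codes = write_features(samples, ["subprogram", "variable"], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file object instead would store the same rows on disk.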
4.3 Results
The classification results obtained by using machine
learning algorithms, trained with the information ex-
tracted by the DWARF format, are presented in this
section. The algorithms were trained and tested in a
1:1 ratio, that is, 50% of the data was used for training
and 50% for testing. An exhaustive parameter search
was made, aiming to find the best configurations for
the models. We also perform the test with 10-fold
cross-validation, as it is the standard for this type of
assessment.
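For readers unfamiliar with the procedure, the sketch below shows how 10-fold cross-validation partitions the 770 samples: the data is split into ten folds, and each fold serves once as the test set while the other nine are used for training. The folds here are stratified (each keeps the benign/malicious proportion), which is an assumption of this sketch rather than a detail stated in the text; the function names are illustrative.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)    # deal indices round-robin
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train indices, test indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Class sizes reported in Section 4.2: 400 benign (0) and 370 malicious (1).
labels = [0] * 400 + [1] * 370
splits = list(cross_validation_splits(labels))
train, test = splits[0]
print(len(splits), len(train), len(test))
```

With these class sizes, every test fold holds 77 samples (40 benign and 37 malicious) and every training set holds the remaining 693.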
Table 2 presents the results achieved by the algo-
rithms evaluated, considering the usual metrics in the
area of machine learning. The results achieved var-
ied according to the classification strategy used by the
machine learning models. However, the results are fa-
vorable for a PUA detection strategy using the multi-
class algorithms. Multi-class algorithms are trained to
classify two or more classes, being capable of defin-
ing patterns that represent each of the classes.
The precision shows that MLP and SVM were the models most affected by false positives, that is, benign samples classified as malicious. Despite the small percentage of false positives, this error is responsible for the reduction observed in the F1Score.
The models were not affected by false negatives, as shown by the recall. The metric that best describes the models' results is the F1Score, in which we notice the impact of false positives. For the classification of binaries to identify PUAs, the overall F1Score demonstrates that the features used to train and test the classifiers are sufficient for a classification process.
⁵ https://github.com/CalebeHelpa/webassembly-classification
The accuracy reflects the share of true positives and true negatives, demonstrating that the models were able to adequately learn the patterns of the binaries through the extracted features. The values achieved by the Balanced Accuracy (BAC) are evidence that this learning holds for both classes.
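The relation between these metrics can be checked with a short computation. The sketch below uses the standard definitions based on true/false positives and negatives, with BAC taken as the mean of recall and specificity. It then recomputes the F1Score column of Table 2 from the reported precision, using the fact that recall is 100% for every classifier. The function names and the confusion-matrix counts in the usage line are illustrative.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "bac": (recall + specificity) / 2,
    }


# Precision values reported in Table 2; recall is 100% in every row.
TABLE2_PRECISION = {"MLP": 0.9340, "RandomForest": 0.9824,
                    "SVM": 0.9489, "XGBoost": 0.9734}
recomputed = {name: round(100 * f1_from(p, 1.0), 2)
              for name, p in TABLE2_PRECISION.items()}
print(recomputed)

# Hypothetical fold: 37 malicious samples all detected, 3 of 40 benign misclassified.
print(metrics(tp=37, fp=3, tn=37, fn=0))
```

The recomputed values are 96.59, 99.11, 97.38 and 98.65, matching the F1Score column of Table 2. With perfect recall, F1 reduces to 2P/(1+P), so any drop in F1 comes entirely from false positives.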
With these results, we conclude that debugging formats, such as DWARF, can provide information about binaries that can later be used for intrusion detection solutions. We also presented a static strategy for PUA detection in WebAssembly applications. Although we only explored the DWARF format for WebAssembly applications, the results are promising for the broader use of information extracted through debugging tools to detect PUAs.
5 RELATED WORK
Some well-known strategies are used to collect data from applications and perform intrusion detection. Solutions that use system calls are an example, in which traces of the application execution are used to define application behavior and identify anomalies that may correspond to an intrusion (Liu et al., 2018).
The use of bytecode is also explored in the intrusion detection field. Bytecode contains a sequence of instructions that represents the program without any redundancy, facilitating the process of compiling or interpreting source code into machine code. The solution proposed by Ashouri et al. (2021) relies on Java bytecode to intercept runtime attacks. Bytecode is also used to detect malware on Android systems, by extracting features and using convolutional neural networks to classify malicious applications (Ding et al., 2020).
In web applications, bytecode sequences are used to detect malicious behavior in JavaScript code, such as cross-site scripting and redirections (Rozi et al., 2020). The use of bytecode makes it possible to bypass JavaScript obfuscation.
Approaches that focus on the static evaluation of WebAssembly binaries aim to identify vulnerabilities in the developed code that can be exploited or cause an error in production. These tools also generate the flow of WebAssembly applications, aiming to identify vulnerabilities that may have been ported from other languages (Quan et al., 2019; Brito et al., 2022).
Table 2: Performance of algorithms for PUA using DWARF format data (10-fold cross-validation).
Classifier    Precision  Recall  F1Score  Accuracy  BAC
MLP           93.40%     100%    96.59%   96.62%    96.77%
RandomForest  98.24%     100%    99.11%   99.22%    99.31%
SVM           94.89%     100%    97.38%   97.66%    97.94%
XGBoost       97.34%     100%    98.65%   98.70%    98.76%
To the best of our knowledge, our work is the first
to use the debugging format to extract features to ap-
ply in an intrusion detection system. Our work is able
to demonstrate the potential of using debug formats
for intrusion detection.
6 CONCLUSION
In this paper, we present a novel intrusion detection
approach using debugging formats to extract features
from application code. To validate our proposal, we
built a dataset with WebAssembly applications, a bi-
nary format that has seen rapid adoption on the Web.
The features were extracted from DWARF format, a
debugging information file format used by many com-
pilers and debuggers to support source level debug-
ging.
We evaluated our approach with multi-class ma-
chine learning algorithms, obtaining promising re-
sults, especially with ensemble algorithms. Thus, we
showed the potential of using debugging formats to
extract information from binaries to perform intrusion
detection.
Unfortunately, using debugging formats has some
limitations. Debugging symbols increase the size of
compiled binaries and thus are usually stripped from
distributed binaries to save space. Having a process
for dealing with such binaries, as described in Sec-
tion 3.2, alleviates this problem. The DWARF format
may also, on rare occasions, not be capable of debug-
ging the information itself. Some languages/compil-
ers may not support the format, or support only a sub-
set of its functionalities (Bastian et al., 2019); further
experimentation is needed to investigate the impact of
partial support for DWARF in our proposal.
ACKNOWLEDGEMENTS
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina (FAPESC). The authors also thank the UDESC, UFPR and UTFPR Computer Science departments.
REFERENCES
Ashouri, M., Kreitz, C., Austin, T. H., and Bordim, J. L.
(2021). JACY: A robust JVM-based intrusion detec-
tion and security analysis system.
Bastian, T., Kell, S., and Zappa Nardelli, F. (2019). Reli-
able and fast DWARF-based stack unwinding. Pro-
ceedings of the ACM on Programming Languages,
3(OOPSLA):1–24.
Bosamiya, J., Lim, W. S., and Parno, B. (2022).
Provably-Safe multilingual software sandboxing us-
ing WebAssembly. In Proceedings of the 31st
USENIX Security Symposium, pages 1975–1992,
Boston, MA, USA. USENIX Association.
Brito, T., Lopes, P., Santos, N., and Santos, J. F. (2022).
Wasmati: An efficient static vulnerability scanner for
WebAssembly. Computers & Security, 118:102745.
Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero,
C. A. (2020). Sliding window: The impact of
trace size in anomaly detection system for containers
through machine learning. In XVIII Regional School
of Computer Networks, pages 141–146, Virtual Event.
SBC.
Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero,
C. A. (2021). Taking a peek: An evaluation of
anomaly detection using system calls for containers.
In Proceedings of the 26th IEEE Symposium on Com-
puters and Communications, Athens, Greece. IEEE.
Ceschin, F., Botacin, M., Bifet, A., Pfahringer, B., Oliveira,
L. S., Gomes, H. M., and Grégio, A. (2024). Machine
learning (in) security: A stream of problems. Digital
Threats, 5(1).
Chandola, V., Banerjee, A., and Kumar, V. (2009).
Anomaly detection: A survey. ACM Computing Sur-
veys, 41(3):1–58.
Ding, Y., Zhang, X., Hu, J., and Xu, W. (2020). Android
malware detection method based on bytecode image.
Journal of Ambient Intelligence and Humanized Com-
puting, 14(5):6401–6410.
DWARF (2023). DWARF debugging information format. https://dwarfstd.org/. DWARF Debugging Information Format Committee.
Eager, M. J. (2012). Introduction to the DWARF debugging format. https://dwarfstd.org/doc/Debugging%20using%20DWARF-2012.pdf.
Falliere, N. (2018). Reverse engineering WebAssembly. https://www.pnfsoftware.com/reversing-wasm.pdf.
Galante, L., Botacin, M., Grégio, A., and de Geus, P.
(2019). Forseti: Extração de características e classi-
ficação de binários ELF. In Anais Estendidos do XIX
Simpósio Brasileiro de Segurança da Informação e de
Sistemas Computacionais, pages 5–10, São Paulo, SP,
Brazil. SBC.
Heinrich, T., Will, N. C., Obelheiro, R. R., and Maziero,
C. A. (2024). A categorical data approach for anomaly
detection in WebAssembly applications. In Proceedings
of the 10th International Conference on Information
Systems Security and Privacy, pages 275–284,
Rome, Italy. SciTePress.
Hoffman, K. (2019). Programming WebAssembly with
Rust: Unified Development for Web, Mobile, and
Embedded Applications. The Pragmatic Bookshelf,
Raleigh, NC, USA.
Kim, M., Jang, H., and Shin, Y. (2022). Avengers,
Assemble! Survey of WebAssembly security solutions. In
Proceedings of the 15th International Conference on
Cloud Computing, pages 543–553, Barcelona, Spain.
IEEE.
Kirchmayr, W., Moser, M., Nocke, L., Pichler, J., and
Tober, R. (2016). Integration of static and dynamic code
analysis for understanding legacy source code. In
Proceedings of the International Conference on Software
Maintenance and Evolution, pages 543–552, Raleigh,
NC, USA. IEEE.
Lehmann, D., Kinder, J., and Pradel, M. (2020). Everything
old is new again: Binary security of WebAssembly. In
Proceedings of the 29th USENIX Security Symposium,
pages 217–234, Boston, MA, USA. USENIX
Association.
Lehmann, D. and Pradel, M. (2022). Finding the DWARF:
Recovering precise types from WebAssembly binaries.
In Proceedings of the 43rd International Conference
on Programming Language Design and Implementation,
pages 410–425, San Diego, CA, USA.
ACM.
Lemos, R., Heinrich, T., Maziero, C. A., and Will, N. C.
(2022). Is it safe? Identifying malicious apps through
the use of metadata and inter-process communication.
In Proceedings of the 16th Annual IEEE International
Systems Conference, pages 1–8, Montreal, QC,
Canada. IEEE.
Lemos, R., Heinrich, T., Will, N. C., Obelheiro, R. R., and
Maziero, C. A. (2023). Inspecting Binder transactions
to detect anomalies in Android. In Proceedings of the
17th Annual IEEE International Systems Conference,
Vancouver, BC, Canada. IEEE.
Liu, M., Xue, Z., Xu, X., Zhong, C., and Chen, J. (2018).
Host-based intrusion detection system with system
calls: Review and future trends. ACM Computing
Surveys, 51(5):98.
Michael, A. E., Gollamudi, A., Bosamiya, J., Johnson, E.,
Denlinger, A., Disselkoen, C., Watt, C., Parno, B.,
Patrignani, M., Vassena, M., and Stefan, D. (2023).
MSWasm: Soundly enforcing memory-safe execution
of unsafe code. Proceedings of the ACM on
Programming Languages, 7(POPL).
Naseem, F. N., Aris, A., Babun, L., Tekiner, E., and
Uluagac, A. S. (2021). MINOS: A lightweight real-time
cryptojacking detection system. In Proceedings of the
Network and Distributed System Security Symposium,
Virtual Event. Internet Society.
Pickard, C. and Miladinov, S. (2012). Rogue software:
Protection against potentially unwanted applications.
In Proceedings of the 7th International Conference
on Malicious and Unwanted Software, Fajardo, PR,
USA. IEEE.
Quan, L., Wu, L., and Wang, H. (2019). EVulHunter:
Detecting fake transfer vulnerabilities for EOSIO’s smart
contracts at WebAssembly-level.
Romano, A., Lehmann, D., Pradel, M., and Wang, W.
(2022). Wobfuscator: Obfuscating JavaScript malware
via opportunistic translation to WebAssembly.
In Proceedings of the 43rd Symposium on Security
and Privacy, pages 1574–1589, San Francisco, CA,
USA. IEEE.
Rossberg, A. (2018). WebAssembly specification.
https://webassembly.github.io/spec/core/_download/WebAssembly.pdf.
Rozi, M. F., Kim, S., and Ozawa, S. (2020). Deep neural
networks for malicious JavaScript detection using
bytecode sequences. In Proceedings of the International
Joint Conference on Neural Networks, pages 1–8,
Glasgow, UK. IEEE.
Stallings, W., Brown, L., Bauer, M. D., and Bhattacharjee,
A. K. (2012). Computer Security: Principles and
Practice. Pearson.
Stiévenart, Q., De Roover, C., and Ghafari, M. (2022).
Security risks of porting C programs to WebAssembly.
In Proceedings of the 37th Symposium on Applied
Computing, pages 1713–1722, Virtual Event. ACM.
APPENDIX
Tags Considered in the Training Process
lines, language, dw_tag_subprogram,
dw_tag_typedef, dw_tag_member, dw_tag_label,
dw_tag_gnu_call_site, dw_at_gnu_all_call_sites,
dw_tag_inlined_subroutine, dw_at_external,
dw_at_call_file, int_type, uint_type, string_type,
fp_type, bool_type, dw_tag_enumerator,
dw_tag_variable_int, dw_tag_variable_uint,
dw_tag_variable_string, dw_tag_variable_fp,
dw_tag_variable_bool, dw_tag_variable,
dw_tag_formal_parameter_int,
dw_tag_formal_parameter_uint,
dw_tag_formal_parameter_string,
dw_tag_formal_parameter_fp,
dw_tag_formal_parameter,
dw_tag_formal_parameter_bool
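
As an illustration of how counts for some of the features above might be obtained, the following minimal sketch tallies DWARF tag and attribute names in a textual debug-information dump. It is not the extraction pipeline used in this work: the input layout (one DW_TAG_* or DW_AT_* name per line, similar to what common DWARF dump tools print), the helper name count_features, and the sample text are all assumptions made for the example. Only the feature names that correspond directly to a DWARF tag or attribute are handled; the type-refined features (for instance dw_tag_variable_int) would require resolving each entry's type and are left out.

```python
import re
from collections import Counter
from typing import Dict

# Subset of the appendix feature names that map one-to-one onto a DWARF
# tag or attribute name. Type-refined features (e.g. dw_tag_variable_int)
# are intentionally not covered by this sketch.
PLAIN_FEATURES = [
    "dw_tag_subprogram",
    "dw_tag_typedef",
    "dw_tag_member",
    "dw_tag_label",
    "dw_tag_gnu_call_site",
    "dw_at_gnu_all_call_sites",
    "dw_tag_inlined_subroutine",
    "dw_at_external",
    "dw_at_call_file",
    "dw_tag_enumerator",
    "dw_tag_variable",
    "dw_tag_formal_parameter",
]

# Matches names such as DW_TAG_subprogram or DW_AT_external.
_NAME = re.compile(r"\bDW_(?:TAG|AT)_\w+")


def count_features(dump_text: str) -> Dict[str, int]:
    """Count occurrences of each plain feature in a textual DWARF dump.

    Hypothetical helper: the dump format is an assumption, not the
    format used by the authors.
    """
    counts = Counter(m.group(0).lower() for m in _NAME.finditer(dump_text))
    return {name: counts.get(name, 0) for name in PLAIN_FEATURES}


# Hand-written sample text (assumed layout, for illustration only).
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_producer ("clang")
0x00000026:   DW_TAG_subprogram
                DW_AT_name ("main")
                DW_AT_external (true)
0x0000003a:     DW_TAG_formal_parameter
                  DW_AT_name ("argc")
0x00000048:     DW_TAG_formal_parameter
                  DW_AT_name ("argv")
0x00000056:     DW_TAG_variable
                  DW_AT_name ("i")
"""

features = count_features(SAMPLE)
```

The resulting dictionary has one integer entry per listed feature, with zero for tags that do not appear, so binaries with different debug information still yield vectors of the same length.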