The Use of the DWARF Debugging Format for the Identification of Potentially Unwanted Applications (PUAs) in WebAssembly Binaries

Calebe Helpa (1), Tiago Heinrich (2, a), Marcus Botacin (3, b), Newton C. Will (4, c), Rafael R. Obelheiro (5, d) and Carlos A. Maziero (1, e)

(1) Computer Science Department, Federal University of Paraná, Curitiba, 81530–015, Brazil
(2) Max Planck Institute for Informatics (MPI), Saarbrücken, 66123, Germany
(3) Texas A&M University, College Station, TX, 77843, U.S.A.
(4) Computer Science Department, Federal University of Technology, Paraná, Dois Vizinhos, 85660–000, Brazil
(5) Computer Science Department, State University of Santa Catarina, Joinville, 89219–710, Brazil

Keywords: WebAssembly, Intrusion Detection, Security.

Abstract: Debugging formats are a well-known means of storing information about an application that helps developers find errors, bugs, or unexpected behavior during development. The Debugging With Attributed Record Format (DWARF) is a generic format that can be used with a range of programming languages and binary formats, such as WebAssembly, a low-level binary format that provides a compilation target for high-level languages. Despite the widespread use of debugging formats, their potential for intrusion detection is still unknown. Our study evaluates data extracted from the DWARF format and its potential for an intrusion detection solution. In this context, we present a strategy for identifying Potentially Unwanted Applications (PUAs) in WebAssembly binaries through feature extraction and static analysis, using the DWARF format as the data source. Our results are promising, with an overall F1Score above 96% for the algorithms.

1 INTRODUCTION

WebAssembly is a low-level binary format that provides a compilation target for high-level languages (Hoffman, 2019). It aims to support web applications by offering fast processing and reducing the memory needed to load web pages (Falliere, 2018), while working across different browsers (Romano et al., 2022).

DWARF is a debugging information file format supported by different compilers and debuggers to give developers access to source-level debugging (DWARF, 2023). It is supported for languages such as C, C++, and Fortran, as well as for WebAssembly. Its main use is during debugging, where breakpoints can be set, operation data can be viewed, and code sections can be traced. The format represents information as a tree whose nodes represent data, types, and functions (Eager, 2012). While DWARF information is usually produced during compilation from source, it can also be generated from decompiled binaries when the source code is unavailable.

(a) https://orcid.org/0000-0002-8017-1293
(b) https://orcid.org/0000-0001-6870-1178
(c) https://orcid.org/0000-0003-2976-4533
(d) https://orcid.org/0000-0002-4014-6691
(e) https://orcid.org/0000-0003-2592-3664

The use of debugging formats outside the development environment is limited. Yet the information that such formats expose about applications can benefit evaluation strategies. Specifically, security solutions can use this information for threat investigation and detection.

Rogue software that may compromise the privacy of a system or weaken its security is called a Potentially Unwanted Application (PUA) (Pickard and Miladinov, 2012). In most cases, identifying PUAs involves evaluating binaries using static and/or dynamic analysis. The former relies on information extracted from the binaries, while the latter observes their behavior during execution.


Helpa, C., Heinrich, T., Botacin, M., Will, N., Obelheiro, R. and Maziero, C.

The Use of the DWARF Debugging Format for the Identification of Potentially Unwanted Applications (PUAs) in WebAssembly Binaries.

DOI: 10.5220/0012754500003767

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT 2024), pages 442-449

ISBN: 978-989-758-709-2; ISSN: 2184-7711

WebAssembly applications have the potential for malicious use because the binary format makes it difficult to identify the purpose of each program. Malicious users exploit this property of the WebAssembly format to carry out cryptojacking attacks and to obfuscate malicious code (Naseem et al., 2021). To mitigate such risks, recent studies focus on correcting design flaws, adding new features and improving existing ones, evaluating memory-related issues (Michael et al., 2023), and improving the design of the compiler (Bosamiya et al., 2022).

Our proposal evaluates the potential of information extracted from debugging formats for an intrusion detection solution. We use the DWARF format and identify a set of features that can represent an application and be applied to the identification of PUAs. Since WebAssembly is widely supported by major web browsers, and considering the security problems observed in the field, we focus on WebAssembly binaries as a case study. We extracted information from real applications and evaluated how Machine Learning (ML) models can be used for intrusion detection. ML strategies have shown promising results in the security area, allowing the classification of application behaviors (Ceschin et al., 2024).

Our contributions are:

• The evaluation of the DWARF debugging format for extracting information from WebAssembly binaries for a subsequent intrusion detection process; and

• A strategy for PUA identification in WebAssembly binaries.

Our proposal provides a better understanding of how information retrieved from debug formats such as DWARF can be used for intrusion detection. We achieve promising results for PUA identification, with F1Score above 95% and accuracy above 96% for the algorithms tested with cross-validation.

The remainder of this paper is structured as follows: Section 2 presents the background; Section 3 discusses the proposal; Section 4 presents the evaluation; Section 5 reviews related work; and Section 6 concludes the paper.

2 BACKGROUND

This section presents the background needed to understand this work, including the DWARF format, the WebAssembly format, intrusion detection, and static analysis.

2.1 DWARF Format

The Debugging With Attributed Record Format (DWARF) is a debugging information file format that allows source-level debugging (DWARF, 2023). Debuggers and compilers can use the format to represent applications in a tree structure, in which types, variables, and functions form the data sections of the DWARF format (Eager, 2012).

In DWARF, a Debugging Information Entry (DIE) describes an entity in a program through a tag and a set of attributes. DIEs are nested: each DIE can be the parent of other DIEs or the child of one. In this way, the structure of the program is preserved for an evaluation process. The description found in a DIE presents information such as variables, constants, and references.

Code 1 presents a function that returns an integer, which is then printed as output. The DWARF representation of this code is presented in Code 2.

    #include <stdio.h>

    int foo() {
        int x = 1;
        return x;
    }

    int main() {
        printf("Value: %d", foo());
        return 0;
    }

Code 1: An example of code in C with a function that returns a value to be printed.

As shown in Code 2, the entire structure of the C program is represented in the DWARF format. Everything from the function foo to the variables used throughout the code is presented in a tree-like structure, with each DIE representing the information in the current scope of the application. In this way, the DWARF output can be followed side by side with the execution of the C code.

Besides being useful for debugging, this information appears to have potential for intrusion detection solutions, since it can represent the key functionalities of a program. For other languages, a similar output is expected, with additions for language-specific features. WebAssembly applications can be converted to support the DWARF format even when only the application binary is available.
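The tree organization of DIEs can be pictured with a small data structure. The following Python sketch is only an illustration: the `Die` class and its field names are hypothetical and do not belong to any DWARF library, and the tags use the standard `DW_TAG_`/`DW_AT_` spelling rather than the shortened names printed in Code 2. The tree mirrors the entries that Code 2 lists for the program in Code 1.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    """Hypothetical in-memory model of one Debugging Information Entry."""
    tag: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, die) pairs in pre-order, parents before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Tree mirroring the entries listed in Code 2 for the program in Code 1.
int_type = Die("DW_TAG_base_type", {"DW_AT_name": "int", "DW_AT_byte_size": 4})
foo = Die("DW_TAG_subprogram", {"DW_AT_name": "foo"},
          [Die("DW_TAG_variable", {"DW_AT_name": "x"})])
main = Die("DW_TAG_subprogram", {"DW_AT_name": "main"})
unit = Die("DW_TAG_compile_unit", {"DW_AT_name": "main.c"},
           [foo, main, int_type])

if __name__ == "__main__":
    # Print the tree with one level of indentation per nesting depth.
    for depth, die in unit.walk():
        print("  " * depth + die.tag, die.attributes.get("DW_AT_name", ""))
```

The variable `x` appears as a child of `foo`, which is how the scope of a declaration is preserved in the tree.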

2.2 WebAssembly

WebAssembly is a binary format targeted at the Web. WebAssembly code can be generated from the WebAssembly Text (WAT) textual format or by compilers that translate code from high-level languages such as C, C++, Go, and Rust to WebAssembly (Hoffman, 2019).

    <1>: TAG_compile_unit [1]
        AT_producer("Apple LLVM version 9.0.0 (clang-900.0.39.2)")
        AT_language(DW_LANG_C99)
        AT_name("main.c")
        AT_stmt_list(0x00000000)
        AT_comp_dir("/Users/bar/Documents/")
        AT_low_pc(0x0000000000000000)
        AT_high_pc(0x00000041)
    <2>: TAG_subprogram [2]
        AT_low_pc(0x0000000000000000)
        AT_high_pc(0x00000010)
        AT_frame_base(rbp)
        AT_name("foo")
        AT_decl_file("main.c")
        AT_decl_line(7)
        AT_type({0x0000006b} (int))
        AT_external(true)
    <3>: TAG_variable [3]
        AT_location(fbreg -4)
        AT_name("x")
        AT_decl_file("main.c")
        AT_decl_line(8)
        AT_type({0x0000006b} (int))
    <4>: NULL
    <5>: TAG_subprogram [4]
        AT_low_pc(0x0000000000000010)
        AT_high_pc(0x00000031)
        AT_frame_base(rbp)
        AT_name("main")
        AT_decl_file("main.c")
        AT_decl_line(12)
        AT_type({0x0000006b} (int))
        AT_external(true)
    <6>: TAG_base_type [5]
        AT_name("int")
        AT_encoding(DW_ATE_signed)
        AT_byte_size(0x04)
    <7>: NULL

Code 2: DWARF output example, based on the source code presented in Code 1.

A WebAssembly application consists of a module, which contains function definitions, global variables, linear memories, and indirect call tables. Functions, variables, and other program elements are identified by integer indices (Lehmann et al., 2020). A WebAssembly module usually has three parts: the Preamble, with module start information; the Standard (default) sections, which contain all application information such as functions; and the Custom sections, which hold information such as debug data (Kim et al., 2022). Table 1 shows the structure of a WebAssembly binary in detail.
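The layout described above can be inspected with a few lines of code. The sketch below is a minimal reader, not a validator: it checks the magic number and version of the preamble and then lists each section by its identifier and payload size, following the section numbering of the WebAssembly binary format. The function names are our own, and error handling for truncated files is omitted.

```python
import struct

# Section identifiers used by the WebAssembly binary format.
SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32_leb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return (version, [(section label, payload size), ...]) for a module."""
    if data[:4] != b"\x00asm":
        raise ValueError("missing WebAssembly magic number")
    (version,) = struct.unpack_from("<I", data, 4)
    pos = 8
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, pos = read_u32_leb128(data, pos + 1)
        payload_start = pos
        label = SECTION_NAMES.get(section_id, "unknown")
        if section_id == 0:
            # A custom section starts with its own name, e.g. ".debug_info".
            name_len, name_pos = read_u32_leb128(data, pos)
            label += ":" + data[name_pos:name_pos + name_len].decode("utf-8")
        sections.append((label, size))
        pos = payload_start + size  # skip the rest of the payload
    return version, sections
```

When a module carries DWARF data, it is typically stored in custom sections with names such as `.debug_info`, so the labels returned by `list_sections` give a quick indication of whether debug information is present.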

Only four primitive types are supported: i32 and i64, representing 32-bit and 64-bit integers, and f32 and f64, representing 32-bit and 64-bit floating-point numbers.

WebAssembly uses a binary instruction format that can be inspected using converters that make the code human-readable. Available tools make it more practical to analyze sections of WebAssembly code (Falliere, 2018). The WebAssembly instruction format was designed with a focus on the safety of its users. Its key security features are:

Virtualized Environment: WebAssembly modules run in a stack-based virtual machine. All input/output interactions and access to operating system resources must go through functions that the module imports. WebAssembly can therefore establish security policies for developers and assure users that their environment and system resources are accessed by modules in a limited and controlled way (Rossberg, 2018).

Linear Memory: The linear memory of WebAssembly modules is instantiated in managed buffers, so read and write operations are limited to certain areas of memory (Kim et al., 2022).

Control Flow Integrity (CFI): Through the structured control flow generated during compilation, WebAssembly modules are protected against attacks such as shellcode injection or the abuse of unrestricted indirect jumps (Kim et al., 2022). However, WebAssembly's CFI is not as effective as the modern CFI schemes used to defend native binaries, and some calls remain vulnerable to malicious use (Lehmann et al., 2020).

2.3 Intrusion Detection

Intrusion Detection Systems (IDS) are security mechanisms that monitor hosts, applications, and networks for signs of attacks and intrusions (Stallings et al., 2012). One way of performing intrusion detection on applications is to analyze application code using static and/or dynamic analysis (Chandola et al., 2009; Liu et al., 2018; Castanhel et al., 2021; Lemos et al., 2022).

Static analysis extracts features from the code without executing it, thus defining an abstract representation of the program's behavior (Kirchmayr et al., 2016). This technique has been widely used, especially in critical systems such as those used in aviation and air traffic control.

Dynamic analysis is a software analysis technique that evaluates the behavior of a program during its execution. It is performed through tests and simulations, which make it possible to identify programming errors, unexpected behaviors, and security flaws in the interaction between the code and the environment in which it runs (Kirchmayr et al., 2016; Lemos et al., 2023).

Table 1: Representation of a WebAssembly binary, structurally describing the expected composition of a WebAssembly binary file.

Part               Contents
Preamble           Magic, Version
Standard Section   Type, Import, Function, Table, Memory, Global, Export, Start, Code, Element, Data
Custom Section     Any kind of data

3 PROPOSAL

This section presents our proposal. Section 3.1 describes the threat model. Section 3.2 presents the studied strategy. Section 3.3 discusses the binary characteristics used in the detection process.

3.1 Threat Model

In our threat model, the adversary exploits the WebAssembly format to deploy malicious content. We assume that DWARF data is always available: even when a WebAssembly binary ships without DWARF information, the data can be generated in a later step. Our PUA evaluation follows a static strategy, in which the information available in the DWARF format is accessed and used in a detection process.

3.2 Strategy

Our strategy is to evaluate how debug formats, such as DWARF, can be used in an intrusion detection solution. The advantage of debug formats is directly associated with data access, since the format offers information that would not be accessible from other observation points.

Debug formats give the intrusion detection strategy an almost complete understanding of the application and its execution, since debugging tools take all code structures into account. Therefore, using this data for detection may prove useful in situations in which the source is unknown or code evaluation is possible. Achieving these goals involves two main steps:

• extracting key characteristics from binaries in DWARF format; and

• defining an evaluation strategy to detect PUAs based on these characteristics.

To evaluate the potential of the DWARF format, we selected WebAssembly binaries from well-known datasets (Lehmann and Pradel, 2022; Stiévenart et al., 2022) and extracted debug information from each binary in DWARF format using llvm-dwarfdump. The extracted information is discussed in Section 3.3.

For WebAssembly binaries that lack debugging information, we apply the process shown in Figure 1. The binary is decompiled using wasm2wat, producing source in WAT format (1). The WAT source is compiled with debugging symbols using wasmtime (2), and the DWARF information is extracted from the new binary using llvm-dwarfdump (3).
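The three steps can be sketched as a small driver script. This is an illustration under stated assumptions, not the authors' tooling: the `wasm2wat ... -o ...` and `llvm-dwarfdump --debug-info ...` invocations use options those tools provide, but the exact command used for step (2) is not given in the text, so the sketch takes it as a caller-supplied template (`recompile_template`, a hypothetical parameter) with `{wat}` and `{out}` placeholders. File naming is also our own choice, and paths containing spaces are not handled.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(wasm_path, recompile_template):
    """Return the argv lists for the three steps applied to one binary.

    recompile_template is a caller-supplied command line (an assumption of
    this sketch) with {wat} and {out} placeholders for step (2).
    """
    wasm = Path(wasm_path)
    wat = wasm.with_suffix(".wat")
    rebuilt = wasm.with_name(wasm.stem + ".debug.wasm")
    return [
        # (1) decompile the binary to WAT with wasm2wat (part of WABT)
        ["wasm2wat", str(wasm), "-o", str(wat)],
        # (2) rebuild with debugging symbols; command supplied by the caller
        shlex.split(recompile_template.format(wat=wat, out=rebuilt)),
        # (3) dump the DWARF debug information of the rebuilt binary
        ["llvm-dwarfdump", "--debug-info", str(rebuilt)],
    ]


def run_pipeline(commands):
    """Run the steps in order and return the text printed by the last one."""
    output = ""
    for argv in commands:
        done = subprocess.run(argv, check=True, capture_output=True, text=True)
        output = done.stdout
    return output
```

Separating command construction from execution keeps the sketch testable without the external tools installed; `run_pipeline` stops at the first failing step because of `check=True`.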

[Figure 1 diagram; labels: WebAssembly Module, Transformation, WebAssembly DWARF]

Figure 1: Process of transforming a WebAssembly binary to generate a binary that supports the DWARF format.

Our PUA identification strategy uses machine learning algorithms. We selected multi-class algorithms to better understand the impact of the information extracted from the binaries on the models. The choice of algorithms is based on previous related work (Galante et al., 2019; Castanhel et al., 2020; Lemos et al., 2023; Heinrich et al., 2024).

3.3 Binary Characteristics

After converting the WebAssembly binaries to support the DWARF format, information can be extracted from them. The DWARF format provides information through a set of tags and attributes present in a WebAssembly binary (as presented in Section 3.2). This information enables later analysis of the application, either to identify implementation errors or, as in this work, to identify PUAs that may reflect an intrusion.

The DWARF format offers different tags according to the language being analyzed (as presented in Section 2.1). Instead of using all the tags available for WebAssembly, we selected the attributes relevant for classification. For this selection, we collected information about all the tags available for the WebAssembly format and evaluated the importance of these features for the models.

llvm-dwarfdump: https://llvm.org/docs/CommandGuide/llvm-dwarfdump.html
wasm2wat (WABT): https://github.com/WebAssembly/wabt
wasmtime: https://wasmtime.dev/

We limited our selection to specific sections of the WebAssembly binary (as presented in Table 1). The data in the preamble and standard sections presents the key characteristics of the application and the functionality of the WebAssembly module, such as functions, variables, export data, memory allocation, and binary compilation.

After this process, we evaluated the tags that were relevant to our classification strategy (Section 3.2). This process generated a set of key information from the DWARF format, organized into six groups that cover the extracted information:

General Information: information about the application, such as program size, the number of labels in the code, and the source code language. In addition to describing the application, we hypothesize that this information can help classifiers, as it offers a view of the language used before the port to WebAssembly;

Routines and Subprograms: the number of declared variables, of inline subroutine declarations, of subprogram declarations, and of parameters of these subprograms. This information is important to describe the operations that a binary can perform;

Variables: the number of type declarations, declared integer types, declared unsigned integer types, declared word types, and declared file pointer types. As WebAssembly has a restricted set of native data types, these types must be defined in WebAssembly modules; consequently, this information also helps describe the operations performed by a binary;

Parameters: the information found in each DIE, such as the operations performed by a function;

Memory Information: the frequency of declared members when using structures or classes. We hypothesize that information found in memory can assist in a detection process; and

Shared Data: the number of attributes that determine whether subroutines are part of an external program or produce externally accessible information, and the number of tags and attributes related to the use of function calls.

We also provide a list of the tags used in the Appendix.

When extracting information from the DWARF format, different regions of the application can be accessed in each DIE. The selected groups give access to 29 tags in the WebAssembly DWARF format. The number of tags that can be accessed through the DWARF format varies according to the language supported.

The tags present specific information about the application behavior or represent some type of operation performed by the program. For intrusion detection, either the raw extracted information can be used or the frequency with which tags appear, which may reveal a pattern.
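The frequency-based representation can be sketched as follows. The code counts how often each tag and attribute name occurs in the text printed by llvm-dwarfdump and turns the counts into a fixed-length vector. It is an illustration only: the `SELECTED` list is a small invented subset, not the 29 tags used in the study, and `SAMPLE` is a hand-written fragment in the style of llvm-dwarfdump output rather than data from the evaluated binaries.

```python
import re
from collections import Counter

# Matches tag and attribute names as printed by llvm-dwarfdump.
NAME_PATTERN = re.compile(r"\bDW_(?:TAG|AT)_\w+")

# Illustrative subset of names; the study's own tag list is not reproduced.
SELECTED = ["DW_TAG_subprogram", "DW_TAG_variable", "DW_TAG_base_type",
            "DW_TAG_formal_parameter", "DW_AT_external"]

# Hand-written fragment in the style of llvm-dwarfdump output.
SAMPLE = """
0x0000000b: DW_TAG_compile_unit
              DW_AT_name    ("main.c")
0x0000002a:   DW_TAG_subprogram
                DW_AT_name    ("foo")
                DW_AT_external    (true)
0x00000043:     DW_TAG_variable
                  DW_AT_name    ("x")
0x00000051:     NULL
0x00000052:   DW_TAG_subprogram
                DW_AT_name    ("main")
                DW_AT_external    (true)
0x0000006b:   DW_TAG_base_type
                DW_AT_encoding    (DW_ATE_signed)
"""


def count_names(dump_text):
    """Count every DW_TAG_* and DW_AT_* name found in a textual dump."""
    return Counter(NAME_PATTERN.findall(dump_text))


def feature_vector(dump_text, selected=SELECTED):
    """Return the occurrence count of each selected name, in a fixed order."""
    counts = count_names(dump_text)
    return [counts[name] for name in selected]


if __name__ == "__main__":
    print(feature_vector(SAMPLE))
```

Keeping the order of `SELECTED` fixed ensures that every binary yields a vector with the same layout, which is what a classifier expects as input.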

4 EVALUATION

This section presents the evaluation of our proposal. Section 4.1 describes the objective of the evaluation. Section 4.2 presents the dataset. Section 4.3 discusses the results.

4.1 Objective

Our purpose is to evaluate the usefulness of information extracted from the debugging format for PUA identification. We use static information from WebAssembly binaries to identify threats, investigating the feasibility of using the information exposed by these binaries through the DWARF format.

For the experiments, we selected four machine learning algorithms: Multi-layer Perceptron (MLP), Random Forest (RF), Support Vector Machines (SVM), and XGBoost. These models demonstrate how the learning process behaves when considering the characteristics selected for the benign and malicious classes.

4.2 Dataset

To perform the evaluation, a dataset is needed. Instead of building a completely new dataset, we used already available data from two sources (Lehmann and Pradel, 2022; Stiévenart et al., 2022). The binaries were selected taking into account the available samples, so as to define both benign and malicious samples. To balance the malicious and benign classes, we normalized the number of samples between the two. We selected samples to obtain a subset with the same characteristics as the entire dataset (our code is publicly available).

After the selection process, the binaries were transformed to support the DWARF format (as presented in Section 3.2). In total, 770 samples were selected for the experiment: 400 benign samples and 370 malicious samples. The samples are intended to present recurring behaviors for benign applications, and vulnerabilities or implementation errors for malicious samples.

Before running the experiments, information needs to be extracted from the DWARF data of each WebAssembly application. Taking into account the binary characteristics presented in Section 3.3, we extracted all DWARF tags that were incorporated into the six defined groups.

The features extracted from the binaries are saved to a file, which contains the occurrence counts of the attributes and the subsequent encodings of extracted variables. This information was then used for model training and testing.

4.3 Results

This section presents the classification results obtained by machine learning algorithms trained with the information extracted from the DWARF format. The algorithms were trained and tested in a 1:1 ratio, that is, 50% of the data was used for training and 50% for testing. An exhaustive parameter search was performed to find the best configurations for the models. We also ran the tests with 10-fold cross-validation, as it is the standard for this type of assessment.
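For readers unfamiliar with the procedure, 10-fold cross-validation can be sketched in a few lines. The code below is a generic illustration using only the Python standard library; the text does not state which library or shuffling strategy was used in the experiments, so the shuffling, the `seed` parameter, and the absence of stratification are assumptions of this sketch.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds; each fold is used
    as the test set exactly once while the others form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # With 770 samples, each of the 10 folds holds 77 test samples.
    for train, test in k_fold_indices(770):
        print(len(train), len(test))
```

The reported score of a model is then typically the average of the metric over the k test folds.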

Table 2 presents the results achieved by the evaluated algorithms, considering the usual machine learning metrics. The results varied according to the classification strategy used by each model. Overall, however, the results are favorable for a PUA detection strategy using multi-class algorithms. Multi-class algorithms are trained to classify two or more classes and are capable of defining patterns that represent each class.

The precision shows that MLP and SVM were the models most affected by false positives, that is, by benign samples classified as malicious. Despite the small percentage of false positives, this error was responsible for the impact seen in the F1Score.

The models were not affected by false negatives, as shown by the recall. The best metric to describe the models' results is the F1Score, in which the impact of false positives is noticeable. For the classification of binaries to identify PUAs, the overall F1Score demonstrates that the features used to train and test the classifiers are sufficient for a classification process.

Code repository: https://github.com/CalebeHelpa/webassembly-classification

The accuracy reflects the impact of true positives and true negatives, demonstrating that the models were able to adequately learn the patterns of the binaries from the extracted features. The values achieved for Balanced Accuracy (BAC) are evidence that this learning holds when both classes are weighted equally.
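The metrics discussed above follow the usual definitions based on true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn). The sketch below states them in code and, as a consistency check, recomputes the F1Score from the precision and recall values reported in Table 2. The function names are our own, and the confusion-matrix counts behind Table 2 are not given in the text, so only the F1Score relation is checked.

```python
def precision(tp, fp):
    """Fraction of samples flagged as malicious that really are malicious."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of malicious samples that were flagged."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of the recall obtained on each of the two classes."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    # Precision and recall as reported in Table 2.
    reported = {"MLP": (0.9340, 1.0), "RandomForest": (0.9824, 1.0),
                "SVM": (0.9489, 1.0), "XGBoost": (0.9734, 1.0)}
    for name, (p, r) in reported.items():
        print(f"{name}: F1Score = {100 * f1_score(p, r):.2f}%")
```

With a recall of 100%, the F1Score depends only on the precision, which is why the false positives of MLP and SVM are the only source of loss in that column.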

With these results, we conclude that debugging formats such as DWARF can expose information from binaries that can later be used by intrusion detection solutions. We also presented a static strategy for PUA detection in WebAssembly applications. Although we only explored the DWARF format for WebAssembly applications, the results are promising for the use of information extracted through debugging tools to detect PUAs.

5 RELATED WORK

Some well-known strategies are used to collect data from applications and perform intrusion detection. Solutions based on system calls are one example: traces of the application's execution are used to define application behavior and identify anomalies that may correspond to an intrusion (Liu et al., 2018).

The use of bytecode is also explored in the intrusion detection field. Bytecode contains a sequence of instructions that represents the program without any redundancy, facilitating the process of compiling or interpreting source code into machine code. The solution proposed by Ashouri et al. (2021) relies on Java bytecode to intercept runtime attacks. Bytecode is also used to detect malware on Android systems, by extracting features and using convolutional neural networks to classify malicious applications (Ding et al., 2020).

In web applications, bytecode sequences are used to detect malicious behavior in JavaScript code, such as cross-site scripting and redirections (Rozi et al., 2020). The use of bytecode makes it possible to bypass JavaScript obfuscation.

Approaches that focus on the static evaluation of WebAssembly binaries aim to identify vulnerabilities in the developed code that can be exploited or cause an error in production. These tools also generate the flow of WebAssembly applications, aiming to identify vulnerabilities that may have been ported from other languages (Quan et al., 2019; Brito et al., 2022).

Table 2: Performance of algorithms for PUA using DWARF format data.

10-Fold Cross-validation
Classifier     Precision   Recall   F1Score   Accuracy   BAC
MLP            93.40%      100%     96.59%    96.62%     96.77%
RandomForest   98.24%      100%     99.11%    99.22%     99.31%
SVM            94.89%      100%     97.38%    97.66%     97.94%
XGBoost        97.34%      100%     98.65%    98.70%     98.76%

To the best of our knowledge, our work is the first to use a debugging format to extract features for an intrusion detection system, demonstrating the potential of debug formats for intrusion detection.

6 CONCLUSION

In this paper, we presented a novel intrusion detection approach that uses debugging formats to extract features from application code. To validate our proposal, we built a dataset of WebAssembly applications, a binary format that has seen rapid adoption on the Web. The features were extracted from the DWARF format, a debugging information file format used by many compilers and debuggers to support source-level debugging.

We evaluated our approach with multi-class machine learning algorithms, obtaining promising results, especially with ensemble algorithms. Thus, we showed the potential of using debugging formats to extract information from binaries to perform intrusion detection.

Unfortunately, using debugging formats has some limitations. Debugging symbols increase the size of compiled binaries and are thus usually stripped from distributed binaries to save space. Having a process for dealing with such binaries, as described in Section 3.2, alleviates this problem. On rare occasions, the DWARF format may also fail to capture the debugging information. Some languages and compilers may not support the format, or may support only a subset of its functionalities (Bastian et al., 2019); further experimentation is needed to investigate the impact of partial DWARF support on our proposal.

ACKNOWLEDGEMENTS

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001 and Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina (FAPESC). The authors also thank the UDESC, UFPR and UTFPR Computer Science departments.

REFERENCES

Ashouri, M., Kreitz, C., Austin, T. H., and Bordim, J. L. (2021). JACY: A robust JVM-based intrusion detection and security analysis system.

Bastian, T., Kell, S., and Zappa Nardelli, F. (2019). Reliable and fast DWARF-based stack unwinding. Proceedings of the ACM on Programming Languages, 3(OOPSLA):1–24.

Bosamiya, J., Lim, W. S., and Parno, B. (2022). Provably-Safe multilingual software sandboxing using WebAssembly. In Proceedings of the 31st USENIX Security Symposium, pages 1975–1992, Boston, MA, USA. USENIX Association.

Brito, T., Lopes, P., Santos, N., and Santos, J. F. (2022). Wasmati: An efficient static vulnerability scanner for WebAssembly. Computers & Security, 118:102745.

Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero, C. A. (2020). Sliding window: The impact of trace size in anomaly detection system for containers through machine learning. In XVIII Regional School of Computer Networks, pages 141–146, Virtual Event. SBC.

Castanhel, G. R., Heinrich, T., Ceschin, F., and Maziero, C. A. (2021). Taking a peek: An evaluation of anomaly detection using system calls for containers. In Proceedings of the 26th IEEE Symposium on Computers and Communications, Athens, Greece. IEEE.

Ceschin, F., Botacin, M., Bifet, A., Pfahringer, B., Oliveira, L. S., Gomes, H. M., and Grégio, A. (2024). Machine learning (in) security: A stream of problems. Digital Threats, 5(1).

Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3):1–58.

Ding, Y., Zhang, X., Hu, J., and Xu, W. (2020). Android malware detection method based on bytecode image. Journal of Ambient Intelligence and Humanized Computing, 14(5):6401–6410.

DWARF (2023). DWARF debugging information format. https://dwarfstd.org/. DWARF Debugging Information Format Committee.

Eager, M. J. (2012). Introduction to the DWARF debugging format. https://dwarfstd.org/doc/Debugging%20using%20DWARF-2012.pdf.

SECRYPT 2024 - 21st International Conference on Security and Cryptography


Falliere, N. (2018). Reverse engineering WebAssembly. https://www.pnfsoftware.com/reversing-wasm.pdf.

Galante, L., Botacin, M., Grégio, A., and de Geus, P. (2019). Forseti: Extração de características e classificação de binários ELF. In Anais Estendidos do XIX Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais, pages 5–10, São Paulo, SP, Brazil. SBC.

Heinrich, T., Will, N. C., Obelheiro, R. R., and Maziero, C. A. (2024). A categorical data approach for anomaly detection in WebAssembly applications. In Proceedings of the 10th International Conference on Information Systems Security and Privacy, pages 275–284, Rome, Italy. SciTePress.

Hoffman, K. (2019). Programming WebAssembly with Rust: Unified Development for Web, Mobile, and Embedded Applications. The Pragmatic Bookshelf, Raleigh, NC, USA.

Kim, M., Jang, H., and Shin, Y. (2022). Avengers, Assemble! Survey of WebAssembly security solutions. In Proceedings of the 15th International Conference on Cloud Computing, pages 543–553, Barcelona, Spain. IEEE.

Kirchmayr, W., Moser, M., Nocke, L., Pichler, J., and Tober, R. (2016). Integration of static and dynamic code analysis for understanding legacy source code. In Proceedings of the International Conference on Software Maintenance and Evolution, pages 543–552, Raleigh, NC, USA. IEEE.

Lehmann, D., Kinder, J., and Pradel, M. (2020). Everything old is new again: Binary security of WebAssembly. In Proceedings of the 29th USENIX Security Symposium, pages 217–234, Boston, MA, USA. USENIX Association.

Lehmann, D. and Pradel, M. (2022). Finding the DWARF: Recovering precise types from WebAssembly binaries. In Proceedings of the 43rd International Conference on Programming Language Design and Implementation, pages 410–425, San Diego, CA, USA. ACM.

Lemos, R., Heinrich, T., Maziero, C. A., and Will, N. C. (2022). Is it safe? Identifying malicious apps through the use of metadata and inter-process communication. In Proceedings of the 16th Annual IEEE International Systems Conference, pages 1–8, Montreal, QC, Canada. IEEE.

Lemos, R., Heinrich, T., Will, N. C., Obelheiro, R. R., and Maziero, C. A. (2023). Inspecting Binder transactions to detect anomalies in Android. In Proceedings of the 17th Annual IEEE International Systems Conference, Vancouver, BC, Canada. IEEE.

Liu, M., Xue, Z., Xu, X., Zhong, C., and Chen, J. (2018). Host-based intrusion detection system with system calls: Review and future trends. ACM Computing Surveys, 51(5):98.

Michael, A. E., Gollamudi, A., Bosamiya, J., Johnson, E., Denlinger, A., Disselkoen, C., Watt, C., Parno, B., Patrignani, M., Vassena, M., and Stefan, D. (2023). MSWasm: Soundly enforcing memory-safe execution of unsafe code. Proceedings of the ACM on Programming Languages, 7(POPL).

Naseem, F. N., Aris, A., Babun, L., Tekiner, E., and Uluagac, A. S. (2021). MINOS: A lightweight real-time cryptojacking detection system. In Proceedings of the Network and Distributed System Security Symposium, Virtual Event. Internet Society.

Pickard, C. and Miladinov, S. (2012). Rogue software: Protection against potentially unwanted applications. In Proceedings of the 7th International Conference on Malicious and Unwanted Software, Fajardo, PR, USA. IEEE.

Quan, L., Wu, L., and Wang, H. (2019). EVulHunter: Detecting fake transfer vulnerabilities for EOSIO’s smart contracts at WebAssembly-level.

Romano, A., Lehmann, D., Pradel, M., and Wang, W. (2022). Wobfuscator: Obfuscating JavaScript malware via opportunistic translation to WebAssembly. In Proceedings of the 43rd Symposium on Security and Privacy, pages 1574–1589, San Francisco, CA, USA. IEEE.

Rossberg, A. (2018). WebAssembly specification. https://webassembly.github.io/spec/core/_download/WebAssembly.pdf.

Rozi, M. F., Kim, S., and Ozawa, S. (2020). Deep neural networks for malicious JavaScript detection using bytecode sequences. In Proceedings of the International Joint Conference on Neural Networks, pages 1–8, Glasgow, UK. IEEE.

Stallings, W., Brown, L., Bauer, M. D., and Bhattacharjee, A. K. (2012). Computer Security: Principles and Practice. Pearson.

Stiévenart, Q., De Roover, C., and Ghafari, M. (2022). Security risks of porting C programs to WebAssembly. In Proceedings of the 37th Symposium on Applied Computing, pages 1713–1722, Virtual Event. ACM.

APPENDIX

Tags Considered in the Training Process

lines, language, dw_tag_subprogram, dw_tag_typedef, dw_tag_member, dw_tag_label, dw_tag_gnu_call_site, dw_at_gnu_all_call_sites, dw_tag_inlined_subroutine, dw_at_external, dw_at_call_file, int_type, uint_type, string_type, fp_type, bool_type, dw_tag_enumerator, dw_tag_variable_int, dw_tag_variable_uint, dw_tag_variable_string, dw_tag_variable_fp, dw_tag_variable_bool, dw_tag_variable, dw_tag_formal_parameter_int, dw_tag_formal_parameter_uint, dw_tag_formal_parameter_string, dw_tag_formal_parameter_fp, dw_tag_formal_parameter, dw_tag_formal_parameter_bool
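As an illustration only, the sketch below shows one way the features in this list that correspond directly to DWARF identifiers could be turned into a numeric vector. It assumes that each such feature is an occurrence count and that the DWARF data is available as a textual dump in the style produced by tools such as llvm-dwarfdump; neither assumption is stated in this appendix. The names DIRECT_FEATURES, count_direct_features, and SAMPLE_DUMP are hypothetical, and the sample dump is invented for the example. The remaining features (lines, language, the *_type entries, and the typed variable and parameter variants) are left out because the list does not say how they are derived.

```python
import re
from collections import Counter

# Subset of the appendix feature names that map one-to-one onto DWARF
# tag/attribute identifiers once lowercased (hypothetical selection).
DIRECT_FEATURES = [
    "dw_tag_subprogram", "dw_tag_typedef", "dw_tag_member", "dw_tag_label",
    "dw_tag_gnu_call_site", "dw_at_gnu_all_call_sites",
    "dw_tag_inlined_subroutine", "dw_at_external", "dw_at_call_file",
    "dw_tag_enumerator", "dw_tag_variable", "dw_tag_formal_parameter",
]

# Matches identifiers such as DW_TAG_subprogram or DW_AT_external.
TOKEN_RE = re.compile(r"\bDW_(?:TAG|AT)_\w+")


def count_direct_features(dump_text):
    """Count how often each directly mapped tag/attribute appears in the dump.

    Assumption: a feature value is a plain occurrence count.
    """
    counts = Counter(token.lower() for token in TOKEN_RE.findall(dump_text))
    # Counter returns 0 for identifiers that never occur in the dump.
    return {name: counts[name] for name in DIRECT_FEATURES}


# Invented dump fragment used only to exercise the function.
SAMPLE_DUMP = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_language (DW_LANG_C99)
0x00000026:   DW_TAG_subprogram
                DW_AT_external (true)
0x00000040:     DW_TAG_formal_parameter
0x0000004e:     DW_TAG_variable
0x0000005c:     DW_TAG_variable
"""

if __name__ == "__main__":
    for name, value in count_direct_features(SAMPLE_DUMP).items():
        print(f"{name}: {value}")
```

Keeping the feature order fixed in a single list means every binary yields a vector with the same layout, which is what a classifier would need as input.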
