New Perspectives on Data Exﬁltration Detection for Advanced Persistent

Threats Based on Ensemble Deep Learning Tree

Xiaojuan Cai

and Hiroshi Koide

Department of Information Science and Technology, Information Science and Electrical Engineering,

Kyushu University, Fukuoka, Japan

Section of Cyber Security for Information Systems, Research Institute for Information Technology,

Kyushu University, Fukuoka, Japan

Keywords:

Data Exﬁltration, Command and Control Channel, Transfer Size Limitation, Advanced Persistent Threat,

Deep Learning, Ensemble Tree, Extreme Gradient Boosting, Internet Trafﬁc.

Abstract:

Data exﬁltration of Advanced Persistent Threats (APTs) is a critical concern for high-value entities such as

governments, large enterprises, and critical infrastructures, as attackers deploy increasingly sophisticated and

stealthy tactics. Although extensive research has focused on methods to detect and halt APTs at the onset

of an attack (e.g., examining data exﬁltration over Domain Name System tunnels), there has been a lack of

attention towards detecting sensitive data exﬁltration once an APT has gained a foothold in the victim system.

To address this gap, this paper analyzes data exﬁltration detection from two new perspectives: exﬁltration

over a command-and-control channel and limitations on exﬁltration transfer size, assuming that APT attackers

have established a presence in the victim system. We introduce two detection mechanisms (Transfer Life-

time Volatility & Transfer Speed Volatility) and propose an ensemble deep learning tree model, EDeepXGB,

based on eXtreme Gradient Boosting, to analyze data exﬁltration from these perspectives. By comparing

our approach with eight deep learning models (including four deep neural networks and four convolutional

neural networks) and four traditional machine learning models (Naive Bayes, Quadratic Discriminant Analy-

sis, Random Forest, and AdaBoost), our approach demonstrates competitive performance on the latest public

real-world dataset (Unraveled−2023), with Precision of 91.89%, Recall of 93.19%, and F1-Score of 92.49%.

1 INTRODUCTION

Advanced Persistent Threats (APTs) are able to es-

tablish a long-term presence in a target system that

allows them to gather as much data as they can while

remaining undetected by using sophisticated tools and

zero-day vulnerabilities (Charan et al., 2021).

All APTs have a basic procedure during the at-

tack, generally referred to as the life cycle, which

is shown in Figure 1. In order to facilitate the data

exﬁltration stage, the attacker must maintain active

communications with the victim system through the

command and control (C2) channel, a communica-

tion channel through which the attacker receives ex-

ﬁltrated data and sends commands (Edgar and Manz,

2017). The target information will be detected, re-

trieved, and transferred to attackers through C2 chan-

nels. This is the ultimate goal of APT attackers and is

the last line of defense in the information defense war.

Therefore, it is important to focus on data exfiltration detection in APT attacks (Irshad et al., 2021; King et al., 2021).

During the process of data exfiltration, attackers will gather and transfer target data through outbound traffic. Therefore, data exfiltration is not completely invisible. For all APT attacks, it is a crucial

step for attackers to establish C2 channels in order to

send commands to the victim system and receive the

exﬁltrated data from the victim network (King et al.,

2021).

[Figure 1 diagram: Reconnaissance → Initial Compromise → Establish Foothold → Data Exfiltration]

Figure 1: APT typical main stages.

There are plenty of works concentrating on how to

identify and stop attackers the moment they enter the

system and begin attacking (Stojanović et al., 2020;

Chen et al., 2014; Alminshid and Omar, 2020). And

the incorporation of machine learning has resulted

in an even greater rate of attack detection accuracy


Cai, X. and Koide, H.

New Perspectives on Data Exﬁltration Detection for Advanced Persistent Threats Based on Ensemble Deep Learning Tree.

DOI: 10.5220/0012181200003584

In Proceedings of the 19th International Conference on Web Information Systems and Technologies (WEBIST 2023), pages 276-285

ISBN: 978-989-758-672-9; ISSN: 2184-3252

(Ghaﬁr et al., 2018; Mamun and Shi, 2021; Lal et al.,

2022; Abdullayeva, 2021). Meanwhile, some works

have noticed the importance of detecting data exﬁl-

tration based on Domain Name System (DNS) (Lal

et al., 2022; Mengqi et al., 2022; Alenezi and Lud-

wig, 2021; Zebin et al., 2022). However, the topic

of how to secure sensitive data in an APT attack after

the APT attacker has established a foothold on victim

systems has received relatively little attention.

This void inspired us to investigate the following

two questions: Q1. How to detect data exﬁltration of

APT attacks within normal trafﬁc? Q2. How to detect

sensitive data exﬁltration of APT attacks if the leaked

information is split into extremely small chunk sizes

in different victim systems?

To answer the two questions above, we are facing

the following two challenges in this paper:

Challenge 1: To identify the malicious sensitive data

transfer while conventional Internet trafﬁc is occur-

ring. In order to prevent being detected, data exﬁltra-

tion trafﬁc will mimic legitimate user trafﬁc as closely

as possible. Hence, it is a main challenge to identify

data exﬁltration in the network output stream of APT

attacks versus general Internet trafﬁc activity.

Challenge 2: To detect APT attacks if the exposed

information is transferred under small sizes (e.g., 1

MB) from the servers of several victims. The difﬁ-

culty is that the features of the exﬁltrated data trans-

fer can easily be masked by legitimate ﬁle transfers

of normal users to avoid detection by trafﬁc monitors,

since the larger the exﬁltrated data transfer size, the

more likely it is to trigger a monitor alert.

Hence, in this paper, assuming the APT attacker

has established a foothold in the victim system suc-

cessfully, by analyzing the Internet trafﬁc, our main

purpose is to detect data exﬁltration of APT from the

following two perspectives using an ensemble deep

learning tree based on eXtreme Gradient Boosting

(EDeepXGB):

• Exﬁltration over C2 channel

• Exﬁltration transfer size limitation

Contributions

Our main contributions in this paper can be listed

as follows:

• Assuming that APT attackers have established a foothold in the victim system, we first focus on data exfiltration detection from the perspectives of exfiltration over APT C2 channels and exfiltration transfer size limitation.

• To detect data exfiltration in APT attacks, we summarize the Transfer Lifetime Volatility (PTL) and the Transfer Speed Volatility (PTS) as detection mechanisms from the perspectives of Exfiltration over C2 Channel and Exfiltration Transfer Size Limitation.

• An ensemble deep learning tree based on eXtreme Gradient Boosting (XGB), called EDeepXGB, is implemented. Like the existing ConvXGB system (Thongsuwan et al., 2021), we attach XGB to the dense layer of deep learning models to detect data exfiltration of APTs accurately and rapidly.

• To exclude chance and randomness, four promising Deep Neural Network (DNN) models and four Convolutional Neural Network (CNN) models are trained using the newest dataset, Unraveled-2023 (Sowmya et al., 2023). An optimal EDeepXGB model is then determined after several evaluation experiments.

• The performance of our proposed method is verified using the newest public dataset, which shows that EDeepXGB improves detection Precision, Recall, and F1-Score compared to baselines (e.g., Naive Bayes, Quadratic Discriminant Analysis (QDA), Random Forest, and AdaBoost).

The rest of the paper is structured as follows. Section 2 briefly reviews related work. Section 3 introduces the detection mechanisms we summarized. Section 4 describes the structure of our proposed method. Section 5 presents the details of our experiments on the newest public real-world dataset, together with our observations and evaluation. Lastly, Section 6 presents the conclusion and future work.

2 RELATED WORKS

A number of techniques are used to detect data exﬁl-

tration of APT attacks. (Zou et al., 2020) proposed

a ranking list of APT tactics in the framework of

APT tactics recognition through synthesizing analysis

and correlation of data from various sources. (Veena

and Brahmananda, 2022) built a destination host ﬁl-

ter unit and a blacklist of host destinations to ana-

lyze the outbound connections which go to the same

destination from a huge amount of trafﬁc. More-

over, in some works, machine learning models have

been implemented in APT threats detection (Sabir

et al., 2022; Zimba et al., 2020; Moghaddam and

Zincir-Heywood, 2020). With semi-supervised learn-

ing models, the work of (Zimba et al., 2020) proposed


a framework to score suspicious APT activities using an SNN-based clustering algorithm. (Ghafir et al., 2018) mines relationships between activities and flags sequences of suspicious events as they occur for operating system diagnosis using a Bayesian network. (Mamun and Shi, 2021) proposed a heterogeneous task tree-based deep learning method to detect malicious traces of APTs. Exploiting the ability of Auto-Encoder-based deep learning to identify complex relationships between features, (Abdullayeva, 2021) uses an Auto-Encoder neural network with a softmax regression layer against APT attacks. The detection rates of these works were relatively high, and attention was given to all stages of APT detection (including Reconnaissance, Initial Compromise, Establish Foothold, and Data Exfiltration).

By analyzing data exﬁltration on the outbound

trafﬁc of APT threats from a different angle, in the

work of (D’Agostino and Kul, 2021), the authors

focus on the situation after being attacked by APT

threats. They compared the post-APT attack reports

with sensitive database information to discover which

sensitive data has been stolen. However, their experiments assume that only APT attack activities occur in the network traffic during the attack.

Besides, for data exﬁltration of APT, there are

a lot of works (Mengqi et al., 2022; Alenezi and

Ludwig, 2021; Zebin et al., 2022) focusing on the

Domain Name System over HTTPS (DoH) to pre-

vent APT attackers from storing information from the

domain names and the corresponding IP addresses

(zone ﬁle). In order to prevent data exﬁltration from

DNS Text messages, those works classify and detect

DoH tunnelings by analyzing DNS queries and re-

sponses. However, in text classiﬁcation, encrypted

DNS queries can hardly be detected.

Consequently, none of these previous works discuss the data exfiltration of APT attacks when data is transferred over C2 channels under a transfer size limitation.

There are various strategies to establish and main-

tain C2 channels in APT attacks. For instance, APT1

(Mandiant, 2014) uses domain names to imitate the

usual naming of online advertising services or web-

sites (e.g., yahoodaily.com) to set up C2 servers.

Moreover, in the attack of Duqu (Eric et al., 2012),

the attacker uses intermediary servers as proxies to

improve the availability and stealth of C2 channels.

On the other hand, according to the report of (NSA et al., 2021), many attacks split exfiltrated files into small chunks to avoid detection. For example, APT28 (NSA et al., 2021) can split files into chunks under 1 MB, and Kevin (Aseel Kayal, 2021) can exfiltrate data in blocks of 27 characters to the C2 server.

Hence, it is meaningful and necessary to investi-

gate data exﬁltration of APT from the perspective of

Exﬁltration over C2 channel and Exﬁltration transfer

size limitation.

3 DETECTION MECHANISMS

The gap from previous works leads us to the following

two questions:

Q1. How to detect data exﬁltration of APT attacks

within normal trafﬁc?

Q2. How to detect sensitive data exﬁltration of

APT attacks if the leaked information is split into ex-

tremely small sizes in different victim systems?

In this paper, to answer these two questions, we

aim to investigate data exﬁltration of APT from the

perspective of Exﬁltration over C2 channel and Exﬁl-

tration transfer size limitation.

HTTP/HTTPS ports 80 and 443 are typically employed for establishing C2 channels, since in well-secured corporate environments or governmental organizations, only these ports are permitted for outgoing connections. The communication transmitted through the HTTP/HTTPS ports can be identified as legitimate HTTP protocol or binary communication (Mengqi et al., 2022; Veena and Brahmananda, 2022). Hence, to detect the exfiltrated data, basic features (e.g., the destination IP/MAC address, the source IP/MAC address, the transport layer port, and the transport layer protocol) are required.

However, to avoid detection, attackers mimic legitimate user communication as closely as possible. That is, APT attackers encode the exfiltrated data into normal communications using the same transport layer ports and the same transport protocol, with fake domain name generators (e.g., a Domain Generation Algorithm). Moreover, some APT attackers limit the exfiltrated data transfer size to an extremely small chunk size to prevent detection by the traffic monitor (NSA et al., 2021; Aseel Kayal, 2021).

Thus, as well as the basic features (ports, protocol,

IP address, etc.) mentioned above, we summarized

two detection mechanisms to detect data exﬁltration

over C2 channel and to detect data exﬁltration with

exﬁltration transfer size limitation.

3.1 Transfer Lifetime Volatility

The value of transfer lifetime (which we call $PTL_i$) focuses on the one-way lifetime of packet transfer at


one trafﬁc connection between a normal user host and

the destination (the suspect C2 server or the suspect

proxy) in a period of time $[t_{first}, t_{last}]$.

Exﬁltration over C2 Channel

While exfiltrating the target data from the host server, the attacker tries to keep the number of outflow packets per time unit as close to the average as possible to avoid detection, yet also tries to complete the exfiltration transfer as fast as possible (Charan et al., 2021; Sabir et al., 2021). This means that, for the victim system, more packets are transmitted out than in over a period of time.

Hence, as shown in Equation 1, with $N_i$ the number of packets transferred between the host and the destination over the time unit $[t_{first}, t_{last}]$, the transfer lifetime volatility $PTL_i^{d \to h}$ from the destination to the host will be smaller than the transfer lifetime volatility $PTL_i^{h \to d}$ from the host to the destination in the $i^{th}$ communication channel ($PTL_i^{d \to h} \le PTL_i^{h \to d}$).

$$PTL_i = \frac{N_i}{t_{last} - t_{first}} \tag{1}$$
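As a minimal sketch of Equation 1, the following Python snippet counts the packets in each direction over the shared connection window and compares the two directional values. The `Packet` record, its field names, and the sample timestamps are illustrative assumptions and are not taken from the Unraveled-2023 schema.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds since the start of the capture (assumed unit)
    outbound: bool    # True: host -> destination, False: destination -> host


def transfer_lifetime_volatility(packets, outbound):
    """Packets in one direction divided by the connection window (Equation 1)."""
    if not packets:
        return 0.0
    t_first = min(p.timestamp for p in packets)
    t_last = max(p.timestamp for p in packets)
    duration = t_last - t_first
    if duration <= 0:
        return 0.0  # a single-instant connection has no measurable lifetime
    n_direction = sum(1 for p in packets if p.outbound == outbound)
    return n_direction / duration


# Hypothetical connection: 6 outbound and 2 inbound packets over 4 seconds.
packets = [
    Packet(0.0, True), Packet(0.5, False), Packet(1.0, True), Packet(1.5, True),
    Packet(2.0, True), Packet(2.5, False), Packet(3.0, True), Packet(4.0, True),
]
ptl_out = transfer_lifetime_volatility(packets, outbound=True)
ptl_in = transfer_lifetime_volatility(packets, outbound=False)
matches_c2_pattern = ptl_in <= ptl_out
```

In this toy connection the outbound value is three times the inbound value, which is the pattern the inequality describes. The comparison is shown on its own only for illustration.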

Exﬁltration Transfer Size Limitation

For the same amount of sensitive data, an attacker who limits the data transfer size needs to keep the C2 channel alive for a relatively longer period than one who does not limit the transfer size over C2 channels. Likewise, an APT attacker transmitting sensitive data in small chunks spends much more time than a normal user.
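The effect of a size limit on channel lifetime can be illustrated with simple arithmetic. The sketch below assumes a fixed pause between chunks; the 60-second pacing and the 500 MB payload are hypothetical values chosen for illustration. It reports how many chunks are needed and how long the C2 channel must stay alive.

```python
def chunked_transfer(total_bytes, chunk_bytes, seconds_between_chunks):
    """Return (number of chunks, minimum channel lifetime in seconds)."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    chunks = -(-total_bytes // chunk_bytes)  # ceiling division on integers
    # The first chunk leaves immediately; each later chunk waits one pause.
    lifetime = max(chunks - 1, 0) * seconds_between_chunks
    return chunks, lifetime


MB = 1024 * 1024
unlimited = chunked_transfer(500 * MB, 500 * MB, 60)  # one large transfer
limited = chunked_transfer(500 * MB, 1 * MB, 60)      # 1 MB size limit
```

Under these assumptions the 1 MB limit turns one transfer into 500 and keeps the channel alive for 29940 seconds (about 8.3 hours).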

Hence, in the $i^{th}$ communication connection, with $N_i(t)$ the number of packets transferred between the host and the destination in the period of time $[t_{first}, t]$, the transfer lifetime volatility $PTL_i$ can be calculated as Equation 2. The transfer lifetime volatility $PTL_i^{d \to h}$ from the destination to the host will be greater than the transfer lifetime volatility $PTL_i^{h \to d}$ from the host to the destination in the $i^{th}$ communication channel ($PTL_i^{d \to h} \ge PTL_i^{h \to d}$).

$$PTL_i = \sum_{t = t_{first}}^{t_{last}} \frac{N_i(t)}{t - t_{first}} \tag{2}$$

3.2 Transfer Speed Volatility

The value of transfer speed (called $PTS_i$) focuses on the one-way packet transfer speed at one traffic connection between the normal user host (the victim's system) and the destination (the suspect C2 server or the proxy).

Exﬁltration over C2 Channel

To prevent triggering the trafﬁc monitor’s alarm, at-

tackers need to control the transfer speed. However,

the outbound transfer speed (from the host server to

the destination) will be greater than the inbound trans-

fer speed (from the destination to the host server), be-

cause of the ongoing transfer of exﬁltrated data.

As Equation 3 shows, $Byte_i$ is the total transfer size of packets between the host server and the destination in one communication channel, while $byte_{ij}$ is the payload transfer size of packet $j$ between the host server and the destination in the $i^{th}$ communication channel. With Equation 4, the transfer speed volatility $PTS_i^{d \to h}$ from the destination to the host will be smaller than the transfer speed volatility $PTS_i^{h \to d}$ from the host to the destination in the $i^{th}$ communication channel ($PTS_i^{d \to h} \le PTS_i^{h \to d}$).

$$Byte_i = \sum_{j=1}^{k} byte_{ij} \tag{3}$$

(where $k$ represents the $k^{th}$ packet of the $i^{th}$ communication channel.)

$$PTS_i = \frac{Byte_i}{t_{last} - t_{first}} \tag{4}$$
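Equations 3 and 4 can be sketched in a few lines of Python. The payload sizes and the 10-second window below are made-up values, and the function names are illustrative rather than taken from the paper.

```python
def total_bytes(payload_sizes):
    """Equation 3: sum of the payload sizes of all packets in the channel."""
    return sum(payload_sizes)


def transfer_speed_volatility(payload_sizes, t_first, t_last):
    """Equation 4: total payload bytes divided by the connection lifetime."""
    duration = t_last - t_first
    if duration <= 0:
        return 0.0  # avoid dividing by a zero-length window
    return total_bytes(payload_sizes) / duration


# Hypothetical payload sizes in bytes for one connection lasting 10 seconds.
outbound_payloads = [1200, 1400, 1400, 1000]  # host -> destination
inbound_payloads = [200, 300]                 # destination -> host

pts_out = transfer_speed_volatility(outbound_payloads, 0.0, 10.0)
pts_in = transfer_speed_volatility(inbound_payloads, 0.0, 10.0)
outbound_dominates = pts_in <= pts_out
```

Here the outbound direction carries 500 bytes per second against 50 bytes per second inbound, matching the direction of the inequality stated for exfiltration over a C2 channel.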

Exﬁltration Transfer Size Limitation

To avoid triggering the traffic monitor's alarm, attackers need to extend the transfer time when they limit the data transfer size. That is, the total number of packets transferred between the host and the destination and the lifetime ($t_{last} - t_{first}$) of the connection will increase, while the average transfer bytes $AveByte_i$ (shown in Equation 5) from the host to the destination will decrease in the $i^{th}$ communication connection.

Thus, according to Equation 6, the transfer speed volatility $PTS_i^{d \to h}$ from the destination to the host will be greater than the transfer speed volatility $PTS_i^{h \to d}$ from the host to the destination in the $i^{th}$ communication channel ($PTS_i^{d \to h} \ge PTS_i^{h \to d}$).

$$AveByte_i = \frac{1}{k} \sum_{j=1}^{k} byte_{ij} \tag{5}$$

(where $k$ represents the $k^{th}$ packet of the $i^{th}$ communication channel.)

$$PTS_i = \frac{AveByte_i}{t_{last} - t_{first}} \tag{6}$$
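To make Equations 5 and 6 concrete, the sketch below compares the same 12,000 bytes sent as a few large packets and as many small packets over a longer window. All packet sizes and durations are hypothetical, and the function names are illustrative.

```python
def average_bytes(payload_sizes):
    """Equation 5: mean payload size over the k packets of the channel."""
    if not payload_sizes:
        return 0.0
    return sum(payload_sizes) / len(payload_sizes)


def size_limited_speed_volatility(payload_sizes, t_first, t_last):
    """Equation 6: average payload bytes divided by the connection lifetime."""
    duration = t_last - t_first
    if duration <= 0:
        return 0.0
    return average_bytes(payload_sizes) / duration


# Same total payload (12,000 bytes), sent two different ways.
bulk = [1500] * 8       # 8 large packets over 2 seconds
chunked = [100] * 120   # 120 small packets over 60 seconds

pts_bulk = size_limited_speed_volatility(bulk, 0.0, 2.0)
pts_chunked = size_limited_speed_volatility(chunked, 0.0, 60.0)
```

The size-limited transfer has both a smaller average payload and a longer lifetime, so its outbound value drops sharply (about 1.67 against 750 in this example).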


4 PROPOSED METHODOLOGY

In this paper, we aim to detect the data exﬁltration

of APT attacks from two perspectives (Exﬁltration

over C2 channel & Exﬁltration transfer size limita-

tion) under the assumption that the APT attacker has

established a foothold in the victim system. Hence,

we proposed an ensemble deep learning tree based on

eXtreme Gradient Boosting (EDeepXGB) for data ex-

ﬁltration detection analysis.

4.1 DL Models

In this paper, to exclude chance and randomness, we prepared four Deep Neural Networks ($\{DNN_{1-4}\}$) and four Convolutional Neural Networks ($\{CNN_{1-4}\}$) as the base Deep Learning (DL) models for our EDeepXGB.

[Figure 2 diagrams: (a) DNN, with Input Layer, Hidden Layer, and Output Layer; (b) CNN, with Input Layer, Hidden Layer (Convolution, Pooling, Fully Connected), and Output Layer.]

Figure 2: Model Structure of Deep Learning Networks.

4.1.1 DNN

Panel (a) of Figure 2 shows the structure of the Deep Neural Network (DNN). As Equation 7 shows, with $n$ input nodes, the output $y_j$ of node $j$ can be calculated from the inputs $X_i$, the synaptic weights ($W_{ij}$) between two neural layers, and the bias ($\theta_j$) of the hidden layers or the output layer.

$$y_j = f\left(\sum_{i=1}^{n} W_{ij} X_i + \theta_j\right) \tag{7}$$
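Equation 7 corresponds to the usual weighted-sum-plus-bias node. A minimal Python sketch follows; the sigmoid activation and the sample numbers are assumptions made for illustration, since the activation function $f$ is not fixed in the text.

```python
import math


def sigmoid(z):
    """One common choice for the activation f (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias, activation=sigmoid):
    """Equation 7: activation of the weighted input sum plus the bias."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# Three inputs whose weighted sum is exactly zero, so the sigmoid returns 0.5.
y_j = node_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.0], bias=0.0)
```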

4.1.2 CNN

Panel (b) of Figure 2 shows the structure of the Convolutional Neural Network (CNN). When an $N \times N$ square neuron layer is followed by a convolutional layer (as shown in Equation 9) and an $n \times n$ filter $\omega$ is used, the convolutional layer output $y_{ij}^{l}$ at unit $l$ will be of size $(N-n+1) \times (N-n+1)$. $x_{ij}^{l}$ is the pre-nonlinearity input in the layer, that is, the contributions (weighted by the filter components) from the previous layer cells. As Equation 10 shows, $o_{ij}$ is the output of Max-Pooling, which reduces the spatial dimensions (e.g., width and height) of the convolutional layer output $y_{ij}^{l}$.

Convolutional Layers:

$$x_{ij}^{l} = \sum_{a=0}^{n-1} \sum_{b=0}^{n-1} \omega_{ab}\, y_{(i+a)(j+b)}^{l-1} \tag{8}$$

$$y_{ij}^{l} = f\left(\sum W x_{ij}^{l} + \theta\right) \tag{9}$$

Max-Pooling:

$$o_{ij} = \max\left(y_{ij}^{l}\right) \tag{10}$$
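The output-size rule and the pooling step can be checked with a small pure-Python sketch. It covers only the filter sum of Equation 8 and the max-pooling of Equation 10; the bias and nonlinearity of Equation 9 are omitted for brevity, and non-overlapping pooling windows are an assumption. The 5 x 5 input and 2 x 2 filter are made-up values.

```python
def convolve_valid(layer, kernel):
    """Equation 8: slide an n x n filter over an N x N layer without padding."""
    size, n = len(layer), len(kernel)
    out = size - n + 1  # output is (N - n + 1) x (N - n + 1)
    return [
        [
            sum(kernel[a][b] * layer[i + a][j + b]
                for a in range(n) for b in range(n))
            for j in range(out)
        ]
        for i in range(out)
    ]


def max_pool(layer, window):
    """Equation 10: maximum of each non-overlapping window x window block."""
    out = len(layer) // window
    return [
        [
            max(layer[i * window + a][j * window + b]
                for a in range(window) for b in range(window))
            for j in range(out)
        ]
        for i in range(out)
    ]


layer = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

feature_map = convolve_valid(layer, kernel)  # 4 x 4, since 5 - 2 + 1 = 4
pooled = max_pool(feature_map, 2)            # 2 x 2 after pooling
```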

4.2 EDeepXGB

In our EDeepXGB, using the prepared eight DL mod-

els as deep learning feature extractors, the prediction

results will be obtained by using the trees of XGB to

predict those extracted features. The node distribution

of each EDeepXGB model is shown in Table 1.

Table 1: Distribution of EDeepXGB.

Models           Layer Structure
EDeepXGB-DNN1    (In.-32)*-XGB
EDeepXGB-DNN2    (In.-128-64)-XGB
EDeepXGB-DNN3    (In.-64-32)-XGB
EDeepXGB-DNN4    (In.-128-256-64)-XGB
EDeepXGB-CNN1    (In.-64-30)-XGB
EDeepXGB-CNN2    (In.-64-64-256)-XGB
EDeepXGB-CNN3    (In.-64-128-30)-XGB
EDeepXGB-CNN4    (In.-64-128-128-30)-XGB

* Input-Dense layer distribution of DL models.

4.2.1 Prediction Drive Force

In this paper, all of our experiments are multi-class classification problems. Therefore, for the base DL models shown in Figure 2, to predict $t$ classes from an $(N, n)$ input $x$, the outputs of the hidden layers ($o_1, \ldots, o_m$) need a prediction drive force (usually an activation function) to perform class prediction ($y_1, \ldots, y_t$).

Generally, the SOFTMAX function $\sigma$ (shown in Equation 11) is used as the activation function for the output layer of neural networks in multi-class classification problems. It normalizes a numerical vector $\vec{o}$ to a vector of probabilities that sum to 1.

$$\sigma(\vec{o})_i = \frac{e^{o_i}}{\sum_{j=1}^{K} e^{o_j}} \tag{11}$$

(where $\vec{o}$ is the input vector, and $K$ is the number of classes in the multi-class classifier.)
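For reference, Equation 11 can be written as a short Python function. Subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result; the input scores are arbitrary example values.

```python
import math


def softmax(scores):
    """Equation 11: map a score vector to probabilities that sum to 1."""
    shift = max(scores)  # stability shift; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The largest score always receives the largest probability, so the predicted class is the index of the maximum.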

However, works of (Thongsuwan et al., 2021; Ma-

mun and Shi, 2021) inspired us to use a tree struc-

ture in the feature classiﬁcation layer to overcome the


overﬁtting problem. Thus, in this paper, we use XGB

as a prediction drive force of our DL models to predict

data exﬁltration of APT attacks more accurately.

As a supervised learning model, the prediction of XGB is given by Equation 12 (Chen and Guestrin, 2016). For each input $y_i$, XGB assigns a numerical prediction score. Hence, the classification problem for each input $y_i$ becomes a 'yes' or 'no' problem (Xianrui and Joan, 2020). The output layer of our EDeepXGB is therefore implemented as shown in Figure 3.

$$\hat{y}_i = \sum_{j=1}^{n} f_j(y_i), \quad f_j \in \mathcal{F} \tag{12}$$

(where $n$ is the number of trees, and $\mathcal{F}$ is the function space of the set of possible classification trees)
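The additive form of Equation 12 can be illustrated without any boosting library. In the sketch below, the three decision stumps are hand-written stand-ins for trained trees, and the feature vector stands in for hidden-layer outputs; none of these values come from the paper. The logistic transform of the summed score is the usual way a boosted margin is turned into a yes/no decision and is an assumption about the final step.

```python
import math


def stump(feature_index, threshold, left_score, right_score):
    """Build a one-split tree f_j that returns a leaf score for a feature vector."""
    def tree(features):
        if features[feature_index] < threshold:
            return left_score
        return right_score
    return tree


def ensemble_score(trees, features):
    """Equation 12: the prediction is the sum of the scores of all trees."""
    return sum(tree(features) for tree in trees)


def predict_yes_no(trees, features):
    """Turn the summed score into a binary decision via the logistic function."""
    probability = 1.0 / (1.0 + math.exp(-ensemble_score(trees, features)))
    return probability >= 0.5


# Hypothetical ensemble of three stumps over a two-dimensional feature vector.
trees = [
    stump(0, 0.5, -1.0, 1.0),
    stump(1, 0.2, -0.5, 0.5),
    stump(0, 0.9, -0.25, 0.75),
]
hidden_outputs = [0.7, 0.1]
score = ensemble_score(trees, hidden_outputs)   # 1.0 - 0.5 - 0.25
is_positive = predict_yes_no(trees, hidden_outputs)
```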

[Figure 3 diagram labels: Input, Outputs of Hidden Layer, XGB Classification Leafs, Results of Output Layer.]

Figure 3: Class Prediction of EDeepXGB using XGB as a Prediction Drive Force.

5 EXPERIMENT

To demonstrate that the data exfiltration of APT attacks can be detected from the perspectives of Exfiltration over C2 channel and Exfiltration transfer size limitation, we integrated our summarized detection mechanisms (Transfer Lifetime Volatility: PTL, Transfer Speed Volatility: PTS) into EDeepXGB, using selected samples of the Unraveled-2023 dataset (Sowmya et al., 2023). Additionally, the entire Unraveled-2023 real-world dataset was used to evaluate the performance of our EDeepXGB. The overall performance of EDeepXGB is discussed in comparison with four baseline models (Naive Bayes, Quadratic Discriminant Analysis (QDA), Random Forest, and AdaBoost) (Kostas, 2018).

5.1 Dataset

To examine the characteristics of data exfiltration, we use the newest public real-world dataset (Unraveled-2023) for two reasons: (i) traffic flows are captured from nineteen user hosts, and eight attackers are set up in the Unraveled-2023 dataset; (ii) the questions we investigate are reflected in the scenarios of the APT groups (APT28, DragonFly, Bronze Butler, and Sandworm Team) executed in this dataset, especially APT28, which splits files into small chunks and exfiltrates them through C2 channels (NSA et al., 2021).

As shown in Table 2, there are 7522 examples of data exfiltration in the Unraveled-2023 dataset. The attack implementation timeline of the Unraveled-2023 dataset places the data exfiltration of the APT in weeks 5 and 6. Therefore, we selected only 7374 samples (Internet traffic flows with suspected data exfiltration, under the assumption that APT attackers have established a foothold in the victim system) from these two weeks (weeks 5 and 6), as shown in Figure 4.

Table 2: Experiment Datasets.

Dataset                      APT Life-Cycle       Training   Testing
Unraveled-2023 (Week 5&6)    NormalTraffic        1454574    969837
                             LateralMovement      16435      10806
                             EstablishFoothold    16249      10862
                             DataExfiltration     4499       3023
                             Coverup              229        133

[Figure 4 chart: Unraveled-2023 data exfiltration samples split into Exfiltration Over C2 Channel and Data Transfer Size Limitation (chart values: 6988 and 446).]

Figure 4: Exfiltration Data Distribution of Experiment Data.

5.2 Evaluation of Detection Mechanisms

Before detecting data exfiltration by APTs from the perspectives of Exfiltration over C2 channel (DExfil-C2) and Exfiltration transfer size limitation (DExfil-Size) with PTL and PTS, it is necessary to investigate whether the detection mechanisms we summarized are able to detect data exfiltration. For this reason, we split samples selected from the Unraveled-2023 dataset into a training set (Establish Foothold: 16278, DExfil-C2: 4125, DExfil-Size: 265) and a testing set (Establish Foothold: 10662, DExfil-C2: 2787, DExfil-Size: 181). Under the assumption that the APT attacker has established a foothold in the victim system, evaluation experiments based on EDeepXGB are implemented.

The classification reports of the experimental results are shown in Figure 5 and Figure 6 (E-foothold represents Establish Foothold).

It can be noticed that with eight different DL models ($DNN_{1-4}$, $CNN_{1-4}$), our detection mechanisms can successfully help to detect data exfiltration from the perspectives of Exfiltration over C2 channel and Exfiltration transfer size limitation. Moreover, the overall performance (Precision, Recall, and F1-Score) in detecting DExfil-C2 and DExfil-Size can be improved by EDeepXGB compared to the performance of their base DL models ($DNN_{1-4}$, $CNN_{1-4}$).

[Figure 5 bar chart: Precision, Recall, and F1-Score (10% to 100%) for the classes E-foothold, DExfil-C2, and DExfil-Size, comparing Base DNN1-4 with EDeepXGB-DNN1-4.]

Figure 5: Classification Report Based on $DNN_{1-4}$.

[Figure 6 bar chart: Precision, Recall, and F1-Score (10% to 100%) for the classes E-foothold, DExfil-C2, and DExfil-Size, comparing Base CNN1-4 with EDeepXGB-CNN1-4.]

Figure 6: Classification Report Based on $CNN_{1-4}$.

In addition, the F1-Score of E-foothold stays at almost 88% for both the base DL models (Base $DNN_{1-4}$, Base $CNN_{1-4}$) and the EDeepXGB models ($EDeepXGB-DNN_{1-4}$, $EDeepXGB-CNN_{1-4}$). This means that our detection mechanisms have nearly no negative effect on the detection of the other APT steps (apart from the Data Exfiltration step).

Therefore, it can be said that the detection mechanisms we summarized are able to detect data exfiltration.

5.3 Evaluation of Overall Performance

After confirming the feasibility of detecting data exfiltration of APT attacks from two perspectives (Exfiltration over C2 channel (DExfil-C2) and Exfiltration transfer size limitation (DExfil-Size)) based on EDeepXGB, we executed several experiments on the whole Unraveled-2023 testing dataset (as processed by (Sowmya et al., 2023)) to verify the performance of our proposed method. In addition, to evaluate the overall performance of our EDeepXGB, we also implemented four basic machine learning models (Naive Bayes, QDA, Random Forest, and AdaBoost) as baselines for comparison.

5.3.1 Results of EDeepXGB

With the detection mechanisms (PTL, PTS) we summarized in Section 3, tested on the whole testing dataset of Unraveled-2023, Figure 7 shows the True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP) results of data exfiltration detection.

[Figure 7 bar chart: TN, FP, FN, and TP counts for the Base and EDeepXGB variants of DNN1-4 and CNN1-4. TP counts (Base, EDeepXGB): DNN1 857, 2764; DNN2 2811, 2822; DNN3 2811, 2887; DNN4 2826, 2743; CNN1 2389, 2876; CNN2 853, 2270; CNN3 2638, 1617; CNN4 0, 69.]

Figure 7: Confusion Matrix of Data Exfiltration Detection.

It can be observed that, with the exception of $EDeepXGB-DNN_4$ and $EDeepXGB-CNN_3$, the other EDeepXGB models predicted data exfiltration more accurately than their corresponding base DL models. In particular, $EDeepXGB-DNN_1$ increased the number of correctly predicted samples by 1907. This implies that $EDeepXGB-DNN_4$ and $EDeepXGB-CNN_3$ did not perform well in predicting data exfiltration samples for the Unraveled-2023 dataset with the current tree structure. The underlying reason could be an overfitting issue, or that these two models are not learning effectively from the unbalanced training data provided.

Moreover, although the base CNN4 is not sensitive to the data exfiltration samples in our test data (TP of base CNN4 = 0), it becomes capable of detecting data exfiltration once it is incorporated into our proposed method (TP of EDeepXGB-CNN4 = 69).
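As a rough illustration of how TN/FP/FN/TP counts translate into per-class scores, the following Python sketch derives Precision, Recall, and F1-Score for the data exfiltration class. The counts are read from the data labels of Figure 7, and the assignment of labels to models is an assumption; the helper name `prf` is hypothetical and not part of the authors' implementation.

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall, and F1 for one class from confusion counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1


# (TP, FP, FN) for the data exfiltration class.
# Assumed mapping of the Figure 7 data labels to models.
counts = {
    "EDeepXGB-DNN1": (2764, 105, 259),
    "Base CNN4": (0, 0, 3023),
    "EDeepXGB-CNN4": (69, 167, 2954),
}

scores = {name: prf(*c) for name, c in counts.items()}
for name, (p, r, f1) in scores.items():
    print(f"{name:14s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Under these assumed counts, the rounded values for EDeepXGB-DNN1 (0.96, 0.91, 0.94) and EDeepXGB-CNN4 (0.29, 0.02, 0.04) agree with the Data Exfiltration rows of Table 3 and Table 4.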

On the other hand, with the whole dataset of Unraveled-2023, the performance results of EDeepXGB based on the DL models (DNN1-4, CNN1-4) are shown in Table 3 and Table 4. It can be observed that the data exfiltration detection ability of EDeepXGB-CNN4 is the weakest among the eight ensemble deep learning models, with a Precision of 29%, a Recall of 2%, and an F1-Score of 4%. This deficiency may be attributed to a severe overfitting problem in EDeepXGB-CNN4, causing its neuronal structure to become overly tailored to the provided training data.

WEBIST 2023 - 19th International Conference on Web Information Systems and Technologies


Table 3: Classification Report of EDeepXGB Based DNN.

Model           Class                Precision  Recall  F1-Score
EDeepXGB-DNN1   Benign               1.00       1.00    1.00
                Data Exfiltration    0.96       0.91    0.94
                Establish Foothold   0.98       1.00    0.99
                Lateral Movement     0.93       0.96    0.95
                Cover up             0.74       0.19    0.30
EDeepXGB-DNN2   Benign               1.00       1.00    1.00
                Data Exfiltration    0.99       0.93    0.96
                Establish Foothold   0.98       1.00    0.99
                Lateral Movement     0.96       0.99    0.98
                Cover up             0.53       0.42    0.47
EDeepXGB-DNN3   Benign               1.00       1.00    1.00
                Data Exfiltration    0.87       0.96    0.91
                Establish Foothold   0.99       0.96    0.98
                Lateral Movement     0.96       0.99    0.98
                Cover up             0.78       0.75    0.76
EDeepXGB-DNN4   Benign               1.00       1.00    1.00
                Data Exfiltration    1.00       0.91    0.95
                Establish Foothold   0.98       1.00    0.99
                Lateral Movement     0.95       0.99    0.97
                Cover up             0.67       0.90    0.77

5.3.2 Results of Baseline Models

To confirm the overall performance of our proposed method, we implemented Naive Bayes, QDA, Random Forest, and AdaBoost as baseline models. Their results are shown in Figure 8. All four baseline models achieve high accuracy in detecting APT attacks (97.39% on average). However, their detection ability (Precision, Recall, and F1-Score) is under 40% in every case.
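High accuracy combined with low Precision, Recall, and F1-Score is typical of heavily unbalanced data. The following Python sketch illustrates this with a synthetic example: a hypothetical classifier that labels every sample as Benign, evaluated on made-up data in which 97.5% of the samples are benign. The class proportions and the macro-averaging over five classes are assumptions chosen for illustration, not the authors' experimental setup.

```python
CLASSES = ["Benign", "Data Exfiltration", "Establish Foothold",
           "Lateral Movement", "Cover up"]


def macro_scores(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Unweighted mean of each metric over the classes.
    macro = [sum(col) / len(classes) for col in zip(*per_class)]
    return accuracy, macro[0], macro[1], macro[2]


# Synthetic labels: 975 benign and 25 attack samples (assumed split).
y_true = (["Benign"] * 975 + ["Data Exfiltration"] * 7
          + ["Establish Foothold"] * 6 + ["Lateral Movement"] * 6
          + ["Cover up"] * 6)
y_pred = ["Benign"] * len(y_true)  # majority-class predictor

accuracy, precision, recall, f1 = macro_scores(y_true, y_pred, CLASSES)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} "
      f"recall={recall:.2%} f1={f1:.2%}")
```

Under these assumptions the sketch yields 97.50% accuracy with a macro Precision of 19.50%, Recall of 20.00%, and F1-Score of 19.75%. These coincide with the Naive Bayes and Random Forest values in Figure 8, which suggests, but does not prove, that those two baselines predicted mostly the majority class.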

Model           Accuracy  Precision  Recall   F1-Score
Naive Bayes     97.50%    19.50%     20.00%   19.75%
QDA             97.72%    23.05%     34.22%   25.22%
Random Forest   97.50%    19.50%     20.00%   19.75%
AdaBoost        96.85%    27.34%     39.56%   30.71%

Figure 8: Overall Performance of Baseline Models. (Data labels of the bar chart, listed per model.)

Table 4: Classification Report of EDeepXGB Based CNN.

Model           Class                Precision  Recall  F1-Score
EDeepXGB-CNN1   Benign               1.00       1.00    1.00
                Data Exfiltration    1.00       0.95    0.97
                Establish Foothold   0.98       1.00    0.99
                Lateral Movement     0.99       0.96    0.98
                Cover up             0.28       0.32    0.30
EDeepXGB-CNN2   Benign               1.00       1.00    1.00
                Data Exfiltration    0.99       0.75    0.85
                Establish Foothold   0.97       1.00    0.98
                Lateral Movement     0.98       0.96    0.97
                Cover up             0.45       0.21    0.29
EDeepXGB-CNN3   Benign               1.00       1.00    1.00
                Data Exfiltration    0.99       0.53    0.70
                Establish Foothold   0.97       1.00    0.99
                Lateral Movement     1.00       0.96    0.98
                Cover up             0.05       0.46    0.09
EDeepXGB-CNN4   Benign               0.99       1.00    1.00
                Data Exfiltration    0.29       0.02    0.04
                Establish Foothold   0.76       1.00    0.86
                Lateral Movement     0.23       0.01    0.01
                Cover up             0.00       0.03    0.00

5.4 Discussion

From the results we obtained, we can answer the following two questions. Q1: How can data exfiltration of APT attacks be detected within normal traffic? Q2: How can sensitive data exfiltration of APT attacks be detected if the leaked information is split into extremely small pieces across different victim systems?

Since the APT attacker must keep the C2 channel alive to receive exfiltrated sensitive data, while balancing the transfer time against the size of each exfiltration transfer, these questions reduce to detecting data exfiltration over a C2 channel and detecting data exfiltration under a transfer size limitation. Hence, given the discussion of the detection mechanisms (PTL, PTS) in Section 3 and the evaluation results of Section 5.2, our proposed method can successfully detect data exfiltration of APT attacks from both perspectives: Exfiltration over C2 channel and Exfiltration transfer size limitation.

Meanwhile, the results in Section 5.3.1 show that in a larger sample space (the whole Unraveled-2023 dataset), our EDeepXGB remains strong in detecting data exfiltration of APT attacks.

For the evaluation of overall performance on the newest public real-world dataset (Unraveled-2023), the comparison results between the EDeepXGB models and their base DL models are shown in Table 3 and Table 4. Moreover, the comparison results between EDeepXGB and our baseline models are shown in Figure 9.

From the comparative results in Table 3 and Table 4, we can note that although not all of the EDeepXGB models achieve better classification accuracy than their base DL models, the overall detection abilities (Precision, Recall, and F1-Score) of EDeepXGB are stronger than those of the base models. This means that our ensemble deep learning tree helps the model predict data exfiltration more precisely. Meanwhile, compared to the baseline models (shown in Figure 9), the overall performance of our EDeepXGB is significantly better.

Model           Accuracy  Precision  Recall   F1-Score
EDeepXGB-DNN1   99.83%    92.18%     81.20%   83.43%
EDeepXGB-DNN2   99.92%    89.33%     86.92%   87.97%
EDeepXGB-DNN3   99.89%    91.89%     93.19%   92.49%
EDeepXGB-DNN4   99.91%    92.04%     96.01%   93.63%
EDeepXGB-CNN1   99.91%    85.12%     84.66%   84.84%
EDeepXGB-CNN2   99.85%    87.87%     78.43%   81.94%
EDeepXGB-CNN3   99.80%    80.31%     79.04%   75.06%
EDeepXGB-CNN4   98.58%    45.48%     41.14%   38.36%
Naive Bayes     97.50%    19.50%     20.00%   19.75%
QDA             97.72%    23.05%     34.22%   25.22%
Random Forest   97.50%    19.50%     20.00%   19.75%
AdaBoost        96.85%    27.34%     39.56%   30.71%

Figure 9: Comparison with Baseline Models. (Data labels of the bar chart, listed per model; groups: EDeepXGB-DNN, EDeepXGB-CNN, Baseline.)

As Figure 9 shows, on the Unraveled-2023 dataset, the overall performance of EDeepXGB-DNN4 is the highest (Precision: 92.04%, Recall: 96.01%, F1-Score: 93.63%).
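The text does not state how the overall Precision, Recall, and F1-Score are aggregated. The following Python sketch checks one plausible reading, an unweighted (macro) average over the five classes, using the per-class rows in the fourth block of Table 3 (taken here to be EDeepXGB-DNN4, which is an assumption). The per-class inputs are rounded to two decimals, so only approximate agreement can be expected.

```python
from statistics import mean

# Per-class rows (Benign, Data Exfiltration, Establish Foothold,
# Lateral Movement, Cover up) from the fourth block of Table 3.
table3_dnn4 = {
    "precision": [1.00, 1.00, 0.98, 0.95, 0.67],
    "recall":    [1.00, 0.91, 1.00, 0.99, 0.90],
    "f1":        [1.00, 0.95, 0.99, 0.97, 0.77],
}
# Overall values quoted in the text for the same model, as fractions.
reported = {"precision": 0.9204, "recall": 0.9601, "f1": 0.9363}

# Macro average: plain mean over classes, ignoring class sizes.
macro = {metric: mean(values) for metric, values in table3_dnn4.items()}
for metric, value in macro.items():
    print(f"{metric:9s} macro={value:.4f} reported={reported[metric]:.4f}")
```

The macro averages land within rounding error of the quoted values, which is consistent with, though not proof of, macro-averaging. A macro average weights the rare attack classes as heavily as the Benign class, which is why it is far more sensitive to missed attacks than accuracy is.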

However, from the discussion of the confusion matrices (Section 5.3.1), it is known that EDeepXGB-DNN4 cannot predict the data exfiltration samples well with the current deep learning tree structure.

Hence, considering both the ability and robustness of detecting data exfiltration of APT attacks and the overall performance results, the optimal EDeepXGB we implemented for the data exfiltration detection problem is EDeepXGB-DNN3 (Accuracy: 99.89%, Precision: 91.89%, Recall: 93.19%, F1-Score: 92.49%).
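The selection argument above can be paraphrased as a simple rule: discard any EDeepXGB model that detects fewer data exfiltration samples than its base model, then choose the remaining model with the highest overall F1-Score. The following Python sketch applies that rule. The rule is an interpretation of the discussion rather than a procedure stated by the authors; the TP counts and F1-Scores are read from the data labels of Figure 7 and Figure 9, and their assignment to models is an assumption.

```python
# name: (base-model TP, EDeepXGB TP, EDeepXGB overall F1-Score in %)
models = {
    "EDeepXGB-DNN1": (857, 2764, 83.43),
    "EDeepXGB-DNN2": (2811, 2822, 87.97),
    "EDeepXGB-DNN3": (2811, 2887, 92.49),
    "EDeepXGB-DNN4": (2826, 2743, 93.63),
    "EDeepXGB-CNN1": (2389, 2876, 84.84),
    "EDeepXGB-CNN2": (853, 2270, 81.94),
    "EDeepXGB-CNN3": (2638, 1617, 75.06),
    "EDeepXGB-CNN4": (0, 69, 38.36),
}

# Keep only ensembles that detect more exfiltration samples than their base.
robust = {name: v for name, v in models.items() if v[1] > v[0]}
rejected = sorted(set(models) - set(robust))

# Among the remaining models, pick the one with the highest F1-Score.
best = max(robust, key=lambda name: robust[name][2])

print("rejected:", rejected)
print("selected:", best)
```

With these inputs the rule rejects EDeepXGB-DNN4 and EDeepXGB-CNN3 and selects EDeepXGB-DNN3, matching the choice made in the discussion.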

6 CONCLUSIONS

Little attention has been paid to detecting sensitive data exfiltration after an APT attack has established a foothold on victim systems. To fill this gap, and to detect data exfiltration of APT attacks within normal traffic even when the exposed information is split into extremely small pieces across different victim systems, this paper verified that the data exfiltration detection problem of APT attacks can be analyzed from two perspectives: Exfiltration over C2 channel and Exfiltration transfer size limitation.

Compared to the base deep learning models (four different Deep Neural Networks and four Convolutional Neural Networks) and the basic machine learning models (Naive Bayes, Quadratic Discriminant Analysis, Random Forest, and AdaBoost), and using the detection mechanisms we summarized (Transfer Lifetime Volatility and Transfer Speed Volatility), the ensemble deep learning tree based on eXtreme Gradient Boosting (EDeepXGB) that we proposed detects the data exfiltration of APT attacks with high overall performance. Among the EDeepXGB models built on the eight DL models, the optimal ensemble deep learning tree we implemented achieves an overall performance of Accuracy: 99.89%, Precision: 91.89%, Recall: 93.19%, and F1-Score: 92.49%.

Since data exfiltration samples are very scarce compared to other attack data in public datasets, our next step will be to deploy our proposed method in a real-world APT attack environment to test its detection performance. Moreover, our proposed deep learning architecture requires more optimization experiments to find the best implementation.

ACKNOWLEDGEMENTS

This work is supported by JSPS Kakenhi Grant Number 21K11888 and Hitachi Systems, Ltd.
