Using Deep Learning with Attention to Detect Data Exfiltration by POS Malware

Gabriele Martino, Federico Andrea Galatolo (1, a), Mario G. C. A. Cimino (1, b) and Christian Callegari (2, c)

(1) Dept. Information Engineering, University of Pisa, L.go Lazzarino 1, 56122, Pisa, Italy

(2) Quantavis s.r.l., L.go Spadoni, 56126 Pisa, Italy

Keywords:

POS Malware, RAM Scraper, Anomaly Detection, Malware Trafﬁc Data, Self-Attention, Transformer.

Abstract:

In recent years, electronic payment through Point-of-Sale (POS) systems has become popular. For this reason,

POS devices are becoming more targeted by cyber attacks. In particular, RAM scraping malware is the most

dangerous threat: the card data is extracted from the process memory, during the transaction and before the

encryption, and sent to the attacker. This paper focuses on the possibility to detect this kind of malware through

anomaly detection based on Deep Learning with attention, using the network trafﬁc with data exﬁltration

occurrences. To show the effectiveness of the proposed approach, real POS transaction trafﬁc has been used,

together with real malware trafﬁc extracted from a collection of RAM scrapers. Early results show the high

potential of the proposed approach, encouraging further comparative research. To foster further development,

the data and source code have been publicly released.

1 INTRODUCTION

In recent years, card and contactless payments have increased significantly, with card payments representing 51%

of all payments (UK Finance, 2020). Moreover, the

gap between credit and debit cards is closing, with

cash usage continuing to decline (creditcards.com,

2021). It is predicted that, by 2025, as many as 75%

of all transactions will be made without cash (ﬁnance-

magnates.com, 2016).

RAM scraping is behind many of the major POS

attacks (Caldwell, 2014). Another kind of attack is

POS Skimmers (d3security.com, 2017): the cyber-

criminal places an ’overlay’ skimmer on top of the

card reader and pin pad, to steal the data later.

RAM scraping malware is the most dangerous at-

tack on POS systems, because a malware could be

easily installed via social engineering or phishing, to

exﬁltrate customers’ data through the Internet (Trend-

Micro, 2015). When a customer’s card is swiped on

the POS device using the magnetic stripe, the card and

owner’s information present in Track 1 and Track 2

are momentarily stored in the process memory of the

payment system. The POS malware steals this data before it is encrypted and deleted, and sends it to the cyber criminals.

a https://orcid.org/0000-0001-7193-3754
b https://orcid.org/0000-0002-1031-1959
c https://orcid.org/0000-0001-7323-8069

There are many ways to exﬁltrate customers’

data: by sending it to a ﬁctitious server via SMTP

or FTP protocol, or to C&C servers via HTTP

POST/GET/Header, RDP (Remote Desktop Proto-

col), or via TOR protocol in sophisticated malware

(TrendMicro, 2015).

Malware types have been extensively classified in (Rodríguez, 2017) and (Cimino et al., 2020), considering persistence method, protection method, functionality, how data are exfiltrated, and how data are ciphered. Since data exfiltration is characterized by well-defined behavioral patterns, the related malicious activity is easily detectable by network monitoring tools.

There are four main methods of traffic classification: port-based, deep packet inspection (DPI), statistical-based and behavioral-based (Biersack et al., 2013). The accuracy of port-based methods is very low nowadays, because of the common use of random ports and port disguises. On the other hand, DPI-based methods encounter great difficulties because they are unable to decrypt the traffic. Current research mainly focuses on statistical-based and behavioral-based methods, both of which rely on machine-learning approaches (Azab et al., 2022). Deep Learning approaches try to remove the need for the heavily engineered features that must otherwise be extracted from the flow of packets. Many methods have been proposed, both supervised and unsupervised: 1D-CNN (Convolutional Neural Network), 2D-CNN, RNN (Recurrent NN) + CNN, LSTM (Long Short-Term Memory) Autoencoder, and combinations of LSTM + CNNs (Wang et al., 2017a), (Zhou et al., 2017), (Chen et al., 2017), (Aceto et al., 2020), (Lopez-Martin et al., 2017), (Liu et al., 2019), (Aceto et al., 2019), (Wang, 2015), (Mirsky et al., 2018), (Wang et al., 2018), (Cimino et al., 2022).

Martino, G., Galatolo, F., Cimino, M. and Callegari, C.
Using Deep Learning with Attention to Detect Data Exfiltration by POS Malware.
DOI: 10.5220/0011993900003467
In Proceedings of the 25th International Conference on Enterprise Information Systems (ICEIS 2023) - Volume 1, pages 638-648
ISBN: 978-989-758-648-4; ISSN: 2184-4992
© 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

For validation purposes, in this paper, a dataset

of exﬁltration occurrences has been created from real

samples of POS RAM scraping malware, and from

transactions of real POS systems. As anomaly detec-

tion approaches, deep learning models with attention

have been developed. Early results show the high po-

tential of the proposed approach, encouraging further

comparative research. To foster further development,

the data and source code have been publicly released

on GitHub (Martino, 2023).

The paper is organized as follows. Section 2 describes related works. The fundamental behavior of POS systems is shown in Section 3. Section 4 explains how POS RAM scraping malware works. Section 5 shows some methodologies of application classification using network traffic with deep learning models. Datasets and Deep Learning architectures are detailed in Section 6, together with some early experimental studies. Results are discussed in Section 7. Conclusions are drawn in Section 8.

2 RELATED WORK

The following research works represent the main ap-

proaches in the literature. In (Wang et al., 2017b) a

CNN has been used to classify Normal and Malware

trafﬁc. The trafﬁc has been taken from (CTU Uni-

versity, 2016). However, no POS malware trafﬁc has

been used in the experimental studies. In (Bader et al.,

2022) a similar approach has been proposed, building

a more complex model, via different kinds of features

extracted from the trafﬁc samples. In (Shaikh and

Shashikala, 2019) a pipeline made by an autoencoder

and an LSTM has been used to classify Normal and

DoS attack trafﬁc. In (Anderson and McGrew, 2017)

an interesting analysis of issues related to the adop-

tion of machine learning on trafﬁc data for detect-

ing malware is presented: from mislabelling of data

to non-stationarity of the network traffic. In (Marín et al., 2021) a combination of 1D-CNN and LSTM networks has been tested for classifying the traffic of three types of malware, using Raw Packets and Raw Flows. In (Lichy et al., 2023) a comparison between classic ML-based and DL-based solutions is made, showing that the DL ones do not necessarily perform better at classifying malware traffic.

A major challenge in the literature is the huge

amount of different types of malware present nowa-

days. For this reason, training a single DL model for

classiﬁcation could lead to obsolescence quite soon.

Moreover, it has been shown that, although complex

models could get high performance, simple models

perform similarly. Last but not least, it is well known

that the difﬁculty of labeling malicious trafﬁc data

leads to noisy datasets. The purpose of this research is to develop models that robustly identify suspicious traffic, without needing to recognize the specific malware.

To the best of our knowledge, the literature lacks

research works focused on trafﬁc data extracted from

RAM scraping malware, in terms of both approaches

and available benchmark data. For this reason, in

this research, some malware samples available in

(Rodríguez, 2017) have been used to create a dataset

for testing purposes. A deep learning approach based

on autoencoders with an attention mechanism has

been used to exploit only normal trafﬁc for training

purposes.

3 POINT-OF-SALE SYSTEMS

This section focuses on how a POS system handles

transaction data ﬂow. Fig.1 shows an overview of the

transaction ﬂow of a card payment (FirstData, 2010)

(Rodríguez, 2017). Specifically, the customer inserts the card into the merchant’s payment system through a

POS terminal. The related data is sent to the acquirer

bank, which carries out a routing to a card payment

brand circuit (e.g., VISA, MasterCard, or American

Express). Then, the related issuer bank veriﬁes the

card legitimacy, i.e. not reported as stolen or lost,

and veriﬁes that the customer’s account has enough

funds/credit available to pay. If that is the case, the

issuer bank generates an authorization number and

routes it back to the card payment brand, which for-

wards it to the acquirer bank. The acquirer bank then

forwards it to the merchant, which concludes the sale

with the customer, providing an acknowledgment (normally in the form of a receipt) (Rodríguez, 2017).

There exist different kinds of POS systems, which can be classified on the basis of the interface, i.e., an external interface with respect to the transaction processing system, or an integrated interface, such as a mobile app on a smartphone (connectpos.com, 2020) (fitsmallbusiness.com, 2022) (intel.com, 2022). In (Gomzin, 2014) it is stated that at least three of the vulnerabilities of POS systems are located where the customer data may reside: (i) in memory: data manipulations are carried out by the payment application when processing an authorization or a settlement, and thus payment card data remains in the memory of the processing machine; (ii) at rest, i.e., when the payment application stores data on a disk device, either temporarily or for a long term; (iii) in transit, when payment data are sent to and received from other applications and devices within the system.

Figure 1: An abstraction of Point-of-Sale card transaction flow (extracted and adapted from (Rodríguez, 2017) (FirstData, 2010)).

In modern credit cards, the stored data can be accessed through four different interfaces: by physical access, by magnetic stripe, by a chip reader (EMV, i.e., Europay, MasterCard and Visa), or by an NFC reader. How widespread a given interface is with respect to the others strongly depends on the country. EMV introduced a way to authenticate chip-card transactions and to minimize magnetic stripe card counterfeiting fraud, but it is mainly widespread in the EU and less used in the USA (Secure Technology Alliance, 2014) (Symantec, 2014).

The data provided by physical access to the card is well known: Name, Expiration Date, Credit Card Number, and Card Verification Value (CVV/CVV2). The card magnetic stripe, located on the back, is horizontally divided into three tracks. Track 1 and Track 2 contain similar data in different formats, both standardized in ISO/IEC 7813 (ISO/IEC 7813:2006, 2006). Track 3, also called THRIFT, was originally intended for use with Automatic Teller Machines. NFC is a bidirectional short-range (less than 10 cm) contactless communication technology, operating on the 13.56 MHz spectrum, based on two Radio Frequency Identification (RFID) standards. Namely, contactless payment cards follow the ISO-14443 (ISO/IEC 7813:2006, 2013) standard. Security with NFC is debatable, since a contactless card can communicate with any NFC reader without identifying it. Hence, a contactless card transmits private customer information once communication is established. In (Chabbi et al., 2022) a classification of possible attacks is reported, including Eavesdropping, Relay Attack, Replay Attack, Skimming, Cloning and Malware.

Considering the variety of card interfaces and POS systems, as well as the OSs on which they are based, vulnerabilities in these payment methods could be anywhere, so it is crucial to develop systems that are able to detect any of these attack attempts.

To the best of our knowledge, POS RAM scraper malware mainly searches for Track 1 and Track 2 data, given the vast diffusion of magnetic stripe interfaces for payment in the USA in past years. However, it is possible that in the future this malware will steal other information coming from different interfaces, on all Operating Systems (Bodhani, 2013).

4 POS RAM SCRAPING MALWARE

Point-of-Sale RAM scraping malware is a malicious

software that, once installed in a POS system, usu-

ally based on Windows OS, steals credit/debit card

data such as Track 1 and Track 2. The malware uses a multitude of techniques to collect data: it iterates over all running processes, uses a blacklist to skip processes where Track 1 and Track 2 data cannot be found, applies regex matching, and encodes data in base64 to obfuscate its content. The malware can also choose among several data exfiltration methods: manual removal, HTTP POST, FTP server, HTTP header, Tor, and SMTP. Some well-known families are Alina, Dexter, BlackPOS, Soraya and many others (Trend Micro, 2014). Cybercriminals register fake domains for data-exfiltration purposes with hosting providers in countries with lax Internet law enforcement, such as Russia and Romania, among others. These fake domains act like man-in-the-middle (MitM) data collectors. The Tor network

conceals C&C servers’ IP addresses and, by default,

encrypts all trafﬁc. The C&C servers’ addresses end

with a .onion Top Level Domain, which cannot be re-

solved outside the Tor network and can only be ac-

cessed using a Tor proxy application. ChewBacca

malware makes use of this functionality. Cybercrim-

inals use compromised email accounts to exﬁltrate

stolen data. A command line email client invoked

through a batch script may be used to exﬁltrate stolen

data as an attachment. BlackPOS makes use of this

functionality. Cybercriminals create accounts on FTP

servers that are hosted in countries with lax Inter-

net law enforcement. Malware such as BlackPOS or

BrutPOS log in to FTP servers using hardcoded cre-

dentials and copy over the stolen data (Trend Micro,

2014). Since some of these protocols can be blocked by the firewall of the POS system, malware is evolving as well. An example is the use of the DNS protocol, which cannot be blocked without breaking the normal functioning of a device connected to the Internet. The Multigrain malware encrypts the stolen payment card data with a 1024-bit RSA key, then passes it through a Base32 encoding process. The resulting encoded data is used in a DNS query for log.[encoded data].evildomain.com, where "evildomain" is a domain name controlled by the attackers (computerworld.com, 2016) (securebox.comodo.com, 2016).
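From the defender's side, the query layout described above can be recognised with a few lines of code. The following is only a minimal sketch, assuming the name has the shape log.[encoded data].[domain] and that the Base32 padding has been stripped to fit a DNS label; the function name `decode_exfil_label` and the `prefix` parameter are hypothetical and not taken from any of the cited reports. Since the payload is RSA-encrypted before encoding, decoding only recovers ciphertext, but a successful decode of a long label is already a useful hint.

```python
import base64
import binascii


def decode_exfil_label(qname: str, prefix: str = "log"):
    """Return the decoded bytes if qname looks like '<prefix>.<base32>.<domain>', else None.

    Hypothetical helper: the layout and the stripped padding are assumptions.
    """
    labels = qname.rstrip(".").split(".")
    # Need at least: prefix, encoded label, domain, top-level domain.
    if len(labels) < 4 or labels[0].lower() != prefix:
        return None
    candidate = labels[1].upper()
    # Restore the '=' padding that cannot appear inside a DNS label.
    candidate += "=" * (-len(candidate) % 8)
    try:
        return base64.b32decode(candidate)
    except (binascii.Error, ValueError):
        return None


# Example with a harmless payload: b"test" is "ORSXG5A=" in Base32.
suspicious = decode_exfil_label("log.ORSXG5A.evildomain.com")
benign = decode_exfil_label("www.example.com")
```

A check of this kind is signature-like and tied to one malware family, which is precisely the limitation that motivates the anomaly-detection approach of this paper.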

Figure 2: Data-exﬁltration techniques observed among PoS

RAM scrapers (Trend Micro, 2014).

5 TRAFFIC CLASSIFICATION VIA DEEP LEARNING

The two most common choices of traffic representation are session and flow (Dainotti et al., 2012). A flow is made of all the packets that share the same 5-tuple: source IP, source port, destination IP, destination port and transport-level protocol. A session is made by combining the two flows in opposite directions, i.e. those in which the source and destination IP/port are swapped. In (Wang et al., 2017a) both flows and sessions are taken into account, and packet information relates to Layers 7 and 4 of the ISO/OSI model. The first 785 bytes of each packet are then taken to build the time series. The work compares 1D-CNN and 2D-CNN. In (Chen et al., 2017) statistical features are extracted from sessions, then a pseudo-image is created from them and fed to a CNN classifier. In (Aceto et al., 2020) the first 784 or 576 bytes of the L4 payload and a combination of CNNs are used. In (Lopez-Martin et al., 2017) each packet of a flow is described using only six features: source port, destination port, number of bytes in payload, TCP window size, interarrival time and direction of packet. The flows are then divided into batches of 20 packets. Finally, a 2D-CNN, an RNN (LSTM) and a combination of the two models are tested. In (Liu et al., 2019) an Autoencoder made of GRUs (RNN) is used to extract relevant features from raw flows, then dense layers are used for classification. In (Aceto et al., 2019) sessions are directed to two different models: a certain number of bytes of the L4 payload go to a 2D-CNN and some fields of the packets go to an RNN; the features are then combined in a final layer for classification. In (Yang et al., 2020) the TCP flag is reported as quite informative, especially for the intra-session correlation. In (Wang et al., 2018) a series of features is extracted from packets and then fed to a Stacked Autoencoder model to extract higher-order features from the first 144 bytes of the packets.
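The difference between a flow and a session can be made concrete with a small grouping routine. This is a minimal sketch with hypothetical names (`Packet`, `flow_key`, `session_key`, `group_by`) and made-up addresses: a flow key is the 5-tuple as seen on the wire, while a session key orders the two endpoints so that both directions collapse onto the same key.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str  # transport-level protocol, e.g. "TCP" or "UDP"


def flow_key(p: Packet) -> tuple:
    """Unidirectional flow: the 5-tuple exactly as observed."""
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)


def session_key(p: Packet) -> tuple:
    """Bidirectional session: sort the endpoints so A->B and B->A share a key."""
    first, second = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (first, second, p.proto)


def group_by(packets, key_fn) -> dict:
    """Group packets, preserving capture order inside each group."""
    groups = defaultdict(list)
    for p in packets:
        groups[key_fn(p)].append(p)
    return dict(groups)


# Illustrative capture: one HTTPS exchange (both directions) and one DNS query.
packets = [
    Packet("10.0.0.5", 50000, "93.184.216.34", 443, "TCP"),
    Packet("93.184.216.34", 443, "10.0.0.5", 50000, "TCP"),
    Packet("10.0.0.5", 50000, "93.184.216.34", 443, "TCP"),
    Packet("10.0.0.5", 40000, "8.8.8.8", 53, "UDP"),
]
flows = group_by(packets, flow_key)        # 3 flows: two TCP directions + DNS
sessions = group_by(packets, session_key)  # 2 sessions: HTTPS + DNS
```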

6 METHOD

6.1 Trafﬁc Datasets

The dataset is divided into Normal traffic and Malware traffic. Fig. 3 shows how the two kinds of traffic have been sniffed. A stand-alone Android-based POS is connected to a WiFi hotspot provided by a laptop, which is in turn connected to an access point. On the laptop, which acts as a bridge, Wireshark was used to sniff all the traffic from and towards the mobile POS. Around a hundred actual card transactions were made without actually charging any money.

The Malware traffic is extracted from a group of POS RAM scraping samples, shown in Table 1, downloaded from http://webdiis.unizar.es/. These malware samples are installed in a Virtual Machine running Windows 7.

Figure 3: (a) The mobilePOS is connected to a laptop acting as hotspot; the laptop is connected to an Access Point. The packets are sniffed from the laptop. (b) An infected virtual machine is connected to the Internet through the host machine. Wireshark sniffs packets from the VM.

Since this kind of malware scans process memory to find data in Track 1 and Track 2 format, a credit card number generator (github.com/bizdak/ccgen) was started before installing each sample and kept running throughout. This Track 1 and Track 2 generator produces a valid card number every 0.2 seconds.

To prevent traffic from different malware samples from overlapping, each sample has been installed and then removed before the installation of the next one. Background packets have then been filtered out from the actual exfiltration traffic generated by the malware.

Table 1: POS RAM Scraping Malware samples.

Malware        Selected Sample
alina          1efeb85c8ec2c07dc0517ccca7e8d743
backoff        05f2c7675ff5cda1bee6a168bdbecac0
               6a0e49c5e332df3af78823ca4a655ae8
blackpos       0ca4f93a848cf01348336a8c6ff22daf
               7f1e4548790e7d93611769439a8b39f2
decebal        46185a6ec6d527576248ef65a82b891d
               91100e23e59d5744a5720a6f84b68d99
frameworkPOS   a5dc57aea5f397c2313e127a6e01aa00
               b57c5b49dab6bbd9f4c464d396414685
getmypassPOS   1d8fd13c890060464019c0f07b928b1a
jackpos        00b09796519c60c7369290f19f89cd10
lusypos        bc7bf2584e3b039155265642268c94c7
soraya         1483d0682f72dfefff522ac726d22256
               1661aab32a97e56bc46181009ebd80c9

Table 2: Benign and Malign Network Traffic Data Sample.

Traces   Number of Packets
Benign   62923
Malign   1132

Fig. 4 and Fig. 5 show the distribution of the protocols in the traffic of the two datasets. First, it is interesting to note that almost all the traffic coming from the mobilePOS is HTTPS. In contrast, most of the traffic in the malware dataset consists of DNS packets. Part of this DNS traffic derives from all the malware samples that check for server domains, and a larger part from the samples that use DNS as exfiltration method. This approach splits the card number into small chunks of bytes and sends them as DNS requests.

It is worth mentioning that in some POS systems certain protocols could be blocked by the firewall, thus keeping the data safe. For this reason, many of these malware families are migrating towards the DNS approach: DNS is necessary for the normal functioning of the Internet and hence cannot be blocked.

POS systems based on different OSs could have different vulnerabilities and different traffic patterns. Our experiments mixed real transaction traffic from an Android-based POS with real traffic from malware installed on a Windows OS. We assume that, regardless of the POS OS, these are the exfiltration methods that can be implemented and should be detected. Moreover, our challenge is to analyse the capability of some models to distinguish the two kinds of traffic, especially on the overlapping protocols: HTTPS and DNS.

Figure 4: Protocol data distribution in Transaction Dataset.

Figure 5: Protocol data distribution in Malware Dataset.

6.2 Trafﬁc Preprocessing

The sessions (also called biflows) are extracted from the two traffic data samples. After removing the IPs, the following data is extracted for each packet: source port, destination port, ACK flag, PUSH flag, RESET flag, SYN flag, FIN flag, interarrival time, number of bytes of payload, and TCP window size. All flags are zero in the case of UDP packets. Then, each session is split into sub-sessions using different window sizes: 5, 10, 15, or 20 packets per sub-session. Each of these features is normalized in [0, 1]. When splitting a session, if the number of packets in the time series is less than the window size, the remaining positions are padded with zeros. The features from (Lopez-Martin et al., 2017) and (Yang et al., 2020) have been combined.
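The windowing and padding steps can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the released code: the function names are hypothetical, a session is assumed to be an array of shape (packets, 10 features), and min-max scaling over the given array is assumed as the normalization, since the text only states the target range [0, 1].

```python
import numpy as np


def minmax_normalize(x) -> np.ndarray:
    """Scale each feature column to [0, 1]; constant columns map to 0 (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant features
    return (x - lo) / span


def split_session(session: np.ndarray, window: int) -> np.ndarray:
    """Split (n_packets, n_features) into (n_windows, window, n_features).

    The last sub-session is zero-padded when it has fewer packets than `window`.
    """
    n_packets, n_features = session.shape
    n_windows = max(1, -(-n_packets // window))  # ceiling division
    padded = np.zeros((n_windows * window, n_features), dtype=float)
    padded[:n_packets] = session
    return padded.reshape(n_windows, window, n_features)


# A toy session of 7 packets with 10 features each, split with window size 5:
session = np.arange(70, dtype=float).reshape(7, 10)
windows = split_session(minmax_normalize(session), window=5)
# windows has shape (2, 5, 10); the last 3 rows of windows[1] are zero padding.
```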

6.3 Deep Learning Models

All the considered models follow the Autoencoder approach: the model is trained to reproduce as output the same time series provided as input. The Normal dataset, which in our case is the one with the mobile-POS transactions, is used as training set.

6.3.1 LSTM Autoencoder Architecture

In (Sutskever et al., 2014) and (Cho et al., 2014) an Encoder-Decoder model with LSTMs is proposed to encode a time series (e.g. a sentence) in a single fixed-length vector. This vector is then decoded into another time series, which could be the future time steps or a translated sentence, if the model is trained for that purpose. If the model is trained to reproduce the same time series, it can be used as an Anomaly Detector. In (Wei et al., 2022) and (Srivastava et al., 2015), the authors report details of the architecture used. (Said Elsayed et al., 2020) reports an example of usage of this model for anomaly detection in network traffic. An Autoencoder has been implemented in which Encoder and Decoder have two layers each, where the vector size of the latent space in the outer layers is double that of the internal layers.

6.3.2 LSTM Autoencoder with Attention Mechanism

An attentional mechanism has been used to improve

neural machine translation by selectively focusing on

parts of the source sentence during translation (Luong

et al., 2015). The idea of Global Attentional model is

to consider all the hidden states of the encoder when

deriving the context vector $c_t$ (Luong et al., 2015). The context vector can be computed as a weighted sum of the hidden states of the encoder over all time steps (Bahdanau et al., 2014). The weights generally come from a Softmax function applied to the scores of the mutual combinations of the hidden states (in the case of self-attention). In (Luong et al., 2015) three different alternatives for the score function have been proposed:

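\[
\mathrm{score}(h_t, \bar{h}_s) =
\begin{cases}
h_t^\top \bar{h}_s & \text{dot} \\
h_t^\top W_a \bar{h}_s & \text{general} \\
v_a^\top \tanh\left(W_a [h_t; \bar{h}_s]\right) & \text{concat}
\end{cases}
\tag{1}
\]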

The Autoencoder with attention mechanism has the same architecture as the LSTM Autoencoder, but the input of the decoder is made of context vectors, derived from the self-attention mechanism over the hidden states of the encoder. Two versions have been developed, one for each of two score functions: dot and general.
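The computation of the context vectors with the dot and general scores can be illustrated in a few lines of NumPy. This is a sketch under stated assumptions rather than the released implementation: `h` stands for the matrix of encoder hidden states (one row per time step), `W` for the learnable matrix of the general score, and both function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(h: np.ndarray, W=None):
    """Self-attention over encoder hidden states h of shape (T, d).

    W is None -> dot score:     score[t, s] = h_t . h_s
    W (d, d)  -> general score: score[t, s] = h_t^T W h_s
    Returns the context vectors (T, d) and the attention weights (T, T).
    """
    scores = h @ h.T if W is None else h @ W @ h.T
    weights = softmax(scores, axis=-1)  # each row sums to 1
    context = weights @ h               # weighted sum of hidden states
    return context, weights


# Toy example: 5 time steps, hidden size 8.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
ctx_dot, w_dot = self_attention(h)
ctx_gen, w_gen = self_attention(h, W=rng.normal(size=(8, 8)))
```

With `W` set to the identity matrix the general score reduces to the dot score, which is a quick sanity check for an implementation.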

6.3.3 Transformer

(Vaswani et al., 2017) is an influential paper showing that, by stacking multiple attention layers alone, a model is able to learn highly correlated temporal sequences effectively; such models are also capable of solving complex tasks in a zero-shot fashion (Galatolo et al., 2022). BERT and GPT-3 are just two examples (Devlin et al., 2018) (Brown et al., 2020).

A simple transformer model in Autoencoder-like fashion has been developed: the output needs to be equal to the input and the input projection is masked, where the mask is learned. All the tested transformers have 4 heads and a depth of 6, set as hyperparameters.

6.4 Experimental Setup

The PyTorch Lightning framework has been used, running on a 64-bit Ubuntu 22.04 OS. The machine is equipped with 16 CPUs and 32 GB of memory. An Nvidia GeForce RTX 3070 is used as accelerator. Training uses a mini-batch size of 128, an L1 cost function with 'sum' as reduction method, Adam as optimizer, a learning rate of 0.0008, and 300 epochs, with early stopping (patience of 30 epochs) to avoid overfitting.
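To make the cost function concrete, the following NumPy sketch computes an L1 loss with 'sum' reduction over a mini-batch, together with a per-window variant that can serve as an anomaly score. The per-window scoring is an assumption made for illustration, since the text does not detail how the loss of a single sub-session is obtained; both function names are hypothetical.

```python
import numpy as np


def l1_sum(reconstruction: np.ndarray, target: np.ndarray) -> float:
    """L1 loss with 'sum' reduction over the whole mini-batch."""
    return float(np.abs(reconstruction - target).sum())


def per_window_l1(reconstruction: np.ndarray, target: np.ndarray) -> np.ndarray:
    """One score per window: absolute error summed over time steps and features.

    Assumed scoring scheme; inputs have shape (batch, window, features).
    """
    return np.abs(reconstruction - target).sum(axis=(1, 2))


# Two windows of 5 packets x 10 features: the first is reconstructed exactly,
# the second is off by 0.1 on every value.
target = np.zeros((2, 5, 10))
reconstruction = target.copy()
reconstruction[1] += 0.1
scores = per_window_l1(reconstruction, target)
batch_loss = l1_sum(reconstruction, target)
```

The batch loss is simply the sum of the per-window scores, so the same quantity drives both training and the later thresholding.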

7 EXPERIMENTAL RESULTS

Table 3 reports the results of the experimentation. After training, the threshold that reaches the best F1 score is searched. Several heuristics can be used to find the best threshold. The average of the distribution of the loss has been calculated for Benign and Malign traffic, for the threshold that gives the best F1 score. The LSTM AE model is used as performance reference, and its results are quite variable. Similar results are reported in (Said Elsayed et al., 2020).
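One simple way to search for such a threshold is an exhaustive sweep over the observed loss values. The sketch below is one possible heuristic, assumed here for illustration and not necessarily the one used in the experiments; `losses` holds one reconstruction loss per sub-session, `y_true` marks malware sub-sessions with 1, and the function names are hypothetical.

```python
import numpy as np


def f1_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """F1 of the positive (malware) class; 0.0 when there are no positives at all."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def best_threshold(losses: np.ndarray, y_true: np.ndarray):
    """Try every observed loss as threshold; loss >= threshold is flagged as malware."""
    best_t, best_f1 = None, -1.0
    for t in np.unique(losses):
        f1 = f1_score(y_true, (losses >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1


# Toy losses: four benign sub-sessions (low loss) and two malware ones (high loss).
losses = np.array([0.10, 0.20, 0.15, 0.90, 0.80, 0.30])
y_true = np.array([0, 0, 0, 1, 1, 0])
threshold, f1 = best_threshold(losses, y_true)
# Here the two groups are perfectly separable: threshold = 0.8 and f1 = 1.0.
```

Because the sweep uses the labels of the evaluated data, the resulting F1 is an optimistic estimate; in deployment the threshold would have to be fixed beforehand.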

All the attention-based models show much more stable and predictable results. Interestingly, the F1 score of the attention-based models tends to increase with the length of the session. Moreover, increasing the hidden size also yields a small increase in performance.

It is important to note that the performance of these models is strongly influenced by the method chosen to set the threshold that labels a certain loss as malware or not. Furthermore, given the similarity between the dynamics of malware traffic and those of normal traffic, this is a rather hard problem to solve. In fact, training on certain traffic helps the model to reconstruct the traffic of malware data exfiltration as well.

The quality of the models depends on how well the loss distribution of Normal traffic can be separated from that of Malware traffic.

Fig. 6 shows a histogram of the loss distributions of the test set, made from the malware traffic and a sample of the transaction traffic. In this example, the two loss distributions are well separated.

In addition, given the imbalance of the protocols in the two datasets, the models, and hence the results, could be biased towards distinguishing between the two majority protocols (HTTPS and DNS) rather than between the two kinds of traffic source, Transactions and Malware; see Fig. 4 and Fig. 5. To better analyse the performance of the models, we extracted from the traffic sources only the packets corresponding to the predominant protocols, which in our case are HTTPS and DNS. We then analysed how well the models are able to distinguish between Normal and Malware traffic over the same protocols.

Fig. 7a and Fig. 7b show the performance on the two majority protocols of the datasets. In both plots, we measured the F1 score for all four window sizes and all four models. To make a fair comparison, we fixed the hidden size to 8. Recall that the transformer has a more complex set of hyperparameters, but here we considered only the size of the internal features.

Figure 6: Loss Distribution of Attention-LSTM AE - General Score - hidden size: 128, Window Size: 20.

Good performance is indeed achieved in distinguishing Transaction sessions from Malware sessions even on overlapping protocols, which is exactly what was expected. Performance increases with the window size, in line with the previous findings.

The transformers seem to struggle with this kind of task, and need a larger window size to become more 'embedded' in the traffic type. The LSTM Autoencoder used as reference seems to give good results as well, but the LSTM AE models with an embedded attention mechanism are suggested for the higher stability of their performance.

With the HTTPS protocol, a larger window size is necessary to effectively detect the malware traffic. This is quite reasonable, since this protocol needs from 6 to 10 packets to complete the handshake, which moreover does not carry any sensitive information. This fact strengthens the assessment of the effectiveness of the method.

With the DNS protocol, instead, even with a small hidden size the F1 score is always above 0.75, reaching values above 0.90 on average. This is interesting: even though DNS data exfiltration is the most dangerous method, since it can easily bypass any firewall, no forethought seems to have gone into obfuscating this usage of the protocol with respect to the normal one, which makes it fairly easy to detect.

Table 3: Performance (F1 and Recall) of LSTM AE, Attention-LSTM AE with dot score, Attention-LSTM AE with general score, and Transformer for the four different window sizes.

                                          Window 5        Window 10       Window 15       Window 20
Model                       Hidden Size   F1     Recall   F1     Recall   F1     Recall   F1     Recall
LSTM AE                     8             0.692  1.0      0.794  1.0      0.706  0.95     0.917  1.0
                            16            0.672  0.965    0.289  0.966    0.239  0.95     0.348  0.967
                            32            0.816  0.965    0.546  0.966    0.553  0.962    0.693  0.967
                            64            0.681  1.0      0.694  1.0      0.377  0.962    0.674  0.967
                            128           0.769  1.0      0.723  1.0      0.862  0.962    0.859  0.967
Attention-LSTM AE           8             0.527  1.0      0.824  1.0      0.86   1.0      0.878  1.0
(dot score)                 16            0.781  1.0      0.82   1.0      0.86   1.0      0.878  1.0
                            32            0.946  1.0      0.866  1.0      0.856  1.0      0.891  1.0
                            64            0.787  1.0      0.751  1.0      0.741  1.0      0.783  1.0
                            128           0.855  1.0      0.854  1.0      0.968  1.0      0.92   1.0
Attention-LSTM AE           8             0.506  1.0      0.809  1.0      0.783  1.0      0.871  1.0
(general score)             16            0.552  1.0      0.852  1.0      0.61   1.0      0.917  1.0
                            32            0.75   1.0      0.522  1.0      0.968  1.0      0.862  1.0
                            64            0.801  1.0      0.729  1.0      0.731  1.0      0.943  1.0
                            128           0.856  1.0      0.731  1.0      0.84   1.0      0.967  1.0
Transformer                 8             0.541  1.0      0.629  1.0      0.76   1.0      0.819  1.0
                            16            0.575  1.0      0.634  1.0      0.777  1.0      0.825  1.0
                            32            0.479  1.0      0.703  1.0      0.819  1.0      0.83   1.0
                            64            0.069  0.482    0.142  0.923    0.77   0.962    0.837  1.0
                            128           0.097  0.69     0.328  0.897    0.364  0.988    0.682  1.0

8 CONCLUSIONS

Cashless transactions are becoming increasingly predominant in the market. As a consequence, the devices dedicated to these money exchanges are increasingly targeted by cyberattacks. POS RAM scraping malware still seems to be the most dangerous attack on customers. This paper analyses the possibility of detecting the data exfiltration methods of this malware using novel DL models. To this end, we created two datasets from a real mobile POS device and from real POS malware samples that try to exfiltrate data from the Track 1 and Track 2 generator process.

As we have shown, this kind of dataset is quite difficult to retrieve and to analyse, given the variables that can affect the kind of traffic: topology of the network, types of applications, and imbalance of the protocols. For this reason, environments where the POS system is connected to a small network, such as that of a store, can be analysed efficiently.

The LSTM Encoder-Decoder model has the known limitation of encoding even highly informative and long time series into a single fixed-size vector that is later decoded. Attention models try to avoid this by keeping the information of every timestep and passing it to the decoder with a suitable re-weighting, leading to better results.
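The re-weighting of encoder timesteps can be sketched as follows. This is a minimal pure-Python illustration, assuming that the "dot score" and "general score" variants follow the definitions of Luong et al. (2015): score = h_t · h_s for dot, and score = h_t · (W h_s) for general. The function names, the toy vectors, and the matrix W are hypothetical and are not taken from the released source code.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(decoder_state: Vector, encoder_state: Vector) -> float:
    """Dot score: h_t . h_s"""
    return dot(decoder_state, encoder_state)


def general_score(decoder_state: Vector, encoder_state: Vector, w: Matrix) -> float:
    """General score: h_t . (W h_s), where W is a learned matrix."""
    projected = [dot(row, encoder_state) for row in w]
    return dot(decoder_state, projected)


def attend(
    decoder_state: Vector,
    encoder_states: List[Vector],
    score_fn: Callable[[Vector, Vector], float],
) -> Tuple[List[float], Vector]:
    """Return the attention weights and the context vector (weighted sum of encoder states)."""
    weights = softmax([score_fn(decoder_state, h) for h in encoder_states])
    size = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(size)]
    return weights, context


# Toy example: three encoder timesteps with hidden size 2 (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # with W = I the general score reduces to the dot score

dot_weights, dot_context = attend(decoder_state, encoder_states, dot_score)
gen_weights, gen_context = attend(
    decoder_state, encoder_states, lambda q, h: general_score(q, h, identity)
)
```

Unlike the plain encoder-decoder, the decoder here receives a different context vector at each step, built from all encoder states rather than from the last one only.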

In our context, we wanted to test whether the attention mechanism brings a higher capacity to extract distinctive information from a time series, compact it, and better distinguish it from similar but semantically different network traffic.

Our results suggest that attention in LSTM AE models leads to a higher regularity in the latent space, which could correspond to a better defined distribution of the reconstruction loss. Future work could create a larger POS malware traffic dataset including all the possible exfiltration methods, to cover all the possible variants, even for future attacks of this kind. Moreover, it could be interesting to analyse how similar time series, such as network traffic, are encoded. This could help to better understand, in an Explainable AI manner, which factors or hidden features lead to a differentiation of the two kinds of traffic.

Figure 7: F1 score evaluation of the DNS and HTTPS sessions of the Transaction Dataset vs. the DNS and HTTPS sessions of the Malware Dataset. All the models have a hidden size of 8.

ACKNOWLEDGEMENTS

The authors thank Leonardo Cecchelli for his work on the subject during his thesis. Work partially funded by: (i) the Tuscany Region in the framework of the SecureB2C project, POR FESR 2014-2020, Project number 7429 31.05.2017; (ii) PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU programme; (iii) the Italian Ministry of Education and Research (MIUR) in the framework of the FoReLab project (Departments of Excellence).

REFERENCES

Aceto, G., Ciuonzo, D., Montieri, A., and Pescapè, A. (2019). Mimetic: Mobile encrypted traffic classification using multimodal deep learning. Computer networks, 165:106944.

Aceto, G., Ciuonzo, D., Montieri, A., and Pescapè, A. (2020). Toward effective mobile encrypted traffic classification through deep learning. Neurocomputing, 409:306–315.

Anderson, B. and McGrew, D. (2017). Machine learning

for encrypted malware trafﬁc classiﬁcation: account-

ing for noisy labels and non-stationarity. In Proceed-

ings of the 23rd ACM SIGKDD International Confer-

ence on knowledge discovery and data mining, pages

1723–1732.

Azab, A., Khasawneh, M., Alrabaee, S., Choo, K.-K. R.,

and Sarsour, M. (2022). Network trafﬁc classiﬁcation:

Techniques, datasets, and challenges. Digital Commu-

nications and Networks.

Bader, O., Lichy, A., Hajaj, C., Dubin, R., and Dvir, A.

(2022). Maldist: From encrypted trafﬁc classiﬁcation

to malware trafﬁc detection and classiﬁcation. In 2022

IEEE 19th Annual Consumer Communications & Net-

working Conference (CCNC), pages 527–533. IEEE.

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural ma-

chine translation by jointly learning to align and trans-

late. arXiv preprint arXiv:1409.0473.

Biersack, E., Callegari, C., Matijasevic, M., et al. (2013).

Data trafﬁc monitoring and analysis. Lecture Notes in

Computer Science, 5(23):12561–12570.

Bodhani, A. (2013). Turn on, log in, checkout. Engineering

& Technology, 8(3):60–63.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.,

Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,

Askell, A., Agarwal, S., Herbert-Voss, A., Krueger,

G., Henighan, T., Child, R., Ramesh, A., Ziegler,

D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler,

E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner,

C., McCandlish, S., Radford, A., Sutskever, I., and

Amodei, D. (2020). Language models are few-shot

learners.

Caldwell, T. (2014). Securing the point of sale. Computer

Fraud & Security, 2014(12):15–20.

Chabbi, S., El Madhoun, N., and Khamer, L. (2022). Secu-

rity of nfc banking transactions: Overview on attacks

and solutions. In 2022 6th Cyber Security in Network-

ing Conference (CSNet), pages 1–5. IEEE.

Chen, Z., He, K., Li, J., and Geng, Y. (2017). Seq2img: A

sequence-to-image based approach towards ip trafﬁc

classiﬁcation using convolutional neural networks. In

2017 IEEE International conference on big data (big

data), pages 1271–1276. IEEE.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Cimino, M. G., De Francesco, N., Mercaldo, F., San-

tone, A., and Vaglini, G. (2020). Model checking

for malicious family detection and phylogenetic anal-

ysis in mobile environment. Computers & Security,

90:101691.

Cimino, M. G. C. A., Galatolo, F. A., Parola, M., Perilli, N., and Squeglia, N. (2022). Deep learning of structural changes in historical buildings: The case study of the pisa tower. In Proceedings of the 14th International Joint Conference on Computational Intelligence - NCTA, pages 396–403. INSTICC, SciTePress.

computerworld.com (2016). New point-of-sale mal-

ware multigrain steals card data over dns.

https://www.computerworld.com/article/3059317/new-

point-of-sale-malware-multigrain-steals-card-data-

over-dns.html.

connectpos.com (2020). Types of pos (point of sale) sys-

tem for retailers. https://www.connectpos.com/types-

of-pos-system-connectpos/.

creditcards.com (2021). Payment method statistics.

https://www.creditcards.com/statistics/payment-

method-statistics-1276/.

CTU University (2016). The stratosphere ips project

dataset. https://www.stratosphereips.org/datasets-

malware.

d3security.com (2017). Risks affecting point of sale ter-

minals. https://d3security.com/blog/risks-affecting-

point-of-sale-terminals/.

Dainotti, A., Pescapè, A., and Claffy, K. C. (2012). Issues and future directions in traffic classification. IEEE network, 26(1):35–40.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.

(2018). Bert: Pre-training of deep bidirectional trans-

formers for language understanding.

financemagnates.com (2016). The future of fintech: the death of cash and bank branches but a boost to retail. https://www.financemagnates.com/fintech/bloggers/fintech-the-death-of-cash-and-bank-branches-but-a-boost-to-retail/.

FirstData (2010). Payments 101: Credit and debit card payments - key concepts and industry issues. http://euro.ecom.cmu.edu/resources/elibrary/epay/Payments-101.pdf.

fitsmallbusiness.com (2022). Types of pos systems: A guide for small businesses. https://fitsmallbusiness.com/types-of-pos-systems/.

Galatolo, F. A., Cimino, M. G. C. A., and Vaglini, G. (2022). Zero-shot mathematical problem solving via generative pre-trained transformers. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 479–483. INSTICC, SciTePress.

Gomzin, S. (2014). Hacking Point of Sale: Payment Appli-

cation Secrets, Threats, and Solutions. John Wiley &

Sons.

intel.com (2022). Deliver seamless expe-

riences with multiple types of pos.

https://www.intel.com/content/www/us/en/internet-

of-things/iot-solutions/pos/types-of-pos.html.

ISO/IEC 7813:2006 (2006). Iso/iec 7813:2006-

information technology — identiﬁca-

tion cards — ﬁnancial transaction cards.

https://www.iso.org/standard/43317.html.

ISO/IEC 7813:2006 (2013). Iso/iec 18092:2013 infor-

mation technology — telecommunications and in-

formation exchange between systems — near ﬁeld

communication — interface and protocol (nfcip-1).

https://www.iso.org/standard/56692.html.

Lichy, A., Bader, O., Dubin, R., Dvir, A., and Hajaj, C.

(2023). When a rf beats a cnn and gru, together—a

comparison of deep learning and classical machine

learning approaches for encrypted malware trafﬁc

classiﬁcation. Computers & Security, 124:103000.

Liu, C., He, L., Xiong, G., Cao, Z., and Li, Z. (2019). Fs-

net: A ﬂow sequence network for encrypted trafﬁc

classiﬁcation. In IEEE INFOCOM 2019-IEEE Con-

ference On Computer Communications, pages 1171–

1179. IEEE.

Lopez-Martin, M., Carro, B., Sanchez-Esguevillas, A., and

Lloret, J. (2017). Network trafﬁc classiﬁer with con-

volutional and recurrent neural networks for internet

of things. IEEE access, 5:18042–18050.

Luong, M.-T., Pham, H., and Manning, C. D. (2015). Ef-

fective approaches to attention-based neural machine

translation. arXiv preprint arXiv:1508.04025.

Marín, G., Casas, P., and Capdehourat, G. (2021). Deepmal-deep learning models for malware traffic detection and classification. In Data Science–Analytics and Applications, pages 105–112. Springer.

Martino, G. (2023). Ramscrapersattentiondetectors github. https://github.com/GabMartino/RAMScrapersAttentionDetectors.

Mirsky, Y., Doitshman, T., Elovici, Y., and Shabtai, A.

(2018). Kitsune: an ensemble of autoencoders for

online network intrusion detection. arXiv preprint

arXiv:1802.09089.

Rodríguez, R. J. (2017). Evolution and characterization of point-of-sale ram scraping malware. Journal of Computer Virology and Hacking Techniques, 13(3):179–192.

Said Elsayed, M., Le-Khac, N.-A., Dev, S., and Jurcut,

A. D. (2020). Network anomaly detection using lstm

based autoencoder. In Proceedings of the 16th ACM

Symposium on QoS and Security for Wireless and Mo-

bile Networks, Q2SWinet ’20, page 37–45, New York,

NY, USA. Association for Computing Machinery.

Secure Technology Alliance (2014). Emv: Faq.

https://www.securetechalliance.org/publications-

emv-faq/#q1.

securebox.comodo.com (2016). New multi-

grain malware eats memory, steals pos

data. https://securebox.comodo.com/blog/pos-

security/new-multigrain-malware-eats-memory-

steals-pos-data/.

Shaikh, R. A. and Shashikala, S. (2019). An autoencoder

and lstm based intrusion detection approach against

denial of service attacks. In 2019 1st International

Conference on Advances in Information Technology

(ICAIT), pages 406–410. IEEE.

Srivastava, N., Mansimov, E., and Salakhudinov, R. (2015).

Unsupervised learning of video representations using

lstms. In International conference on machine learn-

ing, pages 843–852. PMLR.

Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence

to sequence learning with neural networks. Advances

in neural information processing systems, 27.

Symantec (2014). Attacks on point-of-sales sys-

tems. https://docs.broadcom.com/doc/attacks-on-

point-of-sale-systems-en.

Trend Micro (2014). Pos ram scraper malware past,

present, and future. https://www.wired.com/wp-

content/uploads/2014/09/wp-pos-ram-scraper-

malware.pdf.

TrendMicro (2015). Defending against pos ram scrap-

ers. https://d3security.com/blog/risks-affecting-point-

of-sale-terminals/.

UK Finance (2020). Uk payment markets 2020. https://www.ukfinance.org.uk/policy-and-guidance/reports-publications/uk-payment-markets-2020.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.

(2017). Attention is all you need. Advances in neural

information processing systems, 30.

Wang, K., Chen, L., Wang, S., and Wang, Z. (2018). Net-

work trafﬁc feature engineering based on deep learn-

ing. In Journal of Physics: Conference Series, volume

1069, page 012115. IOP Publishing.

Wang, W., Zhu, M., Wang, J., Zeng, X., and Yang, Z.

(2017a). End-to-end encrypted trafﬁc classiﬁcation

with one-dimensional convolution neural networks. In

2017 IEEE international conference on intelligence

and security informatics (ISI), pages 43–48. IEEE.

Wang, W., Zhu, M., Zeng, X., Ye, X., and Sheng, Y.

(2017b). Malware trafﬁc classiﬁcation using convo-

lutional neural network for representation learning.

In 2017 International conference on information net-

working (ICOIN), pages 712–717. IEEE.

Wang, Z. (2015). The applications of deep learning on traf-

ﬁc identiﬁcation. BlackHat USA, 24(11):1–10.

Wei, Y., Jang-Jaccard, J., Xu, W., Sabrina, F., Camtepe,

S., and Boulic, M. (2022). Lstm-autoencoder based

anomaly detection for indoor air quality time series

data. arXiv preprint arXiv:2204.06701.

Yang, K., Kpotufe, S., and Feamster, N. (2020). Feature ex-

traction for novelty detection in network trafﬁc. arXiv

preprint arXiv:2006.16993.

Zhou, H., Wang, Y., Lei, X., and Liu, Y. (2017). A method

of improved cnn trafﬁc classiﬁcation. In 2017 13th in-

ternational conference on computational intelligence

and security (CIS), pages 177–181. IEEE.

ICEIS 2023 - 25th International Conference on Enterprise Information Systems
