Towards Automated Characterization of Malware’s High-level Mechanism using Virtual Machine Introspection

Shun Yonamine, Youki Kadobayashi, Daisuke Miyamoto and Yuzo Taenaka

Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan

The University of Tokyo, 2-11-16 Yayoi, Bunkyo, Tokyo, 113-8658, Japan

Keywords:

Malware Characterization, Virtual Machine Introspection, Taint Analysis, Malware Analysis.

Abstract:

One of the goals of malware analysis is to figure out the intention of an attacker, namely the high-level mechanism. Since malicious activities are typically performed by combining multiple APIs, identifying the malicious intention requires inspecting the series of API calls and analyzing its semantics. In traditional malware analysis, this task generally relies on the manual effort of experts. There is no methodology for associating multiple APIs and identifying the malicious intention in an automated manner. In this paper, we propose a virtual machine introspection-based method for automatically identifying high-level mechanisms. We developed Spaniel, a prototype system that uses taint analysis to track malicious processing that derives from the data read from a specified file and collects the traces of malicious activities. For evaluation, we used the adversary behavior models defined in ATT&CK, and Spaniel identified key indicators that cover 26% of those models.

1 INTRODUCTION

One of the goals of malware analysis is to understand the intention of an attacker. Since malware performs malicious activities following the intention of an attacker, security analysts need to figure out those malicious activities, namely the high-level mechanism (Mitre, 2018; Lee et al., 2013). Malicious activities are conducted through a series of low-level actions such as system calls, and a middle-level behavior can be retrieved from such a series. For instance, given a middle-level behavior that first calls read() to access the user’s password file and then calls send() to send that data outside the network, it is possible that the behavior is intended for data theft. Therefore, in malware analysis, it is important to understand the relationships among the system calls that malware executes independently.

In traditional malware analysis, security analysts monitor the APIs and/or system calls executed by malware. API monitoring is one of the common malware analysis techniques (Egele et al., 2012). Although API monitoring enables security analysts to collect rich information about malware, it is extremely time-consuming since it requires many manual steps from analysts, such as setting breakpoints and inspecting memory values. There are analysis platforms that support automated malware analysis, e.g., Cuckoo Sandbox (Cuckoo, 2013). However, the information shown by those platforms is limited, and they do not preserve the details of the relationships among API calls.

In this paper, we propose a novel approach to associate multiple actions of malware and extract its behavioral characteristics. Our approach aims at identifying the malicious mechanisms of malware by analyzing its runtime behavior and reconstructing its semantics. To accomplish this, we leverage virtual machine introspection and taint analysis. This paper presents Spaniel, a prototype system for automatically extracting the relationships among the low-level actions executed by malware. Our approach is based on the insight that most types of malicious activities involve manipulation of file data. Spaniel performs taint analysis on file data and associates low-level actions with each other through tainting.

In order to show the capability of our approach, we conduct a series of intention analysis experiments covering the scenarios of Exfiltration, Command and Control (C2), and Encryption. Further, we investigate the coverage of our approach in the analysis of malicious activities. For this, we use the adversary behavior models defined in the ATT&CK matrix (Strom et al., 2017; Mitre, 2018), which consists of various techniques commonly used by attackers.

Yonamine, S., Kadobayashi, Y., Miyamoto, D. and Taenaka, Y.

Towards Automated Characterization of Malware’s High-level Mechanism using Virtual Machine Introspection.

DOI: 10.5220/0007405504710478

In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 471-478

ISBN: 978-989-758-359-9


2 RELATED WORK

Automated methodologies for analyzing the malicious activity of malware are a widely studied topic. However, the way of leveraging dynamic analysis techniques to analyze malicious intentions has not been widely studied yet. Existing methods are tailored to solve specific problems in malware analysis. Information flow tracking has been used to detect characteristics of spyware (Yin et al., 2007; Egele et al., 2007). Further, information flow tracking-based methods for analyzing the cryptographic functions of malware have been proposed (Wang et al., 2009; Gröbert et al., 2011). Jacob et al. (Jacob et al., 2011) proposed a program analysis-based method to identify the C2 communication of bots. For the analysis of code injection malware, process tracking methods have been studied (Caillat et al., 2015; Korczynski and Yin, 2017). Many of those previous studies share the key insight that how a program processes a message gives rich information about the behavioral characteristics of malware. In this paper, we apply this insight to build a method for identifying the intention of an attacker.

Techniques that can be leveraged for automated malware analysis are also widely studied. Virtual machine introspection (VMI) is a technology for inspecting a system running on a virtual machine from outside the guest, at the hypervisor level (Garfinkel and Rosenblum, 2003), and it enables whole-system dynamic malware analysis. VMI is widely used in many security solutions (Dolan-Gavitt et al., 2011), e.g., intrusion detection, forensics, and malware analysis. Several projects feature VMI technology, such as DRAKVUF (Lengyel et al., 2014), PANDA (Panda-re, 2018; Dolan-Gavitt et al., 2015), and DECAF (Henderson et al., 2014). Furthermore, VMI platforms based on QEMU are widely used for dynamic taint analysis (Schwartz et al., 2010). Dynamic taint analysis (hereafter, taint analysis) leverages dynamic binary instrumentation (DBI) and thereby enables data tracking at the instruction level. Taint analysis can be used in malware analysis to analyze specific functionalities of malware (Yin et al., 2007; Wang et al., 2009).

There are also studies on whole-system dynamic binary analysis, which aims to make malware analysis more effective. Whole-system dynamic binary analysis is a technique for analyzing malicious code by using a virtual machine. Although developing a whole-system dynamic binary analysis tool from scratch is not straightforward, recent studies (Henderson et al., 2014; Dolan-Gavitt et al., 2015) have developed platforms that facilitate such analysis. We leveraged those efforts to develop our proposed method.

3 SYSTEM DESIGN

3.1 Behavioral Analysis Method based on File-monitoring and Tainting

The key feature of our approach is using the data flow to automatically extract all the APIs that are associated with each other. Our approach aims at retrieving the profile of middle-level behavior needed to reason about the intention of an attacker. To accomplish this, our approach uses API monitoring and taint analysis (tainting) based on VMI.

First, before starting malware analysis, we have to specify the watched file, a file that serves as the data source for tracking data flow. The watched file is used as the taint source for taint analysis. For instance, when our system detects a read() API call on the watched file, it launches taint analysis on the memory where the file data is loaded. We track the whole of the file data at the byte level. By tracking the propagation of taint, it becomes possible to extract the series of instructions that relate to each other. We collect tainted instructions, i.e., the instruction codes that processed tainted data. From the tainted instructions, the API calls that handled tainted data can also be retrieved.
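The byte-level tracking described above can be illustrated with a toy shadow memory. This is only a minimal sketch under simplifying assumptions, not Spaniel's or PANDA's implementation: the class name TaintTracker, its methods, and all addresses are hypothetical, and only a memory-to-memory copy is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TaintTracker:
    """Toy byte-level shadow memory: address -> set of taint labels."""
    shadow: dict = field(default_factory=dict)
    tainted_instructions: list = field(default_factory=list)

    def taint_source(self, addr, length, label):
        # Called when read() on the watched file fills a buffer.
        for i in range(length):
            self.shadow.setdefault(addr + i, set()).add(label)

    def copy(self, pc, dst, src, length):
        # Model a memory-to-memory copy executed at program counter `pc`.
        moved = False
        for i in range(length):
            labels = self.shadow.get(src + i)
            if labels:
                self.shadow[dst + i] = set(labels)
                moved = True
            else:
                # Overwriting with untainted data clears the taint.
                self.shadow.pop(dst + i, None)
        if moved:
            # Remember the instruction that processed tainted data.
            self.tainted_instructions.append(pc)

    def is_tainted(self, addr, length):
        return any(self.shadow.get(addr + i) for i in range(length))


tracker = TaintTracker()
tracker.taint_source(0x1000, 4, "watched:/etc/passwd")        # hypothetical read buffer
tracker.copy(pc=0x400123, dst=0x2000, src=0x1000, length=4)   # tainted copy
tracker.copy(pc=0x400200, dst=0x3000, src=0x5000, length=4)   # untainted copy
```

After these calls, only the instruction at 0x400123 is recorded as a tainted instruction, and only the destination at 0x2000 carries the label of the watched file.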

Further, the traces of taint can be used as an indicator to detect the presence of an attacker’s intention. The traces of taint are retrieved by a taint check, i.e., checking whether the memory or register handled by a tainted instruction is tainted. For instance, the traces of taint could indicate data exfiltration if the send() API handles tainted data in its buffer. Moreover, our approach provides a visualization of the analysis result that shows the relationships among the low-level actions. The visualization is designed to help security analysts estimate the malicious intention of an attacker.
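A taint check at an output-related API can be sketched as a lookup over the shadow state of the API's buffer. The following is an illustrative sketch only: the policy table OUTPUT_POLICIES, the function taint_check, and the addresses are assumptions made for this example and are not taken from Spaniel.

```python
# Hypothetical policy table: output-related API -> indicator name.
OUTPUT_POLICIES = {
    "send": "possible data exfiltration",
    "sendto": "possible data exfiltration",
    "write": "possible data theft or tampering",
}


def taint_check(api, buf_addr, length, shadow):
    """Return an indicator string if the API's buffer holds tainted bytes."""
    if api not in OUTPUT_POLICIES:
        return None
    labels = set()
    for addr in range(buf_addr, buf_addr + length):
        labels |= shadow.get(addr, set())
    if not labels:
        return None
    return f"{OUTPUT_POLICIES[api]} ({api} buffer tainted by {sorted(labels)})"


# Shadow state with eight bytes tainted by the watched file (made-up addresses).
shadow = {0x2000 + i: {"/etc/passwd"} for i in range(8)}

hit = taint_check("send", 0x2004, 16, shadow)    # buffer overlaps tainted bytes
miss = taint_check("send", 0x9000, 16, shadow)   # buffer is clean
```

A partially tainted buffer is enough to raise the indicator here; a stricter policy could require a minimum number of tainted bytes.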

3.2 Implementation

We developed Spaniel, a prototype of our proposed system. Spaniel is a plugin for PANDA (Panda-re, 2018; Dolan-Gavitt et al., 2015). PANDA is a whole-system dynamic binary analysis platform that supports record-and-replay based analysis. Record-and-replay decouples the analysis from the execution and is thus suited for taint analysis, which is too expensive to be applied at runtime (Chow et al., 2008; Stamatogiannakis et al., 2015). For taint analysis, we leverage the implementation in PANDA (Whelan et al., 2013) to monitor the propagation of taint at the byte level. PANDA leverages QEMU to instrument program execution per emulated code. Thus, we can monitor and instrument every execution on the system per emulated code to track the data flow. Additionally, in our use of PANDA, we enabled taint propagation through pointer dereference in order to capture the data flow of the malicious activity in detail.

Figure 1: System overview. [Recording phase: the attack on the victim is recorded by PANDA (QEMU) into an execution trace log. Analysis phase using Spaniel: the log is replayed while Spaniel collects taint check results and tainted instruction codes, saving details about the data flow and malicious indicators. Visualization phase: Spaniel emits a dot script that Graphviz renders as a graph of the malicious mechanism.]

3.3 The Procedure of Malware Analysis in our Approach

Spaniel performs analysis of malicious activities using record-and-replay. As shown in Figure 1, Spaniel analyzes malware in three phases: the recording phase for obtaining the execution trace of malware, the analysis phase for collecting information used to identify the presence of malicious activity, and the visualization phase for producing a visualized output of the analysis result.

In the recording phase, we execute malware in a sandboxed environment and record its execution to obtain an execution trace log, which is required for malware analysis using Spaniel. While malware performs its malicious activity, PANDA records the execution trace. We assume that malware fully performs its malicious activity during the recording phase.

In the analysis phase, we use Spaniel to analyze the adversary scenario from the execution trace that we recorded. While observing the replayed malicious activity, Spaniel performs API hooking to monitor read access to the watched file that we specified. If the watched file is accessed by a file read operation, Spaniel taints the read buffer and starts taint analysis. Spaniel then collects the tainted instruction codes that handled the tainted data and caused taint propagation. Further, Spaniel uses VMI to obtain the names of the shared libraries that the tainted instruction codes map to. Spaniel also hooks output-related APIs (e.g., write(), send()) to conduct a taint check on the write buffer. If the buffer is tainted, under the policies we defined, we consider it an indicator of data theft or data tampering. In this phase, we collect information for identifying the type of malicious attempt.
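Mapping tainted instruction addresses to shared library names amounts to an interval lookup over a process memory map. The sketch below illustrates that step only; the address ranges are invented for the example, and in a real analysis the map would be obtained through VMI rather than hard-coded.

```python
import bisect

# Hypothetical memory map of one process: (start, end, name), sorted by start.
MAPPINGS = [
    (0x08048000, 0x08054000, "cat"),
    (0xB7E00000, 0xB7F5C000, "libc-2.13.so"),
    (0xB7FE0000, 0xB7FFE000, "ld-2.13.so"),
]
STARTS = [m[0] for m in MAPPINGS]


def resolve(addr):
    """Return the name of the mapping that contains addr, or None."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0:
        start, end, name = MAPPINGS[i]
        if start <= addr < end:
            return name
    return None


def libraries_for(tainted_pcs):
    """Collect the distinct mapped objects that executed tainted instructions."""
    names = {resolve(pc) for pc in tainted_pcs}
    names.discard(None)  # drop addresses outside every mapping
    return sorted(names)


seen = libraries_for([0x08048010, 0xB7E00010, 0x1])
```

Here the two mapped addresses resolve to cat and libc-2.13.so, while the unmapped address is ignored.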

In the visualization phase, Spaniel generates a graph that represents the series of malicious actions associated with the watched file data. Based on the analysis result, Spaniel produces a dot script that Graphviz (Graphviz, 2018) uses to output a graph. The visualization phase is designed to make the analysis results intuitively understandable, particularly when reasoning about malicious activities that are not yet defined under our detection policies.
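Producing a dot script from collected relationships can be sketched in a few lines. This is a hedged illustration, not Spaniel's generator: the function to_dot, the graph name, the styling, and the edge list (including the edge directions) are assumptions, with node names borrowed from the labels visible in Figure 3.

```python
def to_dot(edges, highlight=("Tainted Buffer",)):
    """Render (src, dst) pairs as a Graphviz dot script."""
    def q(s):
        # Quote a node name for dot, escaping embedded double quotes.
        return '"' + s.replace('"', '\\"') + '"'

    lines = ["digraph spaniel {", "  rankdir=LR;"]
    nodes = []
    for src, dst in edges:
        for n in (src, dst):
            if n not in nodes:
                nodes.append(n)
    for n in nodes:
        attrs = " [style=filled, fillcolor=lightgray]" if n in highlight else ""
        lines.append(f"  {q(n)}{attrs};")
    for src, dst in edges:
        lines.append(f"  {q(src)} -> {q(dst)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relationships for illustration only.
edges = [
    ("cmd:cat", "sys_read"),
    ("sys_read", "/etc/passwd"),
    ("sys_read", "Tainted Buffer"),
    ("cmd:meterbind2.elf", "sys_sendto"),
    ("sys_sendto", "Tainted Buffer"),
]
dot = to_dot(edges)
```

Saving the returned string to a file such as graph.dot and running `dot -Tpng graph.dot -o graph.png` would render it with Graphviz.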

As a result of the three phases, Spaniel obtains indicators of adversary intent from the recorded activities of malware. We use them as indicators of compromise (IOCs) as well as evidence that identifies the type of high-level mechanism.

4 EXPERIMENT

In this section, we test Spaniel’s capability of identifying the attacker’s intention. We demonstrate that Spaniel can analyze the footprints of malicious actions and associate them to identify malicious characteristics. We set up experiments for three case studies, namely Exfiltration, Encryption, and Command and Control (C2), all selected from ATT&CK (Mitre, 2018). Since Spaniel performs malware analysis by replaying the execution trace log, we need to run malware samples on a virtual machine and record the malicious activities in advance. Malicious activities are recorded using the record-and-replay functionality of PANDA and saved into an execution trace log, as shown in Figure 1.

In order to set up those experiments, we used Linux as the victim’s platform and prepared malware samples appropriate for each experiment. We used meterpreter (Security, 2018) to simulate the cases of exfiltration and C2. In addition, we used OpenSSL (Foundation, 2018) as a sample for simulating encryption.

We set up the experiments as follows. In the cases of exfiltration and C2, we simulate adversary behavior in which the attacker accesses the victim machine via meterpreter. In these scenarios, the attacker attempts to steal a credentials file (/etc/passwd) using the cat command via meterpreter and sends it outside the network. In the case of encryption, Spaniel analyzes the execution trace log that records encryption processing performed by OpenSSL, which we use to imitate ransomware. In this scenario, the ransomware targets a simple text file (cryptme.txt). In all of our experiments, we employ a strategy of monitoring the system calls that access our watched file (e.g., /etc/passwd, cryptme.txt). Spaniel implements this strategy by starting taint analysis when it detects a read-related API call on our watched file.

4.1 Detecting IOC of “Exfiltration Over C2 Channel”

We demonstrate that Spaniel can analyze the “Exfiltration Over C2 Channel” attack model, which has aspects of both exfiltration and C2. We created a scenario where the attacker tries to steal a credentials file with the cat command via meterpreter. Spaniel then analyzes the execution trace log that recorded the simulated attack. The goal of this experiment is to detect indicators that identify the exfiltration and the C2 activity, respectively.

4.1.1 The Case of Exﬁltration

In order to capture the essence of the high-level mechanism of file exfiltration, we have to detect, before the network transfer, a send buffer that holds tainted data associated with our watched file. Spaniel applies taint analysis to the data of our watched file and tracks every instruction that handles tainted data through taint propagation. At the moment when a network transfer happens, Spaniel conducts a taint check on the data held in the send buffer. If the send buffer is tainted, we regard it as a sign of data exfiltration, because it is anomalous for the taint tag associated with credentials data to propagate to the send buffer.

In many cases, modern malware obfuscates or tampers with the target data before sending it to the attacker’s machine. Since tampered data, which is encrypted or compressed, shows no signs of the original data on its surface, it becomes harder to associate the data kept in the send buffer with the stolen data. Therefore, we use taint checking to detect a sign of a network transfer that targets our watched data, as shown in Figure 2. Through taint propagation, our watched data leaves its vestiges throughout memory areas and registers. Spaniel performs taint checking to determine whether the buffers handled by APIs are tainted. If they are, we consider that fact an indicator of file exfiltration.

Figure 2: Detecting indicator of “File Exfiltration”.
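The contrast between content matching and taint checking on obfuscated data can be shown with a small example. It is a toy model under stated assumptions: each byte carries a one-bit shadow flag, a single-byte XOR stands in for the obfuscation, and the names xor_obfuscate, read_buf, and send_buf are hypothetical.

```python
WATCHED = b"root:x:0:0:root:/root:/bin/bash\n"

# Each element is a pair (byte, tainted?), i.e. a byte with a one-bit shadow.
read_buf = [(b, True) for b in WATCHED]


def xor_obfuscate(buf, key):
    # Each output byte is computed from an input byte, so taint is inherited.
    return [(b ^ key, tainted) for b, tainted in buf]


send_buf = xor_obfuscate(read_buf, key=0x5A)
payload = bytes(b for b, _ in send_buf)

signature_hit = b"root:" in payload        # content-based matching on the payload
taint_hit = any(t for _, t in send_buf)    # taint check on the send buffer
```

The payload no longer contains the plaintext pattern, so the content-based match fails, while the taint flag still links the send buffer to the watched file.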

We verified our hypothesis in the analysis of meterpreter as follows. When the read() API is called and reads our watched file (/etc/passwd), Spaniel applies taint analysis to the read buffer. In the subsequent processing after taint analysis is enabled, we observed a send() API call holding a tainted buffer in its argument. Consequently, we consider that tainted buffer an indicator of file exfiltration.

4.1.2 The Case of Command and Control

We analyze meterpreter for the purpose of detecting the presence of command and control. Spaniel applies taint analysis to /etc/passwd in the same way as in Section 4.1.1 and collects tainted instructions. In dealing with malware that implements C2, we assume that an attacker is likely to utilize system utilities, e.g., commands and shared libraries, during malicious activities in the C2 channel. If the attacker follows our assumption, tainted instructions are likely to contain much information about the traces of the attacker.

Spaniel instruments the execution to examine tainted instructions during the analysis. While collecting tainted instructions as shown in Figure 2, Spaniel leverages VMI to obtain a list of processes as well as the shared libraries loaded for each process. We found that some tainted instructions are codes used by the cat command. Thus, we can confirm the presence of a malicious activity via the C2 channel.

Figure 3: Visualization of “Exfiltration Over C2 Channel”. [Graph nodes: /etc/passwd, Tainted Buffer, cat, cmd:cat, cmd:sh, cmd:meterbind2.elf, ld-2.13.so, libc-2.13.so, sys_read, sys_write, sys_sendto, [fd]1, [fd]4, [tcp]192.168.124.131:44261.]

We also found that it is possible to associate exfiltration and C2 activities. We used the name of a system utility as an indicator of a malicious attempt through C2. Some malware uses system utilities to make it difficult to distinguish malicious operations from legitimate ones. However, we think examining tainted instructions helps analysts deal with this problem.

4.1.3 Visualization of Execution Trace of “Meterpreter”

We visualized our experiment result as a graph (Figure 3). From the dot script that Spaniel generated from the analysis result, we can obtain a graph by using Graphviz. This graph shows the series of actions that malware performed to accomplish its malicious attempt of data theft. Each node represents a system component, e.g., a process or a system call, that we observed during the record-and-replay based analysis with Spaniel. Edges between nodes represent caller/callee relationships. The “Tainted Buffer” node represents the buffer that holds the data of our watched file (/etc/passwd). Every node is involved in handling tainted data. We hope this graph helps analysts understand the mechanism of exfiltration and C2 more intuitively and deal with threats.

4.2 Detecting IOC of “Encryption”

4.2.1 The Case of Encryption with OpenSSL

In this section, we demonstrate how to detect evidence of the use of encryption in a malicious activity. To do so, we have to confirm that the malware uses a crypto-related API in the series of malicious actions. Thus, we examine tainted instructions to find traces of the use of crypto-related APIs. Since tainted instructions are instruction codes that derive from tainting our watched file data, we assume that tainted instruction codes give us more details about the malicious processing.

Figure 4: Visualization of “Encryption”. [Graph nodes: /home/john/tmp/cryptme.txt, /home/john/tmp/encrypted.txt, Tainted Buffer, cmd:openssl, cmd:bash, openssl, ld-2.13.so, libc-2.13.so, libcrypto.so.1.0.0, sys_read, sys_write.]

We conducted our experiment as follows. First, we simulated encryption processing and obtained an execution trace log. In this scenario, the openssl command performs AES encryption on a simple text file, namely cryptme.txt. Next, we used Spaniel to analyze the encryption scenario that we simulated. We then tried to collect evidence of encryption from a significant amount of instruction codes.

The openssl command performs encryption processing on our watched file, cryptme.txt. In a similar way to the analysis of meterpreter in Section 4.1, Spaniel monitors the file operations, starts taint analysis when our watched file is accessed, and then collects tainted instructions. While collecting tainted instruction codes, Spaniel leverages VMI to obtain details of the tainted instructions, such as addresses and names of shared libraries, as we did to find a footprint of the cat command in Section 4.1.2. From the experiment result, we confirmed that libcrypto.so was included in the list of shared libraries we obtained by using VMI. Since libcrypto.so is a shared library used for encryption processing, this suggests that our watched file data was processed with crypto-related APIs. The result indicates that examining tainted instructions is effective for finding traces of the use of crypto-related libraries and identifying the malicious mechanism of encryption.
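The library-name check described above behaves like a small signature table. The sketch below is an assumption-laden illustration: the table LIBRARY_SIGNATURES and the function classify are hypothetical, only the libcrypto.so entry corresponds to the experiment, and the libssl.so and libz.so entries are added purely as examples of how such a table could be extended.

```python
import re

# Hypothetical signature table: library-name pattern -> high-level mechanism.
LIBRARY_SIGNATURES = {
    r"^libcrypto\.so(\.|$)": "encryption",
    r"^libssl\.so(\.|$)": "encryption",     # assumed extension, not from the paper
    r"^libz\.so(\.|$)": "compression",      # assumed extension, not from the paper
}


def classify(library_names):
    """Return the mechanisms suggested by the libraries seen in tainted code."""
    found = set()
    for name in library_names:
        for pattern, mechanism in LIBRARY_SIGNATURES.items():
            if re.search(pattern, name):
                found.add(mechanism)
    return sorted(found)


# Names as they appear in the visualization of the encryption experiment.
observed = ["openssl", "ld-2.13.so", "libc-2.13.so", "libcrypto.so.1.0.0"]
mechanisms = classify(observed)
```

Matching on names alone is a signature-style check, so code that implements its own cipher without these libraries would not be flagged.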

4.2.2 Visualization of Execution Trace of “Openssl”

In the same way that we visualized exfiltration and C2, we obtained a graph (Figure 4) that represents the file encryption activity. This graph shows the relationships among the nodes that handle tainted data and follows the same representation rules that we used in Section 4.1. The graph shows the read() API call that handles our watched file, cryptme.txt. The “Tainted Buffer” node represents the data of our watched file. Further, the libcrypto.so node indicates that our file data is encrypted. From the name of the crypto-related API we obtained, we can identify the malicious attempt of encryption.

5 EVALUATION

In order to confirm the effectiveness of Spaniel, we investigated its capability of finding the IOCs that we expect to find in the various kinds of malicious activities listed in ATT&CK. The ATT&CK matrix includes 106 models of malicious activities. First, we explain the IOCs we use for identifying the kinds of malicious activities, as follows.

• Instruction codes executed by malware handle tainted data.

• Tainted data is stored in an argument when an output-related API is called.

• The kind of activity performed by malware can be detected from the name of the shared library that an instruction code maps to. For example, we identified the occurrence of encryption by detecting an API of libcrypto.so in an experiment.

Next, we conducted an investigation, and the results can be seen in Table 1. The results show that Spaniel is capable of finding the IOCs described above in 26 models out of a total of 106 models. This indicates that Spaniel is potentially able to extract the middle-level behavior of malicious activity at every stage of the cyber kill chain (e.g., Control, Execute, Maintain). The models that Spaniel could detect have a common behavioral characteristic: the file input and the output to external resources, such as a file or a socket, can be easily related by taint propagation and taint checking. We demonstrated that Spaniel could detect models in which an explicit data flow with data alteration (e.g., encryption) occurs between the file input and the external output. We therefore argue that, for security incidents caused by malware, an analysis method based on tracking the file data accessed by malware is a reasonable approach.

We examined the “Data source” item of each technique in ATT&CK (Mitre, 2018). “Data source” contains information that can be used to detect and analyze each attack model. Since Spaniel performs taint analysis on the file operations of malware, we enumerated the models that have “File monitoring” in their data sources, which we call the file-monitoring type. There are 56 models of the file-monitoring type. We confirmed that 18 of these 56 models are ones that Spaniel can detect. However, that is fewer than the 26 models shown in Table 1. We note several reasons as follows.

• The “Account discovery” model of the discovery tactic is not defined as a file-monitoring type. Regarding “Account discovery”, ATT&CK considers only process activities, e.g., the id command and the groups command, in its data sources. However, we confirmed it empirically through the experiment of exfiltration targeted at /etc/passwd.

• The “Exfiltration over command and control channel” model of the exfiltration tactic is not defined as a file-monitoring type. Regarding this technique, ATT&CK considers only process activities and network activities as data sources.

• Several models of the C2 tactic are not defined as file-monitoring types. Regarding these techniques, ATT&CK considers network activities, such as packet monitoring, as their data sources.
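The comparison between the file-monitoring type and the detected models is a set operation, sketched below on three techniques named in this section. The data-source labels are coarse placeholders that follow the wording of the text ("process", "network", "file") rather than actual ATT&CK strings, and the entry for “Credentials in Files” is an assumption made only for illustration.

```python
# Placeholder data-source categories per technique (illustrative, not ATT&CK data).
DATA_SOURCES = {
    "Account Discovery": {"process"},
    "Exfiltration Over Command and Control Channel": {"process", "network"},
    "Credentials in Files": {"file", "process"},  # assumed for illustration
}

# Techniques from Table 1 that appear in this toy example.
DETECTED = set(DATA_SOURCES)

# Models of the file-monitoring type: those listing a file data source.
file_monitoring_type = {
    name for name, sources in DATA_SOURCES.items() if "file" in sources
}

# Detected models that the file-monitoring filter would have missed.
outside_filter = DETECTED - file_monitoring_type
```

In this toy data, two of the three detected techniques fall outside the file-monitoring type, which mirrors why the count of detected models can exceed the count found through the data-source filter.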

In the descriptions of ATT&CK, packet monitoring is regarded as an appropriate method for analyzing the network activity of malware. This also means that network traffic data is generally considered an appropriate data source for malware analysis.

Since the data source of Spaniel is limited to regular files, we expect that the coverage of the analysis it enables is also limited. From this, we want to point out that there is a gap between traditional malware analysis methods and data flow tracking-based methods. We suppose that considering the kind of data source, e.g., file or network, is important, since the kind of data source that malware tries to access varies depending on its purpose.

Although the number of detected models was lower than half of the total, we are not pessimistic about this result. We evaluated Spaniel through a series of experiments (e.g., exfiltration, encryption, C2), and the policy we used for evaluation might not be severe. For instance, we did not count the “Automated Collection” model in the results, even though we carried out an experiment of credentials file exfiltration in Section 4.1, since “Automated Collection” belongs to the collection tactic, not the exfiltration tactic. In malware analysis that uses data flow tracking, it is necessary to design appropriate policies to determine whether the behavior tracked through data flow is malicious. Depending on the detection policies, we could possibly improve the detection rate against ATT&CK models.


Table 1: A list of attack models in the ATT&CK Matrix that Spaniel can detect (⊚: detected attack model, ◦: similar model).

Tactic                 Technique                                        Result
Persistence            .bash_profile and .bashrc                        ⊚
Persistence            Hidden Files and Directories                     ⊚
Persistence            Rc.common                                        ◦
Privilege Escalation   Setuid and Setgid                                ⊚
Defense Evasion        Clear Command History                            ◦
Defense Evasion        File Deletion                                    ⊚
Defense Evasion        Hidden Files and Directories                     ⊚
Defense Evasion        Scripting                                        ◦
Credential Access      Bash History                                     ◦
Credential Access      Credentials in Files                             ⊚
Discovery              Account Discovery                                ◦
Lateral Movement       Remote File Copy                                 ⊚
Execution              Command-Line Interface                           ⊚
Execution              Scripting                                        ◦
Collection             Data Staged                                      ◦
Collection             Data from Local System                           ◦
Exfiltration           Data Compressed                                  ◦
Exfiltration           Data Encrypted                                   ⊚
Exfiltration           Exfiltration Over Command and Control Channel    ⊚
Command and Control    Commonly Used Port                               ◦
Command and Control    Custom Command and Control Protocol              ◦
Command and Control    Custom Cryptographic Protocol                    ⊚
Command and Control    Data Encoding                                    ⊚
Command and Control    Data Obfuscation                                 ⊚
Command and Control    Remote File Copy                                 ⊚
Command and Control    Standard Cryptographic Protocol                  ◦

6 DISCUSSION

In this section, we discuss the limitations of Spaniel and suggestions for future work. First, Spaniel is designed to handle malware that does not use techniques to thwart analysis using a virtual machine. If malware detects the presence of a virtual machine and then stops or changes its behavior, the recording phase (Figure 1 in Section 3) may not work effectively. Since Spaniel relies on record-and-replay using QEMU, we need countermeasures against each anti-analysis technique that targets QEMU. Dealing with real-world malware that uses those anti-analysis techniques remains a technical challenge.

Next, the predefined adversary behavior models of ATT&CK that we detected involve explicit data processing, and their maliciousness could only be identified from the relationship between the file input and the external output, e.g., a file or the network, through tainting. However, taint analysis also has weaknesses, e.g., overtainting and undertainting (Schwartz et al., 2010; Slowinska and Bos, 2009). We have to consider the possibility that malware intentionally generates indirect or implicit data flows to bypass taint analysis.
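An implicit flow of the kind mentioned above can be demonstrated with a toy tracker that follows data dependences only. This sketch is not a model of PANDA's taint system; the functions explicit_copy and implicit_copy and the per-byte taint flag are assumptions made for the example.

```python
def explicit_copy(src):
    # Data dependence: each output byte is computed from an input byte,
    # so the taint flag travels with it.
    return [(b, tainted) for b, tainted in src]


def implicit_copy(src):
    # Control dependence: the output is rebuilt from untainted constants that
    # are selected by branching on each tainted bit. A tracker that follows
    # only data dependences leaves the result untainted (undertainting).
    out = []
    for b, _tainted in src:
        value = 0
        for bit in range(8):
            if b & (1 << bit):        # branch condition depends on tainted data
                value |= (1 << bit)   # but the assigned value is a constant
        out.append((value, False))
    return out


secret = [(b, True) for b in b"root:x:0:0"]
leaked = implicit_copy(secret)
```

The copied bytes are identical to the original, yet none of them carries a taint flag, so a taint check on a buffer built this way would report nothing.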

Next, Spaniel detects the encryption activities of malware based on the presence of crypto-related API calls. Therefore, Spaniel can be evaded if malware performs encryption with its own implementation. Our approach is a sort of signature-based detection: we treated the crypto-related APIs as an explicit indicator for detection. To deal with encryption activity that does not match the signatures, it is necessary to adopt heuristic-based approaches. However, heuristic-based approaches (Wang et al., 2009; Gröbert et al., 2011) also rely on assumptions that are statistical and/or empirical. If malware does not follow those assumptions, heuristic-based approaches can also be bypassed. We leave this to future work.

Finally, although we expect that Spaniel can help security analysts with the complicated tasks of malware analysis, we have not tested Spaniel from the perspective of improving the performance of analysts. To evaluate its efficiency from the viewpoint of security analysts, we might need to conduct user tests. For example, Yakdan et al. (Yakdan et al., 2016) conducted a user study to evaluate the usability of a decompiler they designed to help reverse engineers.

7 CONCLUSION

Understanding the attacker’s intention is one of the challenges in malware analysis. From the perspective of automated malware analysis, there is no method to reason about the intention of an attacker. In this paper, we proposed a novel approach to pinpoint the kind of malicious mechanism. Spaniel, a prototype we developed, examines the instruction codes that relate to file operations by using taint analysis. In order to confirm our hypothesis, we tested Spaniel with several attack models: exfiltration, encryption, and C2. We confirmed that Spaniel is capable of detecting IOCs and identifying the type of high-level mechanism. Throughout the series of experiments, we used a minimal installation of Linux as the victim’s system, and the time cost for analysis was less than 5 minutes. We hope this type of characterization method will give more insights into the field of malware analysis.


REFERENCES

Caillat, B., Gilbert, B., Kemmerer, R., Kruegel, C., and Vigna, G. (2015). Prison: Tracking process interactions to contain malware. In High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on, pages 1282–1291. IEEE.

Chow, J., Garfinkel, T., and Chen, P. M. (2008). Decoupling dynamic program analysis from execution in virtual environments. In USENIX 2008 Annual Technical Conference, pages 1–14.

Cuckoo (2013). Automated Malware Analysis. https://www.cuckoosandbox.org/.

Dolan-Gavitt, B., Hodosh, J., Hulin, P., Leek, T., and Whelan, R. (2015). Repeatable reverse engineering with PANDA. In Proceedings of the 5th Program Protection and Reverse Engineering Workshop, page 4. ACM.

Dolan-Gavitt, B., Leek, T., Zhivich, M., Giffin, J. T., and Lee, W. (2011). Virtuoso: Narrowing the semantic gap in virtual machine introspection. In IEEE Symposium on Security and Privacy, pages 297–312. IEEE Computer Society.

Egele, M., Kruegel, C., Kirda, E., Yin, H., and Song, D. (2007). Dynamic spyware analysis.

Egele, M., Scholte, T., Kirda, E., and Kruegel, C. (2012). A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6.

Foundation, O. S. (2018). OpenSSL Cryptography and SSL/TLS Toolkit. https://www.openssl.org/.

Garfinkel, T. and Rosenblum, M. (2003). A virtual machine introspection based architecture for intrusion detection. In Proc. Network and Distributed Systems Security Symposium.

Graphviz (2018). Graphviz - Graph Visualization Software. https://www.graphviz.org/.

Gröbert, F., Willems, C., and Holz, T. (2011). Automated identification of cryptographic primitives in binary programs. In International Workshop on Recent Advances in Intrusion Detection, pages 41–60. Springer.

Henderson, A., Prakash, A., Yan, L. K., Hu, X., Wang, X., Zhou, R., and Yin, H. (2014). Make it work, make it right, make it fast: building a platform-neutral whole-system dynamic binary analysis platform. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, pages 248–258. ACM.

Jacob, G., Hund, R., Kruegel, C., and Holz, T. (2011). Jackstraws: Picking command and control connections from bot traffic. In USENIX Security Symposium, volume 2011. San Francisco, CA, USA.

Korczynski, D. and Yin, H. (2017). Capturing malware propagations with code injections and code-reuse attacks. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS '17, pages 1691–1708, New York, NY, USA. ACM.

Lee, A., Varadharajan, V., and Tupakula, U. (2013). On malware characterization and attack classification. In Proceedings of the First Australasian Web Conference - Volume 144, pages 43–47. Australian Computer Society, Inc.

Lengyel, T. K., Maresca, S., Payne, B. D., Webster, G. D., Vogl, S., and Kiayias, A. (2014). Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In Proceedings of the 30th Annual Computer Security Applications Conference.

Mitre (2018). ATT&CK Linux Technique Matrix. https://attack.mitre.org/wiki/Linux_Technique_Matrix (accessed 2018-02-13).

Mitre (2018). MAEC Core Specification, Version 5.0. http://maecproject.github.io/releases/5.0/MAEC_Core_Specification.pdf.

Panda-re (2018). Platform for Architecture-Neutral Dynamic Analysis. https://github.com/panda-re/panda.

Schwartz, E. J., Avgerinos, T., and Brumley, D. (2010). All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Security and Privacy (SP), 2010 IEEE Symposium on, pages 317–331. IEEE.

Security, O. (2018). About the Metasploit Meterpreter. https://www.offensive-security.com/metasploit-unleashed/about-meterpreter/.

Slowinska, A. and Bos, H. (2009). Pointless tainting?: evaluating the practicality of pointer tainting. In Proceedings of the 4th ACM European Conference on Computer Systems, pages 61–74. ACM.

Stamatogiannakis, M., Groth, P., Bos, H., et al. (2015). Decoupling provenance capture and analysis from execution. In Proceedings of the 7th USENIX Workshop on the Theory and Practice on Provenance (TaPP). Edinburgh, Scotland.

Strom, B. E., Battaglia, J. A., Kemmerer, M. S., Kupersanin, W., Miller, D. P., Wampler, C., Whitley, S. M., and Wolf, R. D. (2017). Finding cyber threats with ATT&CK-based analytics.

Wang, Z., Jiang, X., Cui, W., Wang, X., and Grace, M. (2009). ReFormat: Automatic reverse engineering of encrypted messages. In ESORICS, volume 9, pages 200–215. Springer.

Whelan, R., Leek, T., and Kaeli, D. (2013). Architecture-independent dynamic information flow tracking. In International Conference on Compiler Construction, pages 144–163. Springer.

Yakdan, K., Dechand, S., Gerhards-Padilla, E., and Smith, M. (2016). Helping Johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 158–177. IEEE.

Yin, H., Song, D., Egele, M., Kruegel, C., and Kirda, E. (2007). Panorama: capturing system-wide information flow for malware detection and analysis. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 116–127. ACM.

ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy
