Classifying Malicious Thread Behavior in PaaS Web Services

Cemile Diler Özdemir (1,2), Mehmet Tahir Sandıkkaya and Yusuf Yaslan

(1) Corporate Technology, Development Center, Siemens AS, Istanbul, Turkey

(2) Computer Engineering Department, Istanbul Technical University, Istanbul, Turkey

Keywords: Cloud Security, PaaS, Malicious Behavior, Machine Learning.

Abstract: The multitenant structure of the PaaS cloud delivery model allows customers to share platform resources in the cloud. However, this structure requires a strong security mechanism that isolates customer applications to prevent interference between them. In this paper, a malicious thread behavior detection framework based on machine learning is proposed to classify whether user requests are malicious. The framework uses the resource usage metrics of worker threads and the N-Gram frequencies of their operations as features. The framework is tested on a real-life scenario with the Random Forest, AdaBoost and Bagging ensemble learning algorithms, and the results are compared using several accuracy metrics. The proposed system detects 87.6% of the malicious requests.

1 INTRODUCTION

Cloud computing is a popular way for companies to obtain on-demand network access to outsourced resources, such as infrastructure, platform or software, with minimum effort. Cisco reports that cloud data center workloads will triple from 2015 to 2020 (Networking, 2017). Considering this growth, cloud computing vendors focus on security research to keep up with the rapid development of this technology (Banerjee et al., 2013).

Most PaaS providers offer web application platforms to their customers because PaaS development is web oriented. This strategy benefits both the PaaS providers and the PaaS customers. Since scripting languages (Ruby and Python) and virtualized platforms (Java and .NET) have been commonly used in web development in recent years, providers build their servers on these popular technologies. In addition, cloud customers can quickly customize their existing web applications and deliver them to the providers to be served in the cloud. Deploying many customers' applications thus becomes an easy and cheap process for PaaS providers. The common benefit is rapid adoption of the cloud.

These advantages, however, come with a major flaw. Different cloud customers share PaaS platform resources (hardware, software, services, configuration, etc.), which requires isolation between customer applications to prevent interference between them. This interference can occur unintentionally or maliciously. For instance, a faulty application can consume most of the memory or CPU of a platform that serves many customers. Other customers are then affected even though there is no deliberate attack on the platform. In addition, maliciously acting customers can execute code to attack other customers or the platform. The availability, confidentiality and integrity of PaaS are threatened for these reasons (Modi et al., 2013). PaaS providers need a strong security mechanism to protect and isolate their customer applications and the platform.

Currently, PaaS customers are limited to web applications by the leading PaaS providers Google, Heroku and Amazon. The providers may limit customer applications' access to trivial resources such as files or sockets via carefully set up permissions. However, memory and CPU are shared among multiple threads in web applications, and per-request user behavior cannot be traced (Sandıkkaya et al., 2014).

Several systems have already been proposed for cloud security, such as host based intrusion detection systems (Arshad et al., 2012), network based intrusion detection systems (Hamad and Al-Hoby, 2012), distributed intrusion detection systems (Sanjay Ram, 2012) and hypervisor based intrusion detection systems (Garfinkel et al., 2003). However, these systems were designed to run at the operating system, system virtual machine or hypervisor level. They do not consider the isolation of several different customer applications hosted in the same process virtual machine. In addition, only the application and data layers are manageable by the customer in the PaaS service model; the base layers are managed by the cloud provider. Figure 1 presents the PaaS service model and its layers. The customers cannot manage the underlying cloud infrastructure, including the network, operating systems or servers, in the PaaS model.

Google: https://cloud.google.com/appengine/

Heroku: https://www.heroku.com/

Amazon: https://aws.amazon.com/elasticbeanstalk/

Özdemir, C., Sandıkkaya, M. and Yaslan, Y. Classifying Malicious Thread Behavior in PaaS Web Services. DOI: 10.5220/0006688204180425. In Proceedings of the 8th International Conference on Cloud Computing and Services Science (CLOSER 2018), pages 418-425. ISBN: 978-989-758-295-0

Hardware and software level isolation mechanisms have also been utilized in PaaS. Software level mechanisms isolate threads, processes or virtual machines of different tenants (Bazm et al., 2017). Heroku uses container-based isolation, which groups operating system processes by kernel namespaces and resource allocations to isolate them from other groups. Docker is one of the most popular open-source container platforms and has been adopted by many PaaS providers. In addition, Cloud Foundry isolates its tenants using a user-based isolation mechanism, a traditional and widely used technique in which each application runs as a different user on the operating system. However, when multiple tenants share the same process virtual machine, runtime-based isolation mechanisms are needed (Zhang et al., 2014).

In this paper, a runtime-based security framework is proposed for multitenant PaaS providers to detect malicious behavior of threads using machine learning. The main contributions of this paper are summarized as follows:

• Thread behavior detection framework: Thread behavior is detected by the proposed framework, which can be integrated into a cloud customer's web application with minimum effort. The framework is designed for PaaS providers that host many customer web applications in the same application server. In this deployment scenario, customers' web applications reside in the same operating system process. Within this process, worker threads serve the web applications. The proposed framework measures the worker threads' resource usage metrics and uses them to classify and detect malicious behavior.

• Well selected metrics: All necessary metrics are measurable at the web application level. Measurement of the features is independent of the operating system, programming language or cloud provider; therefore, the features can be collected without any dependency. The features are also privacy friendly: they do not contain any sensitive data of the cloud providers or the cloud customers.

Docker: https://www.docker.com/

Cloud Foundry: https://www.cloudfoundry.org/

Figure 1: PaaS deployment model: The physical computer hosts a hypervisor that monitors the operating systems through system virtual machines. This level of abstraction is mostly known as IaaS. On top of the operating systems, many process virtual machine instances can be run. Each of these process virtual machines (e.g. JVM) may isolate an application, but not the threads within the same application. This level of abstraction is mostly known as PaaS. The process virtual machine may be configured as a web application server, and the threads may belong to different web applications.

The main difference between the proposed framework and intrusion detection systems is that the framework does not monitor network activity or any feature of the network connection, but focuses directly on the running thread. Its main difference from a virus (malware) scanner is that it detects malicious threads rather than files. Moreover, this detection considers access to critical OS resources without inspecting the whole code sequence of the thread.

The rest of this paper is organized as follows: Section 2 gives the background of the subject. In Section 3, the proposed security mechanism is described. Section 4 presents how the experimental setup is organized. Section 5 describes the feature set and the classification algorithms applied to the data. The results of the experiments are presented in Section 6. Finally, Section 7 concludes the paper and presents a discussion of the results.

2 RELATED WORK

There is a vast amount of research in the literature on malicious behavior detection systems that use machine learning algorithms (Fan et al., 2016), (Modi et al., 2013). Most of the existing works deal with intrusion detection systems that try to classify malicious communications to prevent external attacks on the perimeterized computer networks of an organization. These models generally extract the feature vector from data packets, user input command sequences, log files, low-level system information and CPU/memory usage (Wu and Banzhaf, 2010). However, these intrusion detection systems differ from the proposed mechanism, as they always assume a trusted perimeter to be protected. As a result, they hardly detect malicious activity that originates from insiders. Note that, in PaaS deployments, the providers willingly accept the customers as tenants, so malicious activities may originate from these tenants or the tenants' users. Therefore, an intrusion detection system cannot detect malicious user behavior once the request has been accepted to access internal resources. In practice, an intrusion detection system may exist independently of the proposed mechanism and be controlled by the PaaS provider to protect the whole PaaS deployment against third party attackers, e.g. denial-of-service attackers.

Application program interface (API) calls and machine instructions are the most commonly used features of intelligent malware detection systems (Bazrafshan et al., 2013). Pirscoveanu et al. (Pirscoveanu et al., 2015) used the sequence, frequency and count of Windows platform system calls as the main features for classification with the Random Forest algorithm. Several malware samples can be dynamically classified in parallel using this approach. Fan et al. (Fan et al., 2016) also used API calls as features of their Malicious Sequential Pattern Malware Detection (MSPMD) framework. They applied a modified Generalized Sequential Pattern algorithm for sequence mining, with an All-Nearest-Neighbor classifier, to Windows Portable Executable (PE) samples. Uppal et al. (Uppal et al., 2014) applied the N-Gram algorithm to extract features from API sequences. Shabtai et al. (Shabtai et al., 2012) utilized machine instruction data and extracted features from opcodes with N-Gram patterns. Several classification algorithms were applied to the extracted feature vector to detect unknown malicious code.

The N-Gram algorithm is also used in web prediction models. Su et al. (Su et al., 2000) utilized an N-Gram model to predict future requests of users. This probabilistic prediction model aims to make the best guesses about the users' next actions based on their previous actions. Although they do not focus on users' malicious behavior, the N-Gram prediction model appears adaptable to predicting malicious behavior of users. In addition, N-Gram has been applied to API calls and machine instruction data in previous security studies (Uppal et al., 2014), (Shabtai et al., 2012). However, these studies provide only system level detection and prevention mechanisms for the cloud. Moreover, collecting the mentioned data is a time and resource consuming process, and one may argue that the privacy of the PaaS user is not considered during data collection. The proposed mechanism observes only thread behavior and collects processor usage and requested operation sequences at runtime. Finally, the framework classifies the behavior of the PaaS user as malicious or regular using the collected feature vector.

Given the need for malicious thread execution detection in the PaaS cloud and the favorable results of the mentioned malware detection techniques, a malicious thread behavior detection framework using machine learning algorithms is proposed for the PaaS cloud.

3 PROPOSED MECHANISM

The proposed mechanism covers a wide range of scenarios offered in the current PaaS ecosystem. A PaaS system provides its computational power to its customers through threads, and these customers can reside in the same process virtual machine. In that case, there is a risk of interference between different customers, as well as with the PaaS provider's resources. In such an adversarial model, the aim of the proposed method is not to completely isolate cloud customers' access to the platform resources, but to determine whether a cloud customer is acting maliciously. This malicious act can occur intentionally or unintentionally. The maliciously acting threads can be stopped, or at least kept from accessing more resources, right after they are classified as malicious by the proposed security framework.

It is assumed that a PaaS customer's cloud application resides in a JVM and has the proposed framework deployed. The framework focuses on request based analysis of web application users. Fundamentally, the mechanism analyses each worker thread separately, per request of each user, so it is even capable of classifying a one-time unintentional malicious activity of a trustworthy user.

The framework runs on the PaaS provider side, together with the PaaS customers' cloud applications. It monitors thread behavior, and the collected information is analyzed offline to train a classifier that is used for real-time decision making.

Java Management Extensions (JMX) is utilized to measure processor shares, memory and average time consumption for a user request. In addition, access to each PaaS resource is wrapped in a separate method; these methods are called checkpoints. Checkpoints are defined at the entry and exit of resource consuming system methods and are implemented using aspects (Kiczales et al., 1997), which brings two advantages. First, the time consumption of each resource access can be collected per request without affecting the privacy of the customers. The only leaked data are the sequence of resource accesses and the time spent in each resource. Second, the checkpoints are programmed to disrupt the execution of a thread after its behavior is classified as malicious by the framework. This is beneficial because threads cannot request any more resources once they are classified as malicious.

The proposed mechanism detects maliciously acting threads on the cloud platform using machine learning techniques. First, the classification features to be measured by the proposed framework are selected. Instant CPU usage and cumulative CPU usage per request are two attributes of the feature vector and are measured by JMX. The feature vector contains three more attributes per request: resource access duration, resource access type and resource access sequence. The access type feature is mapped to CRUD (create, read, update, delete) functions and contains information about the requested function. The sequence feature holds the order of the requested functions. The sequence of these operations is informative because it yields the frequency of each operation and the transitions between operations.
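To make the five attributes concrete, a per-request record might be laid out as follows. This is a minimal sketch: the class name, the field names and the numeric values are assumptions chosen for illustration, not the framework's actual data structure or measured data.

```python
from dataclasses import dataclass, field

OPERATIONS = ("Add", "Delete", "Read", "Update")


@dataclass
class RequestFeatures:
    """Hypothetical container for the attributes collected per request."""
    instant_cpu: float       # CPU share at the moment of measurement
    cumulative_cpu: float    # CPU share accumulated over the request
    accesses: list = field(default_factory=list)   # (operation, seconds)

    def record(self, operation, seconds):
        """Store one resource access with its type and duration."""
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        self.accesses.append((operation, seconds))

    @property
    def sequence(self):
        """Access sequence as a space-separated string, ready for tokenizing."""
        return " ".join(op for op, _ in self.accesses)

    @property
    def total_duration(self):
        """Total time spent in resource accesses during the request."""
        return sum(seconds for _, seconds in self.accesses)


# Illustrative values only.
features = RequestFeatures(instant_cpu=0.12, cumulative_cpu=0.30)
features.record("Read", 0.004)
features.record("Update", 0.010)
print(features.sequence)   # Read Update
```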

The proposed mechanism processes the feature vector using the N-Gram algorithm. An N-Gram is a contiguous sequence of N items from a given list of items, and an N-Gram model predicts the next item from the preceding ones. In natural language processing these items can be letters or words, in speech recognition they can be phonemes, and in malware analysis they can be system calls or machine instruction sequences. The operation sequence is one of the features in the proposed mechanism, and the set of operations is represented as O = {Add, Delete, Read, Update}. Table 1 shows sample tokenized operation sequences and their types to illustrate the structure of the training data. Each sequence is represented as a string, and its tokens are converted into a set of new attributes using the Weka NGramTokenizer API with the values $N_{min} = 1$ and $N_{max} = 5$. After this filtering process, the new feature vector has 132 new attributes, derived from the occurrence frequency of the transitions between operations in the sequence text data. Operation transitions are visualized in Figure 3.

Classification algorithms are applied to the training data after the feature vector is measured and tokenized with N-Gram. In order to detect malicious threads, different classification algorithms have been evaluated. After the comparison of the test results, the Random Forest classifier is integrated into the proposed framework as the most accurate classifier.

The proposed framework uses runtime observations to classify a request. Figure 2 shows the classification process of runtime observations. The proposed framework is integrated into an experimental cloud web application, which is explained in the next section. This behavior based system does not require a signature database, and the framework can be integrated into a cloud application with minimum effort.

4 EXPERIMENTAL SETUP

The proposed mechanism is tested on a demo cloud system that contains an event ticketing application connected to a relational database. Conventional paths of usage are recorded and replayed with Apache JMeter to reproduce a set of regular requests. Realistic attack scenarios are also considered and added to the query set of JMeter. The experimental ticketing cloud application is composed of basic user operations that may be mapped to CRUD (create, read, update, delete) operations on a database. These four main operations are add, delete, read and update. Depending on the payload of the request, an event, a user or a ticket may be added to, read from, updated in or deleted from the application. The application also includes many meta elements such as text, graphics, audio and video. The training set contains add, delete, read and update functions, each as either a regular or a malicious operation. Regular requests are defined as common user behavior, depending on the cloud application's scope and goal. Malicious operations, on the other hand, are selected from a wide set of possibilities: unexpected content may be added to the database, a whole table may be dropped, a large set of bogus data may be inserted, etc. The attack scenarios are produced by considering cross-site scripting, SQL injection, database modification and file system access scenarios.


Figure 2: The overall architecture of the proposed framework for malicious thread behavior detection. The framework extracts

four features from any PaaS web service deployment in a cost-effective and privacy-friendly way; instant CPU share of a

thread in the web service, CPU share of a thread over a period of time, a thread’s access type (create, read, update, delete)

to critical resources (databases, ﬁles, sockets, etc.) and ﬁnally the duration of this access. These features are enriched with

N-Gram based on access sequence during training phase. During testing, a thread’s behavior is classiﬁed based on previous

knowledge.

The final data set consists of ten sets, each containing nearly 1% malicious requests, and the total number of queries is 100 000. Each experiment is conducted with 10 000 requests, of which nearly 100 are malicious, and is repeated 10 times. In the final set, there are 1000 malicious requests and 99 000 regular requests. The results are presented as the average of 10 independent experiments.

5 USED FEATURE SET AND CLASSIFICATION ALGORITHMS

In this paper, the N-Gram feature extraction algorithm is applied to the feature set; a brief description is given in Subsection 5.1. After feature extraction, the ensemble learning algorithms Random Forest, Bagging and AdaBoost are run on the data to evaluate their accuracy in the proposed framework. These classification algorithms are described in Subsections 5.2, 5.3 and 5.4, respectively.

5.1 N-Gram Features

An N-Gram model describes sequences of n elements, which can be letters, words or phonemes. In this paper, the N-Gram probabilistic model predicts the type of the next operation $X_i$ based on the previous operation sequence $X_{i-(n-1)}, X_{i-(n-2)}, \ldots, X_{i-1}$. The likelihood of the next element in the sequence is written as $P(X_i \mid X_{i-(n-1)}, X_{i-(n-2)}, \ldots, X_{i-1})$, and it is based on a Markov model of order $(n-1)$.
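For n = 2 the model reduces to a first-order Markov chain over the four operations, and the conditional probabilities can be estimated by counting adjacent pairs. The sketch below does this for the sample sequences of Table 1; the function name is hypothetical, and the maximum-likelihood counting is a standard estimate assumed here, not a procedure stated in the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next | previous) by counting adjacent operation pairs."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(counts.values()) for nxt, c in counts.items()}
        for prev, counts in pair_counts.items()
    }


# Operation sequences taken from Table 1.
sequences = [
    ["Read", "Read"],
    ["Read", "Read", "Add", "Update"],
    ["Delete", "Delete", "Delete"],
    ["Update"],
]
probs = transition_probabilities(sequences)
print(probs["Read"])     # Read -> Read: 2/3, Read -> Add: 1/3
print(probs["Delete"])   # Delete -> Delete: 1.0
```

A single-operation sequence such as `Update` contributes no pairs, so it only affects unigram counts and not the transition estimates.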

The proposed framework applies the Weka API NGramTokenizer filter to the collected operation sequence data. Example data is shown in Table 1. N-Gram splits this sequence data according to the minimum and maximum gram sizes. It calculates the frequencies and transitions of each operation, as illustrated in Figure 3. An N-Gram model can store more context with a larger N. Storing more context provides better prediction, but it requires more memory and time. Conversely, prediction gets worse with a small N, while memory usage and time consumption decrease. N is therefore chosen from the interval [1, 5], which provides the best trade-off between prediction accuracy, time consumption and memory usage in the proposed framework. After the calculation, NGramTokenizer generates a new feature vector that contains the probability of each gram.
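The tokenization step can be illustrated without Weka. The following Python sketch is not the NGramTokenizer API; it is a hypothetical stand-in that enumerates every contiguous gram of length 1 to 5 in an operation sequence and counts its occurrences, which is the kind of attribute set such a filter produces.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=5):
    """Yield every contiguous run of n_min..n_max tokens as one string."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def ngram_counts(sequence, n_min=1, n_max=5):
    """Count the grams of a space-separated operation sequence."""
    return Counter(ngrams(sequence.split(), n_min, n_max))


counts = ngram_counts("Read Read Add Update")
print(counts["Read"])        # 2
print(counts["Read Read"])   # 1
print(len(counts))           # 9 distinct grams
```

A sequence shorter than a given n simply yields no grams of that length, so the four-operation example produces grams of length 1 to 4 only.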

5.2 Random Forest Classiﬁer

Random Forest is an ensemble learning method that grows many randomized classification or regression trees. The trees vote for the most popular class, and the result is the combination of these tree predictors. It runs efficiently on large datasets. Moreover, the randomness in tree construction minimizes the correlation between trees. In addition, the Random Forest algorithm does not overfit as more trees are added (Breiman, 2001). Rapidly increasing cloud data traffic requires memory and time efficient algorithms that can run on big data. These characteristics make the Random Forest algorithm applicable to cloud security. The pseudo code of the Random Forest algorithm is given in Algorithm 1.

Table 1: Sample data for N-Gram classification. N-Gram classifies regular or malicious threads only by the sequence of their access types. It does not check which resources the threads access or for how long, and it does not check the parameters of the operations. Such information is fed into the classification model by other features.

| Operation Sequence | Request Type |
| --- | --- |
| Read → Read | Regular |
| Read → Read → Add → Update | Regular |
| Delete → Delete → Delete | Malicious |
| Update | Regular |

Figure 3: Transitions, $S_t \to S_{t+1}$, between states. States represent resource access types, and transitions can be evaluated with this state machine.

5.3 Bagging

Bagging (Mamitsuka et al., 1998), also called Bootstrap Aggregating, is an ensemble technique whose classifiers are trained on instance sets generated by drawing examples randomly, with replacement, from the training dataset. Therefore, each classifier in the ensemble is obtained with a different random sampling of the training dataset. The final decision is given by a majority vote over the individual classifiers' outputs.

Algorithm 1: Random Forest generates an ensemble of trees using randomly selected instances and features.

procedure RANDOMFOREST
    f ← features
    T ← number of trees
    H ← ∅
    for i = 1 to T do
        n_i ← bootstrap sample from the original data
        g_i ← GROWTREE(n_i, f)
        H ← H ∪ {g_i}

procedure GROWTREE(n, f)
    f′ ← random subset of f
    tree ← tree grown on n using the best split among f′
    return tree
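A runnable counterpart of Algorithm 1 is sketched below under simplifying assumptions: each tree is reduced to a single-split stump, the two features and six training rows are invented toy data, and all function names are hypothetical. It is meant to show the bootstrap sampling, the random feature subset and the majority vote, not to reproduce the Weka implementation used in the experiments.

```python
import random
from collections import Counter


def grow_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def random_forest(rows, labels, n_trees, n_sub_features, rng):
    """Grow n_trees stumps, each on a bootstrap sample and a feature subset."""
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) instances with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random subset of the features considered for this tree.
        feature_ids = rng.sample(range(n_features), n_sub_features)
        forest.append(grow_stump(sample_rows, sample_labels, feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over the trees of the ensemble."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Invented toy data: (CPU share, number of Delete operations) per request.
rows = [(0.10, 0), (0.20, 0), (0.15, 1), (0.80, 3), (0.90, 4), (0.85, 2)]
labels = ["regular", "regular", "regular", "malicious", "malicious", "malicious"]

forest = random_forest(rows, labels, n_trees=25, n_sub_features=1,
                       rng=random.Random(0))
print(predict(forest, (0.95, 5)))
print(predict(forest, (0.05, 0)))
```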

5.4 AdaBoost

AdaBoost, short for Adaptive Boosting, is a successful boosting algorithm that constructs a strong classifier $H(x)$ as a linear combination of weak classifiers $h_t(x)$, as shown in Equation 1. The class label $H(x)$ is predicted by calculating the weighted sum of the weak predictions $h_t(x)$ and taking its sign. The weight $\alpha_t$ is based on the classifier's error rate, which is the number of misclassified instances in the training set divided by the training set size.

The AdaBoost algorithm has previously been used in network intrusion detection because of its low computational complexity, high detection rate and low false-alarm rate (Hu et al., 2008). Because of these advantages, it is one of the classifiers selected for accuracy measurement in the proposed framework.

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \qquad (1)$$
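Equation 1 can be exercised with a small sketch. The paper states only that the weight depends on the error rate; the formula $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ used below is the common discrete AdaBoost choice and is an assumption here, as are the three weak classifiers, their thresholds and their error rates.

```python
import math


def classifier_weight(error_rate):
    """Weight of a weak classifier (assumed discrete AdaBoost formula)."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def strong_classify(x, weak_classifiers, alphas):
    """Equation 1: sign of the weighted sum of weak votes in {-1, +1}."""
    total = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if total >= 0 else -1


# Hypothetical weak classifiers: +1 means malicious, -1 means regular.
weak_classifiers = [
    lambda x: 1 if x["cpu"] > 0.5 else -1,
    lambda x: 1 if x["deletes"] >= 2 else -1,
    lambda x: 1 if x["duration"] > 1.0 else -1,
]
error_rates = [0.1, 0.2, 0.4]   # invented training error rates
alphas = [classifier_weight(e) for e in error_rates]

request = {"cpu": 0.9, "deletes": 0, "duration": 2.0}
print(strong_classify(request, weak_classifiers, alphas))   # 1
```

With these weights the most reliable weak classifier dominates: the first and third classifiers vote malicious and outweigh the second, so the combined decision is +1.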

6 EXPERIMENTAL RESULTS

The experimental results are obtained using ten-fold cross validation. Results are evaluated using the following metrics: percentage of incorrectly classified instances, precision, recall and F-Measure. These measurements are computed from the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts given in the confusion matrix in Table 2. The calculation of the incorrectly classified percentage is given in Equation 2, and the precision, recall and F-Measure computations are given in Equation 3, Equation 4 and Equation 5, respectively.


Table 2: Confusion matrix legend.

| | Predicted Malicious | Predicted Regular |
| --- | --- | --- |
| Actual Malicious | TP | FN |
| Actual Regular | FP | TN |

Table 3 shows the results of the different classifiers without resource access sequence data; the operation sequence feature is not evaluated in this experiment. This feature set contains only the processor usage, resource usage duration and base operation type metrics. Random Forest classification on this dataset yields a large number of incorrectly classified instances. In addition, the Bagging and AdaBoost algorithms with the J48 decision tree as base classifier do not obtain better accuracy than the Random Forest algorithm.

Table 4 shows the classification results with N-Gram feature extraction. The operation sequence features are filtered with the values $N_{min} = 1$ and $N_{max} = 5$, and the feature extraction process of the framework is shown in Figure 2. After the extraction, the new feature vector has the attributes processor usage, resource usage, base operation type and the generated N-Gram features.

Table 4 has much more accurate results than Table 3 for all classifiers, although the same classification algorithms are run. For Random Forest, the number of misclassified instances drops from 888 to 162 after the operation sequence feature is collected by the framework and processed with the N-Gram module.

The Random Forest algorithm achieves the most accurate result in Table 4. Only 162 instances out of 100 000 are labeled incorrectly. Only 38 regular instances out of 99 000 are classified as malicious, which shows that the proposed framework has a low false-alarm rate, while 124 malicious instances out of 1000 are classified as regular requests. Table 5 shows the confusion matrix of the Random Forest algorithm with N-Gram feature extraction. The precision result given in Table 4 indicates that an instance predicted as malicious by the proposed framework is truly malicious with a probability of 0.9584. In addition, the recall result shows that a malicious request is detected by the framework with a probability of 0.876. Since the classes are unbalanced, the F-Measure, whose best value is 1, is used as the success criterion of the framework. The F-Measure of 0.9153 indicates that the proposed framework is accurate on both the malicious and the regular class.

$$\text{Misclassified} = \frac{FP + FN}{FP + TP + FN + TN} \qquad (2)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

Table 3: Classifiers' results without resource access sequence data.

| Metric | Random Forest | Bagging | AdaBoost |
| --- | --- | --- | --- |
| Misclassified % | 0.888 | 0.9111 | 0.986 |
| Precision | 0.6204 | 0.6164 | 0.5133 |
| Recall | 0.299 | 0.242 | 0.291 |
| F-Measure | 0.4011 | 0.3457 | 0.3690 |

Table 4: Classifiers' results enhanced with N-Gram feature extraction.

| Metric | Random Forest | Bagging | AdaBoost |
| --- | --- | --- | --- |
| Misclassified % | 0.162 | 0.192 | 0.2 |
| Precision | 0.9584 | 0.9646 | 0.9381 |
| Recall | 0.876 | 0.839 | 0.857 |
| F-Measure | 0.9153 | 0.8968 | 0.8954 |

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

$$\text{F-Measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (5)$$
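As a check on Equations 2 to 5, the metrics can be recomputed from the Random Forest confusion matrix reported in Table 5 (TP = 876, FN = 124, FP = 38, TN = 98962). The function name below is hypothetical.

```python
def evaluate(tp, fn, fp, tn):
    """Compute the metrics of Equations 2 to 5 from confusion matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "misclassified_pct": 100.0 * (fp + fn) / total,   # Equation 2, in percent
        "precision": precision,                            # Equation 3
        "recall": recall,                                  # Equation 4
        "f_measure": 2 * precision * recall / (precision + recall),   # Equation 5
    }


# Counts from Table 5 (Random Forest with N-Gram features).
metrics = evaluate(tp=876, fn=124, fp=38, tn=98962)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The computed misclassification percentage, precision and recall agree with the Random Forest column of Table 4, and the F-Measure agrees to three decimal places.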

7 DISCUSSION AND CONCLUSION

This paper proposes a malicious thread behavior detection framework for use by PaaS providers. The approach utilizes machine learning techniques and is especially beneficial in the current multitenant PaaS ecosystem, where cloud customers may share resources within the same OS process. Multitenancy requires strong isolation between customer applications. The proposed framework achieves such isolation by monitoring thread behavior and aims to satisfy cloud customers' need for security and isolation. The proposed framework is deployed at the application level, and it can be integrated into any web service application in the PaaS cloud that runs on a standard JVM.

Table 5: Confusion matrix of the proposed framework. The proposed framework runs the Random Forest classifier with N-Gram features added to the feature set.

| | Predicted Malicious | Predicted Regular |
| --- | --- | --- |
| Actual Malicious | 876 | 124 |
| Actual Regular | 38 | 98962 |


The proposed mechanism investigates several machine learning techniques and combines them. First, the N-Gram module filters the operation sequence. The filtered sequence is combined with the resource usage metrics. Then, the proposed framework classifies the requests as regular or malicious using the combined metrics. This classification module is built on the training data. The Random Forest classifier is able to detect a malicious request with a probability of 0.876 in the proposed framework.

Better results may be obtained with the proposed framework by using more precise measurements and different metrics. Improved frameworks would be applicable to malicious behavior detection in the PaaS cloud domain in the near future. As future work, the scenarios of the proposed framework will be extended using different cloud applications and different metrics.

REFERENCES

Arshad, J., Townend, P., and Xu, J. (2012). An abstract model for integrated intrusion detection and severity analysis for clouds. Cloud Computing Advancements in Design, Implementation, and Technologies, 1.

Banerjee, C., Kundu, A., Basu, M., Deb, P., Nag, D., and Dattagupta, R. (2013). A service based trust management classifier approach for cloud security. In Advanced Computing Technologies (ICACT), 2013 15th International Conference on, pages 1–5. IEEE.

Bazm, M.-M., Lacoste, M., Südholt, M., and Menaud, J.-M. (2017). Side Channels in the Cloud: Isolation Challenges, Attacks, and Countermeasures. Working paper or preprint.

Bazrafshan, Z., Hashemi, H., Fard, S. M. H., and Hamzeh, A. (2013). A survey on heuristic malware detection techniques. In Information and Knowledge Technology (IKT), 2013 5th Conference on, pages 113–120. IEEE.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Fan, Y., Ye, Y., and Chen, L. (2016). Malicious sequential pattern mining for automatic malware detection. Expert Systems with Applications, 52:16–25.

Garfinkel, T., Rosenblum, M., et al. (2003). A virtual machine introspection based architecture for intrusion detection. In NDSS, volume 3, pages 191–206.

Hamad, H. and Al-Hoby, M. (2012). Managing intrusion detection as a service in cloud networks. International Journal of Computer Applications, 41(1).

Hu, W., Hu, W., and Maybank, S. (2008). Adaboost-based algorithm for network intrusion detection. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(2):577–583.

Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.-M., and Irwin, J. (1997). Aspect-oriented programming. ECOOP'97 Object-oriented programming, pages 220–242.

Mamitsuka, N. A. H. et al. (1998). Query learning strategies using boosting and bagging. In Machine learning: proceedings of the fifteenth international conference (ICML98), volume 1.

Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., and Rajarajan, M. (2013). A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications, 36(1):42–57.

Networking, C. V. (2017). Cisco global cloud index: forecast and methodology, 2015-2020. White paper.

Pirscoveanu, R. S., Hansen, S. S., Larsen, T. M., Stevanovic, M., Pedersen, J. M., and Czech, A. (2015). Analysis of malware behavior: Type classification using machine learning. In Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2015 International Conference on, pages 1–7. IEEE.

Sandıkkaya, M. T., Ödevci, B., and Ovatman, T. (2014). Practical runtime security mechanisms for an aPaaS cloud. In Globecom Workshops (GC Wkshps), 2014, pages 53–58. IEEE.

Sanjay Ram, M. (2012). Secure cloud computing based on mutual intrusion detection system. International Journal of Computer application, 1(2):57–67.

Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., and Elovici, Y. (2012). Detecting unknown malicious code by applying classification techniques on opcode patterns. Security Informatics, 1(1):1.

Su, Z., Yang, Q., Lu, Y., and Zhang, H. (2000). WhatNext: A prediction system for web requests using n-gram sequence models. In Web Information Systems Engineering, 2000. Proceedings of the First International Conference on, volume 1, pages 214–221. IEEE.

Uppal, D., Sinha, R., Mehra, V., and Jain, V. (2014). Malware detection and classification based on extraction of API sequences. In Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on, pages 2337–2342. IEEE.

Wu, S. X. and Banzhaf, W. (2010). The use of computational intelligence in intrusion detection systems: A review. Applied Soft Computing, 10(1):1–35.

Zhang, Y., Juels, A., Reiter, M. K., and Ristenpart, T. (2014). Cross-tenant side-channel attacks in PaaS clouds. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 990–1003. ACM.
