AN ANOMALY-BASED WEB APPLICATION FIREWALL

Carmen Torrano-Gimenez, Alejandro Perez-Villegas and Gonzalo Alvarez

Instituto de F´ısica Aplicada, Consejo Superior de Investigaciones Cient´ıﬁcas, Serrano 144 - 28006, Madrid, Spain

Keywords:

Web attacks, Anomaly intrusion detection, Web application ﬁrewall.

Abstract:

A simple and effective web application ﬁrewall is presented. This system can detect both known and unknown

web attacks following a positive security model. For attack detection, the system relies on an XML ﬁle, which

thoroughly describes normal web application behavior. Any irregular behavior is ﬂagged as intrusive. An

initial training phase is required to statistically characterize how normal trafﬁc for a given target application

looks like. The system has been tested with a real web application as target and an artiﬁcial request generator

as input. Experiments show that after the training phase, when the XML ﬁle is correctly conﬁgured, good

results are obtained, with a very high detection rate and a very low false alarm rate.

1 INTRODUCTION

Web applications are becoming increasingly popular

and complex in all sorts of environments, ranging

from e-commerce applications, to social networks, to

banking. As more vital information and services are

moved to web platforms, their interest as potential tar-

gets increases. As a consequence, web applications

are subject to all sort of attacks, many of which might

be devastating (Alvarez and Petrovic, 2003). Unfor-

tunately, conventional ﬁrewalls, operating at network

and transport layers, are usually not enough to protect

against web-speciﬁc attacks. To be really effective,

detection is to be moved to the application layer.

Intrusion detection is the act of detecting mali-

cious actions and behaviors that can compromise the

security and trust of a computer system. An Intrusion

Detection System (IDS) analyzes information from a

computer or a network to identify possible security

breaches. Traditionally, IDS’s have been classiﬁed as

either signature detection systems or anomaly detec-

tion systems. The ﬁrst method looks for signatures

of known attacks using pattern matching techniques

against a frequently updated database of attack signa-

tures. The second method looks for anomalous sys-

tem activity: once normal behavior is well deﬁned,

irregular behavior will be tagged as intrusive. An

hybrid intrusion detection system combines the tech-

niques of the two approaches.

Signature-based IDS’s rely on large signature

databases and are unable to detect new attacks. To

work properly, databases must be updated frequently

and signature matching usually requires high compu-

tational effort. Anomaly-based IDS’s overcome these

problems. However, in rather complex environments,

obtaining an up-to-date and feasible picture of what

“normal” network trafﬁc should look like proves to

be a hard problem.

The results of signature-based WAFs depend on

the actual signature conﬁguration for each web appli-

cation, and cannot be compared with anomaly-based

WAFs.

Varied techniques have been used to solve the gen-

eral intrusion detection problem (Patcha and Park,

2007) , like clustering (Petrovi´c et al., 2006). How-

ever some of these techniques are not appropriate to

solve the web intrusion detection problem due to the

difﬁculty of coding a request as an input. Some of the

main works developed to solve web attack detection

are (Kruegel et al., 2005) which follows an statistical

approach and (Est´evez-Tapiador et al., 2004) which

uses Markov chains.

In this paper, a simple and effective anomaly-

based Web Application Firewall (WAF) is presented.

This system relies on an XML ﬁle to describe what a

normal web application is. Any irregular behavior is

ﬂagged as intrusive. The XML ﬁle must be tailored

for every target application to be protected.

The rest of the paper is organized as follows. In

Sec. 2, a system overview is given, where system ar-

chitecture, normal behavior modeling, and attack de-

tection are explained. Section 3 refers to experiments.

Trafﬁc generation, the training phase, the test phase

and results are also described. Section 4 describes

Perez-Villegas A. and Alvarez G. (2009).

AN ANOMALY-BASED WEB APPLICATION FIREWALL.

In Proceedings of the International Conference on Security and Cryptography, pages 23-28

DOI: 10.5220/0002218900230028

 SciTePress

system limitations and suggests future work. Finally,

in Sec. 5, the conclusions of this work are captured.

2 SYSTEM OVERVIEW

2.1 Architecture

Our anomaly-based detection approach analyzes

HTTP requests sent by a client browser trying to get

access to a web server. The analysis takes place ex-

clusively at the application layer, speciﬁcally HTTP

protocol. Thus, the system can be seen as an anomaly-

based Web Application Firewall, in contrast with ex-

isting signature-based WAFs (ModSecurity, 2009).

By analyzing HTTP requests, the system decides

whether they are suspicious or not. An HTTP request

is considered suspicious when it differs from the nor-

mal behavior according to some speciﬁc criteria.

In our architecture, the system operates as a proxy

located between the client and the web server. Like-

wise, the system might be embedded as a module

within the server. However, the ﬁrst approach enjoys

the advantage of being independent of the web plat-

form.

This proxy analyzes all the trafﬁc sent by the

client. The input of the detection process consists of a

collection of HTTP requests {r

, r

, . . . r

}. The out-

put is a single bit a

for each input request r

, which

indicates whether the request is normal or anomalous.

Since the main goal of our system is to detect web at-

tacks trying to reach the web server, a single bit alert

and event log upon detection is enough. However, it

could be more practical to give the system the ability

to operate as a ﬁrewall, that is, allow normal requests

to reach the server and block malicious ones. There-

fore, the proxy is able to work in two different modes

of operation: as IDS and as ﬁrewall.

In detection mode, the proxy simply analyzes the

incoming packets and tries to ﬁnd suspicious patterns.

If a suspicious request is detected, the proxy launches

an alert; otherwise, it remains inactive. In any case,

the request will reach the web server. When operat-

ing in detection mode, attacks could succeed, whereas

false positives don’t limit the system functionality.

In ﬁrewall mode, the proxy receives requests from

client users and analyzes them. If the request is valid,

the proxy routes it to the server, and sends back the re-

ceived response to the client. If not, the proxy blocks

the request, and sends back a generic denegation ac-

cess page to the client. Thus, the communication be-

tween proxy and server is established only when the

request is valid.

A diagram of the WAF’s architecture is shown in

Fig. 1.

Figure 1: Web Application Firewall Architecture.

2.2 Normal Behavior Description

Prior to the detection process, the system needs a pre-

cise picture of what the normal behavior is in a spe-

ciﬁc web application. For this purpose, our system

relies on an XML ﬁle which contains a thorough de-

scription of the web application’s normal behavior.

Once a request is received, the system compares it

with the normal behavior model. If the difference ex-

ceeds a given threshold, then the request is ﬂagged as

an attack and an alert is launched.

The XML ﬁle consists of a set of rules describ-

ing the different aspects of the normal request. More

precisely, the XML contains rules regarding to the

correctness of HTTP verbs, HTTP headers, accessed

resources (ﬁles), arguments, and values for the argu-

ments. This ﬁle contains three main nodes:

Verbs

The verbs node simply speciﬁes the list of allowed

HTTP verbs. Requests using any other verb will be

rejected.

Headers

The headers node speciﬁes the list of allowed HTTP

headers and their allowed values.

Directories

The directories node has a tree-like structure, in close

correspondence to the web application’s directory

structure.

1. Each directory in the web application space is rep-

resented in the XML ﬁle by a directory node, al-

lowing nesting of directories within directories.

The attribute name deﬁnes these nodes.

SECRYPT 2009 - International Conference on Security and Cryptography

2. Each ﬁle in the web application space is repre-

sented by a ﬁle node within a directory node and

is deﬁned by its attribute name.

3. Input arguments are represented by argument

nodes within the corresponding ﬁle node. Each

argument is deﬁned by the following set of at-

tributes:

• name: the name of the argument, as expected

by the web page.

• requiredField: a boolean value indicating

whether the request should be rejected if the ar-

gument is missing.

4. Legal values for arguments should meet some sta-

tistical rules, which are represented by a stats

node within the corresponding argument node.

These statistical properties together give a de-

scription of the expected values. Requests with ar-

gument values exceeding their corresponding nor-

mal properties by a given threshold will be re-

jected. Each relevant property is deﬁned by an

attribute within the stats node. In our approach

we considered the following relevant properties:

• Special: set of special characters (no letters and

no digits) allowed.

• lengthMean: mean value of input length.

• lengthDev: std deviation of character length.

• letterMean: mean number of letters.

• letterDev: std deviation of number of letters.

• digitMean: mean number of digits.

• digitDev: std deviation of number of digits.

• specialMean: mean number of special charac-

ters (of those included in the propertie Special).

• specialDev: std deviation of number of special

characters (of those included in the propertie

Special).

When an incoming request is received, the fea-

tures of each argument value are measured. Since

these measurements are quantitative, it is possi-

ble to compute the distance to the average value.

If the distance is greater than the corresponding

threshold (scaled by the standard deviation), the

value is considered anomalous.

The construction of the XML ﬁle and the adequate

selection of the threshold parameters are crucial for a

good detection process. An example of XML conﬁg-

uration ﬁle is shown in Fig. 2.

2.3 Detection Process

In the detection process, our system follows an ap-

proach of the form “deny everything unless explicitly

<verbs>

</verbs>

<rule name="Accept-Charset"

value="ISO-8859-1"/>

<rule name="Accept-Charset"

value="utf-8"/>

</headers>

<argument name="quantity"

requiredField="true">

<stats

Special=""

digitDev="0.0"

digitMean="100.0"

lengthDev="0.1410730099"

lengthMean="1.93"

letterDev="0.0"

letterMean="0.0"

otherDev="0.0"

otherMean="0.0"/>

</argument>

...

Figure 2: XML ﬁle example.

allowed”, also known as positive security model. This

approach is always more secure than looking for at-

tack signatures in incoming requests, known as nega-

tive security model. However,the positive approach is

prone to more false positives, with a possible impact

on functionality and user comfort.

The detection process takes place in the proxy. It

consists of several steps, each constituting a different

line of defense, in which a single part of the request

is checked with the aid of the XML ﬁle. If an incom-

ing request fails to pass one of these lines of defense,

an attack is assumed: a customizable error page is re-

turned to the user and the request is logged for further

inspection. It is important to stress that these requests

will neverreach the web server when operating in ﬁre-

wall mode.

The following are the lines of defense considered

in our system.

• HTTP Verbs. The system can be conﬁgured to

accept a subset of HTTP verbs. For example, in

the applications in which only GET, POST and

HEAD are required to work correctly, the XML

ﬁle could be conﬁgured accordingly, thus reject-

AN ANOMALY-BASED WEB APPLICATION FIREWALL

ing requests that use any other verb.

• HTTP Headers. Allowed HTTP headers and

their expected data types can also be conﬁgured

in the XML ﬁle, thus preventing attacks embed-

ded in these elements.

• Static Files. In this line of defense, the system

checks whether the requested resource is valid.

For this purpose, the XML conﬁguration ﬁle con-

tains a complete list of all ﬁles that are allowed to

be served. If the requested resource is not present

in the list, a web attack is assumed.

• Dynamic Files. If an allowed resource is re-

quested, then it is checked whether it accepts in-

put arguments. In this case, the incoming data are

checked against the validation rules. These rules

include all arguments that are allowed for the re-

source, and which ones are mandatory in the re-

quest. Again, these rules are of the form “deny

everything unless explicitly allowed”. Thus, if

the user-supplied request contains incorrect argu-

ments for an allowed resource, an attack is as-

sumed and the request will not reach the web

server.

• Argument Values. If an allowed resource with

allowed parameters is requested, the value of the

arguments is checked. An incoming request will

be allowed if all parameter values are identiﬁed as

normal. Argument values are decoded before be-

ing checked. As described in Sec. 2.2, for each

resource and parameter, the XML ﬁle describes

statistical features of normal values. By analyz-

ing actual values, the system decides whether they

are anomalous (rejecting the incoming request) or

normal (allowing the request).

3 EXPERIMENTS

3.1 Case Study: Web Shopping

The WAF has been conﬁgured to protect a speciﬁc

web application, consisting of an e-commerce web

store, where users can register and buy products us-

ing a shopping cart.

3.2 XML File Generation

As already stated, the XML ﬁle describes the normal

behavior of the web application. Therefore, to train

the system and conﬁgure this ﬁle, only normal and

non-malicious trafﬁc to the target web application is

required. Nevertheless, how to obtain only normal

trafﬁc may not be an easy task. Since a statistical

approach was used for the characterization of normal

argument values, thousands of requests are needed.

There are some alternatives to obtain normal trafﬁc:

• Thousands of legitimate and non-malicious users

can surf the target web application and gener-

ate normal trafﬁc. However, getting thousands of

people to surf the web might not be an easy task.

• The application can be published in the Internet,

but unfortunately attacks would be mixed with

normal trafﬁc. Classifying normal and anomalous

trafﬁc is unviable. In (Kruegel et al., 2005) this

approach is used. However their training data in-

clude attacks, so some attacks cannot be detected,

as they are considered as normal trafﬁc.

• Trafﬁc can be generated artiﬁcially. Although the

trafﬁc is not real, we can be sure that only normal

trafﬁc is included.

Normal trafﬁc acquisition is a general problem in

attack detection, still to be completely solved. For our

purposes, we considered artiﬁcial trafﬁc generation to

be the most suitable approach.

3.3 Artiﬁcial Trafﬁc Generation

In our approach, normal and anomalous request

databases are generated artiﬁcially with the help of

dictionaries.

3.3.1 Dictionaries

Dictionaries are data ﬁles which contain real data to

ﬁll the different arguments used in the target applica-

tion. Names, surnames, addresses, etc., are examples

of dictionaries used.

A set of dictionaries containing only allowed val-

ues is used to generate the normal request database.

A different set of dictionaries is used to generate the

anomalous request database. The latter dictionaries

contain both known attacks and illegal values with no

malicious intention.

3.3.2 Normal Trafﬁc Generation

Allowed HTTP requests are generated for each page

in the web application. If the page presents a form, the

ﬁelds are ﬁlled out only with legal values. Arguments

and cookies in the page, if any, are also ﬁlled out with

allowed values. Depending on the case, the values

can be chosen randomly or obtained from the normal

dictionaries. The result is a normal request database

(NormalDB), which will be used both in the training

and test phase.

SECRYPT 2009 - International Conference on Security and Cryptography

3.3.3 Anomalous Trafﬁc Generation

Illegal HTTP requests are generated with the help

of anomalous dictionaries. There are three types of

anomalous requests:

• Static attacks fabricate the resource requested.

These requests include obsolete ﬁles, session id

in URL rewrite, directory browsing, conﬁguration

ﬁles, and default ﬁles.

• Dynamic attacks modify valid request argu-

ments: SQL injection, CRLF injection, cross-site

scripting, server side includes, buffer overﬂows,

etc.

• Unintentional illegal requests. These requests

should also be rejected even though they do not

have malicious intention.

The result is an anomalous request database

(AnomalousDB), which will be used only in the test

phase.

3.4 Training Phase

During the training phase, the system learns the web

application normal behavior. The aim is to obtain the

XML ﬁle required for the detection process. In this

phase only requests from NormalDB are used. In the

construction of the XML ﬁle, different HTTP aspects

must be taken into account.

• Argument values are characterized by extracting

statistical features from the requests.

• Verbs, headers and resources found in the requests

are included directly in the XML ﬁle as allowed

requests.

3.5 Test Phase

During the test phase, depicted in Fig. 3, the proxy

accepts requests from both databases, NormalDB and

AnomalousDB, and relies on the XML ﬁle to decide

whether the requests are normal or anomalous.

The performance of the detector is then measured

by Receiver Operating Characteristic (ROC) curves

(Provost et al., 1998). A ROC curve plots the attack

detection rate (true positives, TP) against the false

alarm rate (false positives, FP).

DetectionRate =

TP+ FN

(1)

FalseAlarmRate =

FP + TN

(2)

Once the system has characterized how normal re-

quests look like, the proxy analyzes a set of requests

(normal and anomalous) varying a certain parameter.

The results are measured by the ROC curves.

A sensibility parameter s can be tuned to obtain

different points in the ROC curve. An argument is

considered normal only if all its statistical properties

are inside the corresponding interval. Otherwise, the

request is classiﬁed as anomalous and a log is gener-

ated.

The three allowed intervals of the argument values

are calculated from the statistical features registered

in the XML ﬁle and the sensibility parameter s:

[µ− σ∗ s, µ+ σ∗ s], (3)

Figure 3: System Test Phase.

3.6 Results

Regarding normal trafﬁc, 36,000 normal requests are

generated in 1,000 iterations. With respect to anoma-

lous trafﬁc, approximately 25,000 malicious requests

are generated in 500 iterations. Tests have been done

varying the sensibility parameter, showing that results

only vary when the sensibility parameter values are

between 1 and 18. The ROC curve obtained is shown

in Fig. 4.

0 0.2 0.4 0.6 0.8 1

0.2

0.4

0.6

0.8

0 0.3

0.99

Figure 4: ROC curve of WAF protecting the web store.

As can be seen, very good results are obtained:

the number of false positives is close to 0 whereas

the detection rate is close to 1. The reason why such

AN ANOMALY-BASED WEB APPLICATION FIREWALL

good results are obtained is that the XML ﬁle closely

characterizes the web application normal behavior in

an effective manner.

4 LIMITATIONS AND FUTURE

WORK

As shown in the previous section, when the XML ﬁle

is conﬁgured correctly, the system succeeds in detect-

ing any kind of web attacks. Thus, the main issue

is how to automatically conﬁgure the XML descrip-

tion ﬁle. In our approach, the XML ﬁle is built from

a large set of allowed requests in the target web ap-

plication. However, obtaining only normal and non-

malicious trafﬁc may not be an easy task, as was dis-

cussed in Sec. 3.2. Therefore, the main limitation

consists in correctly implementing the training phase

for any web application.

Other limitations arise when protecting complex

web applications. For instance, web sites that cre-

ate and remove pages dynamically, generate new

URLs to access resources, or allow users for updat-

ing contents, may difﬁcult the XML ﬁle conﬁgura-

tion. Further modiﬁcations of the system will attempt

to solvethese problems, by statistically characterizing

the URLs of allowed resources.

Future work refers to signing cookies and hidden

ﬁelds in order to avoid cookie poisoning and hidden

ﬁeld manipulation attacks. Also, URL patterns will

be used in describing sites with dynamic resources.

5 CONCLUSIONS

We presented a simple and efﬁcient web attack detec-

tion system or Web Application Firewall (WAF). As

the system is based on the anomaly-based method-

ology (positive security model), it proved to be able

to protect web applications from both known and un-

known attacks. The system analyzes input requests

and decides whether they are anomalous or not. For

the decision, the WAF relies on an XML ﬁle which

speciﬁes web application normal behavior. The ex-

periments show that as long as the XML ﬁle correctly

deﬁnes normality for a given target application, near

perfect results are obtained. Thus, the main challenge

is how to create an accurate XML ﬁle in a fully auto-

mated manner for any web application. We show that

inasmuch great amounts of normal (non-malicious)

trafﬁc are available for the target application, this au-

tomatic conﬁguration is possible using a statistical

characterization of the input trafﬁc.

ACKNOWLEDGEMENTS

We would like to thank the Ministerio de In-

dustria, Turismo y Comercio, project SE-

GUR@ (CENIT2007-2010), project HESPERIA

(CENIT2006-2009), the Ministerio de Ciencia e

Innovacion, project CUCO (MTM2008-02194), and

the Spanish National Research Council (CSIC),

programme JAE/I3P.

REFERENCES

Alvarez, G. and Petrovic, S. (2003). A new taxonomy of

web attacks suitable for efﬁcient encoding. Computers

and Security, 22(5):453–449.

Est´evez-Tapiador, J., Garc´ıa-Teodoro, P., and D´ıaz-Verdejo,

J. (2004). Measuring normality in http trafﬁc for

anomaly-based intrusion detection. Computer Net-

works, 45(2):175–193.

Kruegel, C., Vigna, G., and Robertson, W. (2005). A multi-

model approach to the detection of web-based attacks.

Computer Networks, 48(5):717–738.

ModSecurity (2009). Open source signature-based web ap-

plication ﬁrewall, http://www.modsecurity.org.

Patcha, A. and Park, J. (2007). An overview of anomaly de-

tection techniques: Existing solutions and latest tech-

nological trends. Computer Networks, 51(12):3448–

3470.

Petrovi´c, S.,

Alvarez, G., Orﬁla, A., and Carb´o, J. (2006).

Labelling clusters in an intrusion detection system us-

ing a combination of clustering evaluation techniques.

In Proceedings of the 39th Hawaii International Con-

ference on System Sciences, Kauai, Hawaii (USA).

IEEE Computer Society Press. 8 pages (CD ROM).

Provost, F., Fawcett, T., and Kohavi, R. (1998). The case

against accuracy estimation for comparing induction

algorithms. In Proceedings of the 15th International

Conference on Machine Learning, San Mateo, CA.

Morgan Kaufmann.

SECRYPT 2009 - International Conference on Security and Cryptography