ing requests that use any other verb.
• HTTP Headers. Allowed HTTP headers and
their expected data types can also be configured
in the XML file, thus preventing attacks embed-
ded in these elements.
• Static Files. In this line of defense, the system
checks whether the requested resource is valid.
For this purpose, the XML configuration file con-
tains a complete list of all files that are allowed to
be served. If the requested resource is not present
in the list, a web attack is assumed.
• Dynamic Files. If an allowed resource is requested,
the system checks whether it accepts input
arguments. If it does, the incoming data are
checked against the validation rules. These rules
list all arguments that are allowed for the resource
and specify which of them are mandatory in the
request. Again, these rules are of the form “deny
everything unless explicitly allowed”. Thus, if
the user-supplied request contains incorrect
arguments for an allowed resource, an attack is
assumed and the request will not reach the web
server.
• Argument Values. If an allowed resource with
allowed parameters is requested, the values of the
arguments are checked. Argument values are
decoded before being checked. As described in
Sec. 2.2, for each resource and parameter, the XML
file describes statistical features of normal values.
By analyzing the actual values, the system decides
whether they are anomalous (rejecting the incoming
request) or normal (allowing the request). An
incoming request is allowed only if all parameter
values are identified as normal.
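The lines of defense listed above can be pictured as a chain of checks in which a request is rejected as soon as one check fails. The following Python sketch is only illustrative. The dictionary POLICY stands in for the XML configuration file, and its layout, the resource paths, the argument names, the use of URL percent-decoding, and the simple length-interval test applied to argument values are all assumptions made for this example. They are not the file format or the statistical model of Sec. 2.2.

```python
from urllib.parse import unquote

# Hypothetical stand-in for the XML configuration file (layout is assumed).
POLICY = {
    "methods": {"GET", "POST"},
    "headers": {"Host": str, "Content-Length": int},
    "resources": {
        "/index.html": {},  # static file: accepts no arguments
        "/cart/add.jsp": {
            "id": {"mandatory": True, "min_len": 1, "max_len": 6},
            "qty": {"mandatory": False, "min_len": 1, "max_len": 3},
        },
    },
}


def check_request(method, path, headers, args, policy=POLICY):
    """Return (allowed, reason); anything not explicitly allowed is denied."""
    # HTTP verb
    if method not in policy["methods"]:
        return False, "verb not allowed"
    # HTTP headers: must be listed and convertible to the expected type
    for name, value in headers.items():
        expected_type = policy["headers"].get(name)
        if expected_type is None:
            return False, "header not allowed"
        try:
            expected_type(value)
        except ValueError:
            return False, "header has wrong type"
    # Static files: the resource must be in the list of served files
    rules = policy["resources"].get(path)
    if rules is None:
        return False, "resource not listed"
    # Dynamic files: only allowed arguments, and all mandatory ones present
    if any(name not in rules for name in args):
        return False, "argument not allowed"
    if any(r["mandatory"] and name not in args for name, r in rules.items()):
        return False, "mandatory argument missing"
    # Argument values: decode first, then compare with the "normal" profile
    # (here reduced to an assumed length interval)
    for name, raw in args.items():
        value = unquote(raw)
        if not rules[name]["min_len"] <= len(value) <= rules[name]["max_len"]:
            return False, "anomalous argument value"
    return True, "normal"
```

For instance, under these assumed rules check_request("GET", "/cart/add.jsp", {}, {"id": "12"}) is allowed, whereas the same request with an unlisted extra argument is rejected before any value is inspected.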
3 EXPERIMENTS
3.1 Case Study: Web Shopping
The WAF has been configured to protect a specific
web application, consisting of an e-commerce web
store, where users can register and buy products us-
ing a shopping cart.
3.2 XML File Generation
As already stated, the XML file describes the normal
behavior of the web application. Therefore, to train
the system and configure this file, only normal and
non-malicious traffic to the target web application is
required. Nevertheless, obtaining only normal
traffic may not be an easy task. Since a statistical
approach is used for the characterization of normal
argument values, thousands of requests are needed.
There are several alternatives for obtaining normal traffic:
• Thousands of legitimate and non-malicious users
can surf the target web application and generate
normal traffic. However, getting thousands of
people to use the application might not be an easy task.
• The application can be published on the Internet,
but unfortunately attacks would then be mixed with
normal traffic, and classifying the captured traffic
as normal or anomalous is not viable. This approach
is used in (Kruegel et al., 2005). However, their
training data include attacks, so some attacks cannot
be detected, as they are considered normal traffic.
• Traffic can be generated artificially. Although the
traffic is not real, we can be sure that only normal
traffic is included.
Normal traffic acquisition is a general problem in
attack detection, still to be completely solved. For our
purposes, we considered artificial traffic generation to
be the most suitable approach.
3.3 Artificial Traffic Generation
In our approach, normal and anomalous request
databases are generated artificially with the help of
dictionaries.
3.3.1 Dictionaries
Dictionaries are data files which contain real data to
fill the different arguments used in the target applica-
tion. Names, surnames, addresses, etc., are examples
of dictionaries used.
A set of dictionaries containing only allowed val-
ues is used to generate the normal request database.
A different set of dictionaries is used to generate the
anomalous request database. The latter dictionaries
contain both known attacks and illegal values with no
malicious intention.
3.3.2 Normal Traffic Generation
Allowed HTTP requests are generated for each page
in the web application. If the page presents a form, the
fields are filled out only with legal values. Arguments
and cookies in the page, if any, are also filled out with
allowed values. Depending on the case, the values
can be chosen randomly or obtained from the normal
dictionaries. The result is a normal request database
(NormalDB), which will be used in both the training
and test phases.
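A minimal sketch of this generation step is given below. It assumes a hypothetical page description (PAGES) and tiny in-memory dictionaries (NORMAL_DICTS). The page paths, argument names, dictionary contents, value ranges and the fixed random seed are illustrative assumptions, not data from the experiments. Cookies are omitted for brevity.

```python
import random
from urllib.parse import urlencode

# Hypothetical normal dictionaries: allowed, real-looking values only.
NORMAL_DICTS = {
    "name": ["Ana", "Luis", "Marta"],
    "surname": ["Garcia", "Lopez", "Perez"],
}

# Hypothetical page description. Each argument is filled either from a
# normal dictionary ("dict:<name>") or randomly from a legal range
# ("random:<low>-<high>").
PAGES = {
    "/register.jsp": {"name": "dict:name", "surname": "dict:surname"},
    "/cart/add.jsp": {"id": "random:1-999", "qty": "random:1-10"},
}


def legal_value(spec, rng):
    """Pick one allowed value according to the argument's spec string."""
    kind, _, detail = spec.partition(":")
    if kind == "dict":
        return rng.choice(NORMAL_DICTS[detail])
    low, high = (int(x) for x in detail.split("-"))
    return str(rng.randint(low, high))


def generate_normal_db(n_per_page, seed=0):
    """Build a list of request lines containing only legal argument values."""
    rng = random.Random(seed)  # fixed seed so the database is reproducible
    normal_db = []
    for path, arg_specs in PAGES.items():
        for _ in range(n_per_page):
            args = {name: legal_value(spec, rng)
                    for name, spec in arg_specs.items()}
            normal_db.append(f"GET {path}?{urlencode(args)} HTTP/1.1")
    return normal_db
```

Calling generate_normal_db(1000) would yield 1000 requests per page. An anomalous request database could be produced in the same way by swapping in the second set of dictionaries.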
SECRYPT 2009 - International Conference on Security and Cryptography