Behaviour Knowledge Base.
We ran the system under a synthetic user load to
evaluate the whole system. The synthetic load
corresponds to one hundred concurrent users requesting
random pages from the server. The system holds one
thousand randomly generated pages. After a run of
one million requests, we need to validate the
behavior and performance information of the different
elements in the system against the values observed and
gathered by the cache system and the behavior
knowledge base.
In this first experiment we gather only information
about the individual user requests: number of requests,
response times and response sizes. Logs of the activity
in the system are available on both the user side and
the origin server side. Apache provides modules
that write web logs in a standardized text file format.
The Common Logfile Format (CLF) is the most common
format. For each request received by the server, CLF
adds a new line to the text file with the following format:
host ident authuser date request status bytes
10.0.0.1 - mathew [12/Jul/2004:14:50:13 -0700] "GET /index.php HTTP/1.0" 200 3465
From this log we can extract the number of requests and
the size of those requests. On the other side, JMeter
provides a large number of measures and statistics, in
particular response sizes, response times and the number
of requests.
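As an illustration of how the number of requests and the response sizes can be read from a CLF log, the following is a minimal sketch in Python. It is not the tooling used in the experiment; the function names `parse_clf_line` and `summarize` and the regular expression are assumptions introduced here for illustration only.

```python
import re
from collections import Counter

# One CLF entry: host ident authuser [date] "request" status bytes
# (pattern written for this sketch; it is an assumption, not part of Apache).
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Return the fields of one CLF line as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # CLF writes "-" when no body was sent; count it as zero bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # The request field looks like: METHOD PATH PROTOCOL
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else None
    return entry


def summarize(lines):
    """Count requests and add up response sizes per requested page."""
    requests_per_page = Counter()
    bytes_per_page = Counter()
    for line in lines:
        entry = parse_clf_line(line)
        if entry is None or entry["path"] is None:
            continue  # skip malformed lines
        requests_per_page[entry["path"]] += 1
        bytes_per_page[entry["path"]] += entry["bytes"]
    return requests_per_page, bytes_per_page


sample = ('10.0.0.1 - mathew [12/Jul/2004:14:50:13 -0700] '
          '"GET /index.php HTTP/1.0" 200 3465')
print(parse_clf_line(sample)["path"], parse_clf_line(sample)["bytes"])
```

Applied to the example line above, the sketch yields the page `/index.php` and a response size of 3465 bytes.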
Once we have analyzed the Apache log and the
JMeter log, we can compare those measures with the
measures gathered at the cache level in the knowledge
base. We observed that the mean number of requests
per web page is one thousand, which matches one
million requests spread over one thousand pages.
These values are the same in the cache log and in
the Behaviour K.B.
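The mean of one thousand requests per page follows directly from the load parameters. The sketch below reproduces that arithmetic on a scaled-down synthetic load (one hundred thousand requests over one hundred pages, which gives the same mean). The function names and the uniform random page choice are assumptions of this sketch, not a description of the JMeter test plan.

```python
import random
from collections import Counter


def simulate_load(total_requests, page_count, seed=0):
    """Draw a page index uniformly at random for every request."""
    rng = random.Random(seed)
    return Counter(rng.randrange(page_count) for _ in range(total_requests))


def mean_requests_per_page(counts, page_count):
    """Mean over all pages, including pages that were never requested."""
    return sum(counts.values()) / page_count


# Scaled-down load with the same requests-to-pages ratio as the experiment.
counts = simulate_load(total_requests=100_000, page_count=100)
print(mean_requests_per_page(counts, 100))  # prints 1000.0
```

Individual pages deviate from the mean because the pages are chosen at random; only the mean is fixed by the ratio of requests to pages.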
For the response time analysis we could not use
the Apache log, because pages served from the cache
tier are never requested from the origin server, so
those requests are not registered in the Apache log.
Since the cache tier log does not record the response
time of the served requests, only JMeter could be used
to gather that information. Comparing the response
times registered by JMeter with those stored in the
Behaviour K.B., we observed that the mean times are
practically the same, with a small overhead in the
JMeter measures. The difference is due to the
communication time between the cache tier and JMeter.
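The comparison of mean response times can be sketched as follows. The sample values are hypothetical and chosen only to make the arithmetic visible; they are not measurements from the experiment, and the function name `mean_overhead` is an assumption of this sketch.

```python
from statistics import mean


def mean_overhead(client_times_ms, kb_times_ms):
    """Return (client mean, knowledge-base mean, client overhead) in ms."""
    client_mean = mean(client_times_ms)
    kb_mean = mean(kb_times_ms)
    return client_mean, kb_mean, client_mean - kb_mean


# Hypothetical response times in milliseconds (not measured data).
jmeter_ms = [12.0, 15.0, 11.0, 14.0]  # as seen by the load generator
kb_ms = [11.0, 14.0, 10.0, 13.0]      # as stored by the cache tier

print(mean_overhead(jmeter_ms, kb_ms))  # prints (13.0, 12.0, 1.0)
```

A positive third value corresponds to the overhead described above: the time measured at the client includes the communication between the cache tier and the load generator.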
After the analysis of the measures we can conclude
that the Ontology defined in Section 4 can be used to
model the behavior and performance of web systems.
Our next step is to implement reasoning rules to
analyze that data and take decisions about the
configuration of the system.
4 WEB PERFORMANCE AND
BEHAVIOR ONTOLOGY
This section presents a summary of the Web
Performance and Behavior Ontology described in
(Guerrero et al., 2008). Ontology development requires
the use of a methodology. In our case, we use
the ontology building life-cycle explained in (Davies
et al., 2002; Uschold, 1995) and used in other research
works such as (Lera et al., 2006; Lera et al., 2007).
The elements that matter in our scenario are: user
sessions, user requests, HTTP requests, HTTP
responses and performance metrics.
Web applications and systems are built over the HTTP
protocol (an Internet-based application protocol).
Therefore the definition of our Ontology, which is used
to represent the performance information, has to be
determined by the definition of HTTP (Fielding et al.,
1999).
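To make the relation between these elements concrete, the following sketch expresses them as plain Python data classes. It is only an illustration: the class and field names are assumptions introduced here, and the structure is not the ontology published in (Guerrero et al., 2008).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PerformanceMetric:
    name: str      # e.g. "response_time" (hypothetical metric name)
    value: float
    unit: str


@dataclass
class HttpResponse:
    status: int
    size_bytes: int
    metrics: List[PerformanceMetric] = field(default_factory=list)


@dataclass
class HttpRequest:
    method: str
    uri: str
    response: Optional[HttpResponse] = None


@dataclass
class UserRequest:
    uri: str
    # One user request may produce several HTTP requests between tiers.
    http_requests: List[HttpRequest] = field(default_factory=list)


@dataclass
class UserSession:
    user_id: str
    requests: List[UserRequest] = field(default_factory=list)


# Example instance with hypothetical values.
response = HttpResponse(status=200, size_bytes=3465,
                        metrics=[PerformanceMetric("response_time", 12.0, "ms")])
request = HttpRequest(method="GET", uri="/index.php", response=response)
session = UserSession(user_id="mathew",
                      requests=[UserRequest("/index.php", [request])])
print(session.requests[0].http_requests[0].response.size_bytes)  # prints 3465
```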
For users, the architecture of the server tiers is
completely transparent, and they make requests to a
URI (Berners-Lee et al., 2005). Users do not care
whether that URI corresponds to an isolated server,
a proxy server, a cache server or a load-balancing server.
When the HTTP request arrives at the server identified
in the URI, the server processes that request. This
processing can be divided into two different types of
tasks: local tasks and remote tasks.
The local tasks that a server performs to respond to
user requests generate a local workload on
that tier of the web system. We can identify different
kinds of local workload in the different elements
or components of the server: disks, processors, DB
systems, memory, scripting interpreter modules and web
server modules. Some of those elements or components
may need tasks associated with other tiers of the
web system. In those cases, new HTTP requests are
generated between the different system tiers. These
new requests generate network workload and local
workload in the target tier (remote tasks).
Transmission times, latency and node processing are the
elements corresponding to the network workload (Baldi
et al., 2003). In Figure 4 we present the domain model
of concepts corresponding to workload.
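The split between local and remote tasks can be sketched as a small recursive structure. This is a simplified illustration under stated assumptions: the class names are hypothetical, all costs are expressed in milliseconds, remote tasks are treated as sequential and their costs as additive, and each remote task carries a single network cost. None of these choices comes from the ontology itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetworkWorkload:
    transmission_ms: float
    latency_ms: float
    node_processing_ms: float

    def total_ms(self) -> float:
        return self.transmission_ms + self.latency_ms + self.node_processing_ms


@dataclass
class RemoteTask:
    network: NetworkWorkload  # cost of reaching the target tier
    target: "Tier"            # tier that performs the remote work


@dataclass
class Tier:
    name: str
    local_ms: float  # local workload (disk, CPU, DB, ...) lumped together
    remote_tasks: List[RemoteTask] = field(default_factory=list)

    def total_ms(self) -> float:
        """Local workload plus, for each remote task, network and target cost."""
        remote = sum(t.network.total_ms() + t.target.total_ms()
                     for t in self.remote_tasks)
        return self.local_ms + remote


# Hypothetical two-tier system: a web tier that calls a database tier.
db = Tier(name="db", local_ms=5.0)
link = NetworkWorkload(transmission_ms=1.0, latency_ms=2.0, node_processing_ms=0.5)
web = Tier(name="web", local_ms=4.0, remote_tasks=[RemoteTask(link, db)])
print(web.total_ms())  # prints 12.5
```

The recursion in `total_ms` mirrors the description above: a tier that delegates work adds the network workload of the new request and the local workload of the target tier to its own.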
When the local task in a tier is done, an HTTP response
is generated. If that HTTP response arrives at another
tier, then once all the responses have arrived and all
the local tasks are done, another HTTP response is
generated. That path is repeated back to the tier
that received the user request. The HTTP response
generated by that last tier goes directly to the user.
HTTP responses generate network workload in the
same way HTTP requests do. Figure 5 shows a simple
domain model which corresponds to the HTTP
WEBIST 2008 - International Conference on Web Information Systems and Technologies