Towards New Metrics For Web Portals
Gennaro Costagliola, Filomena Ferrucci, Vittorio Fuccella, Luigi Zurolo
Dipartimento di Matematica e Informatica, Università di Salerno, Via Ponte Don Melillo, I-84084 Fisciano (SA)
Keywords: Web, www, metrics, portal, portlet, JSR 168, WSRP, logging, log, query, log4p.
Abstract: Content Management Systems and Web Portal Frameworks are more and more widely adopted in Web
development. Those kinds of software often produce web pages whose layout is divided in sections called,
in the case of Web Portals, “portlets”. Portlets can be produced by different sources and then aggregated in
the same page by the portal. For Web portals, traditional web metrics based on page visits can be inadequate
for fully understanding users’ interest, due to the heterogeneity of content and the variety of sources. This
paper proposes a system for evaluating web traffic at a deeper level than that of page visits: the level of
the sections, or of the portlets. The interest of the user in the sections of the page is gauged through implicit
interest indicators, such as section visibility, mouse movements and other client-side interactions. Our
system is composed of two different products: a framework that, once instantiated in a web portal,
allows the production of a log, and a log analyzer. The possible uses and benefits gained by research in the
fields of web traffic analysis, portal design and usability are investigated in depth.
Content Management Systems (CMS in the following)
and Web Portal Frameworks are more and more
widely adopted in the development of Web sites,
mainly because they allow
web designers to rapidly develop a web site and
portal administrators to rapidly update its contents.
CMS and portal frameworks produce web pages
whose layout is divided in sections called portlets.
This division is not only a layout concern, but it
occurs in all the steps of the generation of the pages:
in the case of many portals, the portlets can be
produced by different, possibly remote, sources
and then aggregated in the same page. Thus, we are
on the way towards the creation of a portlet market,
where content, or part of it, is produced by third
parties and then shown on the publisher’s web site.
The technical solution to achieve this
organization is based on the production of markup
fragments (a concern of the portlet), and their
aggregation in a single page (a concern of the
portal). The development of standards and
specifications, such as JSR 168 (2003) and WSRP
(2003), has helped in this respect. A noteworthy
example of a Web site, whose layout demonstrates
the use of a portal framework in its development, is
that of Yahoo. A screenshot of its home page is
shown in figure 1.
Figure 1: The Yahoo home page and its sections.
Costagliola G., Ferrucci F., Fuccella V. and Zurolo L. (2007).
In Proceedings of the Third International Conference on Web Information Systems and Technologies - Web Interfaces and Applications, pages 98-105
DOI: 10.5220/0001289300980105
As a result of the aggregation, a portal page can
contain highly heterogeneous content, taken from
various portlet producers. For web sites developed
with CMS and portal framework technology,
traditional metrics based on page visits can be
inadequate to fully understand user’s interest: new
forms of metrics are needed. What we need are
metrics which can give information at a deeper level
than that of page visits: the level of the sections, or
of the portlets. Unfortunately, at present there is a
lack of these metrics.
This paper presents some tools which make use
of new metrics suitable for describing the behaviour
of the visitors on the portal pages. The gathering of
such information is carried out through the use of a
framework directly instantiated in the portal. The
framework produces an XML log, which includes
raw data, such as the implicit interest indicators (i.e.
any interaction of the user with the portlets) and the
visibility of the portlets in the pages, captured when
the users browse the portal pages.
The logs are analyzed through a suitable log
analyzer that obtains some higher level information
by performing several queries on the logs, such as:
An estimation of the visibility of the portlets
in the page.
The interactivity level of the portlets.
From this information we can obtain an estimation
of the interest shown by the users in the portlets. The
analysis can be made available per single user visit
and session, across multiple visits of the same user
or for all the users.
Once obtained, this data can be used for multiple
purposes. The possible uses and benefits gained by
research in fields of web traffic analysis, portal
design and web usability are investigated in depth.
The rest of the paper is organized as follows:
section 2 gives the knowledge background necessary
to understand some concepts on which the system is
based. The system is presented in section 3: the
section is divided into two sub-sections, the first
describing the logging framework and the second
the log analyzer. Finally, in section 4, we briefly
discuss possible uses and benefits of our system.
Several final remarks and a discussion on future
work conclude the paper.
A portal is a web site which constitutes a starting
point, a gate to a consistent group of resources and
services in the Internet or in an intranet. Most
portals were born as Internet directories (such as Yahoo)
and/or as search engines (such as Excite and Lycos). The
range of services has expanded in order to increase the
number of users and the time they spend browsing
the site. These services, which often require user
registration, include free email, chat rooms, and
personalization procedures. In the history of portals,
many authors identify two generations. Second
generation Web portals distinguish themselves from
first generation ones for their architecture, which is
component-oriented. In particular, the basic
component constituting them is often referred to as a
portlet. The portal is responsible for aggregating
information coming from different sources, local or
remote, available in the form of mark-up fragments.
Each of these fragments is produced by a portlet. In
the context of web portals, the possibility to deploy a
portlet in any portal is particularly significant. To
this end, that is, to achieve interoperability among
portals, it has been necessary to define a standard
way to develop and deploy portlets. Two main
standards have been defined and widely adopted by
producers: the Web Services for Remote Portlets
(WSRP) and the Java Portlet Specification and API
(JSR 168). The former is more oriented to the
definition of rules about the use of remote portlets,
the latter is focused on the definition of interfaces
for the development of portlets which can run in
Java-based portals.
WSRP defines a Web service interface through
which portals can interact with the remote
producer’s portlets. The WSRP 1.0 specification was
approved as an OASIS standard in August, 2003.
Since the standard is based on Web services, interfaces
for adopting it have been developed for the most
widely used technologies (e.g. J2EE and .NET).
Most of the Java technologies, part of the Java 2
Enterprise Edition, the platform for the development
and deployment of distributed enterprise
applications, follow a consolidated architectural
model, called container/component architecture.
This model offers the chance to develop components
and deploy them on different containers. Both
components and containers compliant with the
specifications can be developed independently and
marketed by different software vendors, thus
creating a market economy for Java software.
Furthermore, several good-quality Open Source
products compete with them. JSR 168 follows
the container/component model, and its adoption has
grown to the point that it has become an important
reference point for projects
aimed at the development of Web portals. Among
other things, it defines the architecture model of
conformant portals. Its main constituents are the
portlet, which produces content mark-up, the portal,
which aggregates the mark-up, and the portlet
container, which manages the portlet lifecycle and
provides an API to the portlets for accessing a set
of services. The typical architecture for a JSR 168
conformant web portal is shown in figure 2.
Figure 2: JSR 168 compliant portal architecture.
Our system is aimed at obtaining detailed statistics
in order to have a deep analysis of the user’s interest
in the Web portal and, in particular, in the sections
which compose its pages.
Our system is composed of two different pieces
of software:
1 A framework, to be used by Web portal
developers, which, instantiated in the Web
portal, is in charge of capturing information
about user’s behaviour during the navigation of
the portal and storing it in an XML log.
2 A log analyzer, to be used by Web portal
administrators, developed as a stand-alone
application, which is responsible for analyzing
the data gathered by the logger.
3.1 The Logging Framework
The purpose of the Logging Framework is to capture
all of the user actions during the browsing of the
portal and to store the raw information in a set of log
files in XML format.
The framework is composed of a server-side and
a client-side module. The client-side module is
responsible for “being aware” of the behaviour of
the user while he/she is browsing the portal pages.
The server-side module receives the data from the
client and creates and stores log files on the disk.
Thanks to the availability of AJAX (Asynchronous
JavaScript and XML), a technology for increasing the
interactivity of Web content, it has been possible to
implement the client-side module of our framework,
despite the interactivity level it requires,
without developing plug-ins or external modules for
Web browsers. JavaScript has been used on the client
to capture user interactions, and the text-based
communication between the client and the server has
been implemented through AJAX calls. The
client-side scripts can be added to the portal pages with
little effort by the programmer.
The events captured by the framework are the following:
Actions undertaken on the browser window
(open, close, resize)
Actions undertaken in the browser client area
(key pressed, scrolling, mouse movements)
The event data is gathered on the browser and
sent to the server at regular intervals. It is worth
noting that the event capture does not prevent other
scripts present in the page from running properly.
The server-side module has been implemented as
a Java servlet which receives the data from the client
and prepares an XML document in memory. At the
end of the user session the XML document is written
to the disk. To reduce the size of log files, a new file
is used every day.
The information model used for the log data is
shown in figure 3. All the information is organized
per user session. At this level, an identifier (if
available) and the IP of the user are logged as well
as agent information (browser type, version and
operating system). A session is composed of page
visits data. For every page, the referrer is logged and
a list of events is present. The data about the user
interactions are the following:
Event type
HTML source object involved in the event
Portlet containing the HTML object and its
position in the page (coordinates of the
Mouse coordinates
Timing information (timestamp of the event)
Further information specific to the event
Figure 3: The information model for log data.
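As a rough illustration of the information model, the following Python sketch builds one session entry with the standard xml.etree.ElementTree module. The element and attribute names (session, page, event, portlet and so on) are assumptions made for this example only; the actual schema is the one of figure 3, and the framework itself is implemented as a Java servlet, so this is not its code.

```python
import xml.etree.ElementTree as ET


def build_session(user_id, ip, agent, pages):
    """Build a <session> element from plain dictionaries.

    All element and attribute names are hypothetical; they only mirror
    the fields listed in the text (user, IP, agent, referrer, events).
    """
    session = ET.Element("session", {"user": user_id, "ip": ip, "agent": agent})
    for page in pages:
        page_el = ET.SubElement(
            session, "page", {"url": page["url"], "referrer": page["referrer"]}
        )
        for ev in page["events"]:
            event_el = ET.SubElement(
                page_el,
                "event",
                {
                    "type": ev["type"],
                    "timestamp": str(ev["timestamp"]),
                    "source": ev["source"],
                    "mouseX": str(ev["mouse"][0]),
                    "mouseY": str(ev["mouse"][1]),
                },
            )
            # Portlet containing the source object, with its position in the page.
            x, y, w, h = ev["portlet_box"]
            ET.SubElement(
                event_el,
                "portlet",
                {"name": ev["portlet"], "x": str(x), "y": str(y),
                 "width": str(w), "height": str(h)},
            )
    return session


# Made-up sample data: one session, one page visit, one mouse event.
sample = build_session(
    "u42", "192.0.2.10", "Firefox 2.0 / Linux",
    [{
        "url": "/home", "referrer": "/login",
        "events": [{
            "type": "mousemove", "timestamp": 1000, "source": "div#news-title",
            "mouse": (120, 340), "portlet": "news",
            "portlet_box": (0, 300, 400, 200),
        }],
    }],
)
xml_text = ET.tostring(sample, encoding="unicode")
```

Nesting the portlet data inside each event keeps an event self-describing, at the cost of some redundancy in the log.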
WEBIST 2007 - International Conference on Web Information Systems and Technologies
An important concern in web metrics is the log size.
In very busy web sites, even simple HTTP
request logs can grow very large. A
configuration system, including the following
settings, has been conceived in order to
reduce log sizes:
List of events to capture
List of portlets to monitor
Time interval between two data transmissions
from the client to the server
Sensitivity for mouse movements
Sampling factor (logging is activated for only one
random user in every n)
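The effect of the settings listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, field and function names are hypothetical, the sampling factor is interpreted as activating logging with probability 1/n per session, and mouse sensitivity is interpreted as a minimum distance in pixels. In the actual framework the configuration shapes the JavaScript modules generated for the client rather than filtering events in this way.

```python
import random
from dataclasses import dataclass


@dataclass
class LoggerConfig:
    """Hypothetical container for the configuration settings."""
    events: frozenset = frozenset({"scroll", "mousemove", "keypress", "resize"})
    portlets: frozenset = frozenset({"news", "weather"})
    send_interval_ms: int = 5000     # time between two client-to-server transmissions
    mouse_sensitivity_px: int = 10   # ignore mouse moves shorter than this
    sampling_factor: int = 1         # log one user in every n (1 = everyone)


def session_is_sampled(config, rng=random):
    """Decide once per session whether logging is activated (probability 1/n)."""
    return rng.randrange(config.sampling_factor) == 0


def keep_event(config, event, last_mouse_pos=None):
    """Return True if the event passes the configured filters."""
    if event["type"] not in config.events:
        return False
    if event["portlet"] not in config.portlets:
        return False
    if event["type"] == "mousemove" and last_mouse_pos is not None:
        dx = event["x"] - last_mouse_pos[0]
        dy = event["y"] - last_mouse_pos[1]
        # Drop movements below the sensitivity threshold (Euclidean distance).
        if (dx * dx + dy * dy) ** 0.5 < config.mouse_sensitivity_px:
            return False
    return True
```

With the default threshold of 10 pixels, a move from (0, 0) to (3, 4) is discarded, while a move to (30, 40) is kept.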
The configuration is read by the server-side
module but affects the generation of the javascript
modules run on the client-side. The architecture of
the framework is graphically represented in figure 4.
On the client machine, everything can be done in
the web Browser. The Javascript modules for event
capturing, dynamically generated on the server, are
downloaded and run in the browser interpreter. Data
is sent to the server through an AJAX request. On the
server-side, a module called RequestHandler
receives it. Once received, a module called
LoggerHandler organizes the XML document in
memory and flushes it to the disk every time a user
session finishes.
3.2 The Log Analyzer
The phase following data gathering is data
analysis. In our system this is done through a
web-based stand-alone application, optionally hosted on
the same server as the logging framework, which
takes the log files as input.
The analysis phase consists of a series of
analyses of the behaviour of the user, starting from
the data stored in the log. The analysis can have
several aims. Among them we can cite:
Giving a better organization to the portal.
Choosing the contents most suitable for a
user or a group of users.
Analyzing the usability of the portal.
A deeper analysis of the uses of our system
is given in the next section.
The analyzer performs queries on the logs to
obtain the desired data and then calculates statistical
indicators, shown in the form of charts and tables.
Since the log files are in XML, the query engine has
been developed to process queries in the XQuery (2006)
language, and has been built using an
implementation of JSR 225 (2006). In the next
sub-sections we will show some useful statistics we
can obtain using the system. The module for
drawing charts has been developed using a free Java
library, named jCharts (2006).
3.2.1 Page Counts
A generic analysis is given by the simple count of
page visits. Even though such a task is easily
performed by a lot of already existing tools, page
visits count is an important statistic for our system,
since it allows us to understand on which page the
interest of the user is mostly concentrated. Starting
from this data, we could focus our attention on a
subset of the portal pages and perform a deeper,
portlet-based, analysis on them. This statistic is
easily obtainable with our analyzer through a count
query on the page elements of our logs.
A sample page visit chart, drawn using our
analyzer, is shown in figure 5.
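A count query of this kind can be approximated in Python as follows. The sketch assumes a log whose page visits are stored as page elements with a url attribute; both names are assumptions made for illustration. The analyzer itself issues XQuery queries, where a comparable result could be obtained with a count() over the page elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up log fragment; element and attribute names are hypothetical.
LOG = """
<log>
  <session user="u1">
    <page url="/home"/><page url="/news"/><page url="/home"/>
  </session>
  <session user="u2">
    <page url="/home"/>
  </session>
</log>
"""


def count_page_visits(xml_text):
    """Count <page> elements per URL across all sessions."""
    root = ET.fromstring(xml_text)
    return Counter(page.get("url") for page in root.iter("page"))


visits = count_page_visits(LOG)
most_visited = visits.most_common(1)[0]
```

The most visited pages found this way are natural candidates for the deeper, portlet-based analysis.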
Figure 4: Logging framework architecture.
Figure 5: Page count chart sample.
3.2.2 Portlet Visibility Time
A more specific analysis can be obtained by
calculating the visibility time for each level of
portlet visibility (total, partial, invisible). Portal
layouts are usually organized in columns. A
commonly used layout organizes portlets in two
columns of the same size (50%, 50%). Another
common layout is composed of three columns (e.g.
25%, 50%, 25% in size). The portal page can
contain some other elements, such as a header, a
footer and a horizontal or a vertical menu, or
both. If the number and size of portlets are such
that the portal page exceeds the size of the browser
window, only part of the page is shown, while some
other parts are hidden and can be shown through
scrolling. Thus, at any time some portlets can have
full visibility, some others partial visibility (only a
percentage of the portlet area is visible), while the
remaining are completely hidden to the user.
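The three visibility levels can be derived from simple rectangle geometry. The Python sketch below is an assumption-laden illustration: it supposes that both the portlet and the visible part of the page (the viewport) are known as rectangles in page coordinates, and the function names are hypothetical.

```python
def visible_fraction(portlet, viewport):
    """Fraction of the portlet area inside the viewport, between 0.0 and 1.0.

    Both rectangles are (left, top, width, height) in page coordinates.
    """
    pl, pt, pw, ph = portlet
    vl, vt, vw, vh = viewport
    overlap_w = max(0, min(pl + pw, vl + vw) - max(pl, vl))
    overlap_h = max(0, min(pt + ph, vt + vh) - max(pt, vt))
    area = pw * ph
    return (overlap_w * overlap_h) / area if area else 0.0


def visibility_level(fraction):
    """Map a visible fraction to the three levels used in the text."""
    if fraction >= 1.0:
        return "total"
    if fraction > 0.0:
        return "partial"
    return "invisible"
```

For a 1000x600 viewport at the top of the page, a 500x300 portlet placed at (0, 450) is half visible; scrolling down by 400 pixels makes it fully visible.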
Every time a scrolling event occurs, our logger
records its timestamp and the position of the portlet
in the page. The availability of this data allows us to
precisely calculate the amount of time the portlets
were fully visible, partially visible or completely
hidden.
This information, in the context of a portal, is
very useful, since, after knowing which page has
attracted the user more, it lets us know his/her interest
in the single content units of the page. Figure 6
contains a chart showing the visibility percentage of
the portlet.
Figure 6: Portlet visibility chart sample.
When a portlet is partially visible, the chart of
figure 6 does not exactly tell us the extent of the
visible and hidden areas of the portlet. Thus, in order
to complete the visibility analysis, we found it
appropriate to show another chart summarizing the
visibility percentage of the portlet across users’ page
views. This indicator is calculated as the weighted
mean of the visibility percentages over all the time
intervals, using the following formula:

$$V = \frac{\sum_{i} t_i \, v_i}{T} \qquad (1)$$

where V is the total visibility indicator for the portlet, t_i
and v_i are, respectively, its time and percentage of
visibility in the i-th interval, and T is the total time of the
page visit.
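As a worked illustration with made-up numbers (not taken from any real log), suppose a page visit lasts T = 60 s, during which a portlet is fully visible for 30 s, half visible for 20 s and hidden for 10 s. Taking the indicator as the time-weighted mean of the visibility percentages gives:

```latex
V = \frac{\sum_i t_i \, v_i}{T}
  = \frac{30 \cdot 100\% + 20 \cdot 50\% + 10 \cdot 0\%}{60}
  = \frac{4000\%}{60} \approx 66.7\%
```

Since the intervals add up to T, the indicator always lies between 0% and 100%.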
Figure 7: Portlet visibility percentage chart sample.
A sample of the chart is shown in figure 7. For the
sake of readability, the bars are shown in green for
high visibility, in yellow for low visibility and in red
for scarce visibility. The processing and the queries
performed on the log to obtain the parameters in
(1) are reported in the appendix.
3.2.3 Portlet Interactions
Some portlets can be more informative while others
can be more interactive. For example, an article of
an on-line news magazine is supposed to be
informative, while a section containing a form
should be more interactive, that is, it should receive
more user interactions than the former. Many people
use the mouse as a pointer while reading on-line
Figure 8: Portlet interactivity level chart sample.
With our tool, we can obtain information
about portlet interaction from a bar chart. An
example is shown in figure 8. In the chart, each bar
is composed of three sections of different color.
They represent three different types of interactions:
window, mouse and keyboard events.
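The data behind such a stacked bar chart can be obtained by grouping events by portlet and by interaction category. The following Python sketch assumes that each logged event carries its type and the name of the portlet involved; the mapping from event types to categories is hypothetical, and event types outside the mapping are simply skipped.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from raw event types to the three chart categories.
CATEGORY = {
    "open": "window", "close": "window", "resize": "window",
    "mousemove": "mouse",
    "keypress": "keyboard",
}


def interactions_per_portlet(events):
    """Return {portlet: Counter({category: count})} for the given events."""
    result = defaultdict(Counter)
    for ev in events:
        category = CATEGORY.get(ev["type"])
        if category is not None:  # event types outside the mapping are skipped
            result[ev["portlet"]][category] += 1
    return result


# Made-up events for two portlets.
events = [
    {"type": "mousemove", "portlet": "news"},
    {"type": "mousemove", "portlet": "news"},
    {"type": "keypress", "portlet": "search"},
    {"type": "resize", "portlet": "news"},
    {"type": "scroll", "portlet": "news"},
]
counts = interactions_per_portlet(events)
```

Each inner counter corresponds to one bar, and its three entries to the colored sections of that bar.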
Another interest indicator to be considered is the
total time a portlet has the mouse pointer in it. An
eye tracking study (Chen et al., 2001) shows that
there is a significant correlation between the eye
movements and the mouse movements: tracking the
trajectory drawn by the mouse pointer could be
useful for obtaining the probable trajectory of user’s
eyes, that is, what the user is interested in.
Figure 9: Portlet mouse pointer focus.
While eye tracking can only be performed
in specially equipped laboratories, mouse tracking can
be easily performed by our tool. All of the mouse
movements can be reconstructed from the log.
With a great number of users, the reconstruction
of all mouse movements can be too onerous. A
similar interest indicator can be obtained by just
calculating the amount of mouse movements and the
amount of time spent by the user with the mouse
pointer inside a portlet. The number of movements
has already been described and charted in figure 8.
As for the amount of time the portlet has the mouse
pointer in it, a pie chart showing the time spent by
the mouse pointer inside each of the portlets of
the same page is the most appropriate for this purpose.
A sample is shown in figure 9.
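The time the pointer spends inside each portlet can be estimated from timestamped mouse events, as in the Python sketch below. It assumes that each mouse sample records the portlet under the pointer (or None when the pointer is outside every portlet), and it attributes the interval between two samples to the portlet of the earlier one; both the data layout and the function names are assumptions for illustration.

```python
from collections import defaultdict


def pointer_time_per_portlet(mouse_events, page_end):
    """Sum the time the pointer spent inside each portlet.

    mouse_events: list of (timestamp_ms, portlet_name_or_None), sorted by time.
    page_end: timestamp at which the page visit ended.
    The interval between two samples is attributed to the portlet of the
    earlier sample, which is a simplifying assumption.
    """
    totals = defaultdict(int)
    following = mouse_events[1:] + [(page_end, None)]
    for (t, portlet), (t_next, _) in zip(mouse_events, following):
        if portlet is not None:
            totals[portlet] += t_next - t
    return dict(totals)


def shares(totals):
    """Convert absolute times into the fractions a pie chart would display."""
    whole = sum(totals.values())
    return {name: value / whole for name, value in totals.items()} if whole else {}


# Made-up samples: (timestamp in ms, portlet under the pointer).
samples = [(0, "news"), (2000, "news"), (3000, "weather"), (6000, None), (7000, "news")]
totals = pointer_time_per_portlet(samples, page_end=8000)
pie = shares(totals)
```

Because only one running total per portlet is kept, this indicator is much cheaper to compute than a full reconstruction of the pointer trajectory.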
The effectiveness of implicit interest indicators is
supported by several studies in the literature (Claypool
et al., 2001; Shapira et al., 2006). These works
demonstrate the correlation between implicit indicators
and the actual interest of the user. Furthermore, they
produce a list of the most used interest indicators.
Some works propose the development of tools for
determining user’s interests. For example, in
(Atterer et al., 2006), a proxy server based system is
used to capture client-side interactions.
Some works are aimed at understanding the
structure of web pages in order to determine page
sections. Many algorithms have been presented for
this purpose. The reasons for determining sections
include the following.
Wenyin et al. (2005) divide pages into sections to
detect similarities between two pages, in order to
prevent phishing. Chen et al. (2005) do the same
thing in order to better view web pages on small
screen devices. Blocks in the pages are detected for
identifying the informative sections of the page to
reduce storing sizes for search engines (Debnath et
al., 2005) or to eliminate redundant information for
Web mining (Taib et al., 2005). Clearly, if such
effort has gone into dividing pages into sections, then in
portals, where the pages are explicitly divided into sections,
we can pursue the purposes discussed above and
more. Once determined, the indication of user
interest, calculated from the log analysis, can be
used for several purposes. The next sub-sections
analyze the possible practical uses of our system in
several Web research fields.
4.1 Integrating Web Metrics
Web metrics tell us how the users are using the web
site. E-commerce sites need to know this
information in order to improve their selling
capacity. Some of the most commonly used web
metrics are: the number of page visits, the number of
banner or link clicks, the percentage of users who
complete an action, etc.
Unfortunately, clickstream analysis metrics
have the following limitations, as remarked by
Weischedel and Huizingh (2006):
They report activity on the server and not user
They can overestimate the actual use of web
sites due to spider activity
They do not include the real time spent on the
page by the user.
Our system overcomes these problems: it
captures client-side activity, can easily recognize
spiders from the absence of mouse movements and
records times in such a way that it is easy to detect
inactivity time due to user absence from the screen.
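Both checks can be sketched in a few lines of Python. The function names, the event layout and the 30-second idle threshold are assumptions chosen for this illustration, not values taken from the system.

```python
def looks_like_spider(session_events):
    """Heuristic from the text: a session with no mouse movement at all."""
    return not any(ev["type"] == "mousemove" for ev in session_events)


def active_time(timestamps, idle_threshold_ms=30000):
    """Sum the gaps between consecutive events, ignoring gaps above the threshold.

    Gaps longer than idle_threshold_ms are treated as user absence; the
    30-second default is an arbitrary value chosen for this illustration.
    """
    ordered = sorted(timestamps)
    return sum(
        later - earlier
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier <= idle_threshold_ms
    )
```

A purely mouse-based heuristic would also flag visitors who browse with the keyboard only, so in practice it would probably need to be combined with other signals.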
4.2 Portal Design
The location of the portlets in the portal page has a
great importance, since some portlets can have more
visibility than others. An eye tracking study
(Goldberg et al., 2002), analyzing the behaviour of
users in some browsing tasks, has shown that users’
interest is more concentrated, at least in the initial
phases of page browsing, on the portlets placed at
the top of the first column. The same study also
provides a complete classification of the places which are
candidates to gain more user interest.
If the portal owner
wants to emphasize the content of some portlets more
than others, he/she should therefore put them in
those places.
Our tool can help in determining the portlets
which attract users’ interest more and, on the basis
of this data, it can help the portal administrator in
placing the portlets in the pages.
4.3 Personalization of the Portal
Web portal customization is often used to tailor the
services of the portal to a single user or to a group of
users. In some cases, the user has the freedom of
choosing his/her favourite portlets to place in his/her
home page. In other cases, the interest of the user
can be inferred from the logs, and the pages of the
portal constructed in order to give more visibility to
content which matches user’s interests.
In the case of groups of users, groups can be
obtained through clustering procedures. Our system
can be useful for gathering data to obtain clusters of
users.
4.4 Portal and Portlet Usability
Since portlets can be considered small
Web applications, usability can be defined for
portlets as well. Diaz et al. (2004) define portlet
usability as the capability of the portlet to be
understood, learned or used under specified
conditions. The implicit interest indicators can be
used to facilitate the task of usability evaluators.
Atterer et al. (2006) show many situations in which
this is true, for example when real users are observed
instead of volunteers in the lab.
Portlet position affects the usability of the portal.
Let us suppose that a task can be performed by
interacting with more than one portlet. Their
position can affect the amount of time necessary to
perform the task. A usability study can be aimed at
finding the best location for each of these portlets.
Another study (Moraga et al., 2006) discusses the
possibility of choosing among different
portlets with similar features on the basis
of their usability. Our tool can be useful for this
purpose, since it can isolate the interactions relating to
a given portlet and thus help to evaluate its usability.
Among other events, our tool captures key press
events. In the case of portlets with forms, the data
can be used to understand if the user had problems
in filling in the form. This is valid for any kind of web
site and not only for portals.
In this paper we have presented a system aimed at
obtaining and analyzing the data about the behaviour
of the users of web portals. The system overcomes
the limitation of the simple page visit-based metrics,
giving more valuable information related to the
portlets, such as their visibility and interactivity and,
consequently, the interest of the user in them.
The system is composed of two components: a
framework for obtaining XML-based log files and
an application for log analysis. Several charts,
drawn using the latter, have been shown.
Referencing some recent work in literature, we
have argued that our system can be useful for
numerous purposes, such as, integrating web
metrics, optimizing portlet layout both for all users
and for personalization, and usability studies. Future
work is aimed at practically demonstrating the use of
our system in these fields.
Finally, an aspect that has been considered, but
not yet put into practice, is the availability of the log
data both to the portal and to the portlet producer. At
present some architectures and secure schemes are
being considered, such as the one proposed by Blundo
and Cimato (2004), applied for determining banner
clicks in advertising campaigns.
The system has been tested on a portal
developed with Apache Jetspeed II (2006) Portal
Atterer, R., Wnuk, M., Schmidt, A., 2006. Knowing the
User’s Every Move – User Activity Tracking for
Website Usability Evaluation and Implicit Interaction.
In Proceedings of the 15th international conference on
World Wide Web WWW '06. ACM Press
Bellas, F., 2004. Standards for Second-Generation Portals.
IEEE Internet Computing. 8(2): pp. 54-60.
Blundo, C. and Cimato, S., 2004. A Software
Infrastructure for Authenticated Web Metering. IEEE
Chen, Y., Xie, X., Ma, W. Y., Zhang, H. J., 2005.
Adapting Web pages for small-screen devices. IEEE
Internet Computing.
Claypool, M., Le, P., Wased, M., Brown, D., 2001.
Implicit interest indicators. In Proceedings of the 6th
international conference on Intelligent user interfaces.
ACM Press.
Chen, M. C., Anderson, J. R., Sohn Moore, M. H., 2001.
What can a mouse cursor tell us more?: correlation of
eye/mouse movements on web browsing. In CHI '01
extended abstracts on Human factors in comp. syst.
Debnath, S., Mitra, P., Pal, N., Giles, C.L., 2005.
Automatic identification of informative sections of
Web pages. In IEEE Transactions on Knowledge and
Data Engineering.
Diaz, O., Calero, C., Piattini, M., Irastorza, A., 2004.
Portlet usability model. IBM Research Report
RA221(W0411-084). ICSOC 2004. pp. 11-15.
Goldberg, J. H., Stimson M. J., Lewenstein, M., Scott, N.,
Wichansky, A. M., 2002. Eye tracking in web search
tasks: design implications. In Proceedings of the 2002
symposium on Eye tracking research & applications.
jCharts, 2006. Krysalis Community Project – jCharts.
Jetspeed 2, 2006. Apache Group. Jetspeed 2 Enterprise
Portal. http://portals.apache.org/jetspeed-2/
JSR 225, 2006. JSR 225: XQuery API for JavaTM (XQJ).
JSR 168, 2003. JSR-000168 Portlet Specification.
Moraga, M. A., Calero, C., Piattini, M., 2006. Ontology
driven definition of a usability model for second
generation portals. In Workshop proceedings of the
sixth int. conference on Web engineering, ICWE’06.
Shapira, B., Taieb-Maimon, M., Moskowitz, A., 2006.
Study of the usefulness of known and new implicit
indicators and their optimal combination for accurate
inference of users interests. In Proceedings of the 2006
ACM symposium on Applied computing SAC '06.
Taib, S.M., Yeom, S. J., Kang, B. H., 2005. Elimination of
Redundant Information for Web Data Mining. In
Proceedings of ITCC 2005, Int. Conf. on Information
Technology: Coding and Computing, Vol 1.
Weischedel, B., Huizingh, E. K. R. E., 2006. Website
Optimization with Web Metrics: A Case Study. In
Proceedings of ICEC '06, the 8th international
conference on Electronic commerce. ACM Press.
Wenyin, L., Huang, G., Xiaoyue, L., Deng, X., Min Z.,
2005. Phishing Web page detection, In Proceedings of
Eighth International Conference on Document
Analysis and Recognition.
WSRP, 2003. OASIS Web Services for Remote Portlets.
XQuery, 2006. XQuery 1.0: An XML Query Language
W3C Candidate Recommendation.
Here is, as an example, the pseudo-code procedure
used in the log analyzer for obtaining the
visibility percentage chart, shown in figure 7. Each bar in
the chart represents the percentage of visibility of a portlet
across all page views.
Those values are calculated
using (1). The procedure assumes, for simplicity, that
all the analyzed pages contain the same portlets.
Their number is passed as a parameter to the
procedure (line 1), and is used to associate the
correct event timestamp (lines 15-17) to portlet data
(name and coordinates).
Every time a user scrolls the page, the percentage
of visibility of a portlet changes, since part or all of
its area can fall inside/outside the browser’s client
area. On the initial page load event, and on every
scroll event, our logger registers the coordinates of
each portlet (through the portlet element of the
information model, see figure 3). Through our
sample code, for each event element and for each
portlet element, portlet names, coordinates and event
timestamps are obtained by querying the log and
storing the results in the
coordinates and timestamps vectors,
respectively (lines 3-6).
Once event timestamps and portlet coordinates have
been obtained, the numerator in (1) is calculated
through the iteration of lines 11-22. The partial sum
is kept by the sum associative array (line 22), whose
keys are portlet names. The calculation of the
visibility percentage in the i-th time interval, v_i, is
delegated, as shown in line 13, to the
calculateVisibilityPercentage sub-routine.
The time intervals t_i
can be easily calculated by
subtracting the i-th timestamp from the (i+1)-th (line
18). Those time intervals are summed in line 19 to
obtain the total time T.
The final results are put in the visibility
associative array, as shown in line 25. Those results
are obtained by dividing the partial sums by T.
Finally, visibility is passed to the
sub-routine, which is responsible for drawing the bar
chart (line 27).
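The steps described above can be sketched in Python as follows. This is not the original pseudo-code and its line numbers do not correspond to those cited in the text: the data layout (one snapshot per page load or scroll event, holding a timestamp, the visible area and the portlet rectangles), the end_time parameter and the rectangle-based helper are all assumptions made for illustration, and the chart-drawing step is omitted.

```python
def calculate_visibility_percentage(box, viewport):
    """Percentage (0-100) of the portlet box lying inside the viewport.

    Rectangles are (left, top, width, height) in page coordinates. This is
    a hypothetical stand-in for the sub-routine of the same name.
    """
    left, top, width, height = box
    v_left, v_top, v_width, v_height = viewport
    overlap_w = max(0, min(left + width, v_left + v_width) - max(left, v_left))
    overlap_h = max(0, min(top + height, v_top + v_height) - max(top, v_top))
    area = width * height
    return 100.0 * overlap_w * overlap_h / area if area else 0.0


def visibility_indicators(snapshots, end_time):
    """Time-weighted mean visibility per portlet.

    snapshots: list of (timestamp, viewport, {portlet_name: box}), sorted by
    timestamp, one per page load or scroll event.
    end_time: timestamp at which the page visit ended.
    """
    sums = {}          # partial sums of t_i * v_i, keyed by portlet name
    total_time = 0     # T, the total time of the page visit
    next_times = [snap[0] for snap in snapshots[1:]] + [end_time]
    for (timestamp, viewport, boxes), next_time in zip(snapshots, next_times):
        t_i = next_time - timestamp
        total_time += t_i
        for name, box in boxes.items():
            v_i = calculate_visibility_percentage(box, viewport)
            sums[name] = sums.get(name, 0.0) + t_i * v_i
    if total_time == 0:
        return {}
    return {name: partial / total_time for name, partial in sums.items()}


# Made-up page visit: the user scrolls down by 150 pixels after 10 time units.
boxes = {"news": (0, 0, 500, 300), "sport": (0, 600, 500, 300)}
snapshots = [
    (0, (0, 0, 1000, 600), boxes),
    (10, (0, 150, 1000, 600), boxes),
]
visibility = visibility_indicators(snapshots, end_time=40)
```

In this made-up visit the news portlet is fully visible for 10 time units and half visible for 30, while the sport portlet is hidden at first and then half visible.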