In fact, the computational cost is the main obstacle
to processing data in real time. Hence, in real
learning situations, this processing tends to be done
offline so as to avoid harming the performance of the
logging application. However, because it takes place
after the learning activity has been completed, it has
less impact on that activity (Caballé et al., 2005).
Based on the Grid vision (Foster and Kesselman,
1998), a preliminary study was conducted (Xhafa et
al., 2004). This study showed that a parallel
approach based on the Master-Worker (MW)
paradigm might increase the efficiency of processing
a large amount of information from user activity log
files (for more information about MW, please follow
the link: http://www.cs.wisc.edu/condor/mw).
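As an illustration of the Master-Worker idea only, the following minimal Python sketch has a master split a list of log lines into chunks, hand each chunk to a worker, and merge the partial results. It is not the API of the MW library cited above: threads in a single process stand in for Grid nodes, and the names master and count_requests_per_user, the chunk size, and the assumption that the user identifier is the first field of each line are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_requests_per_user(chunk):
    """Worker task: count requests per user in one chunk of log lines.

    Assumption (not from the paper): the user identifier is the first
    whitespace-separated field of each line.
    """
    counts = Counter()
    for line in chunk:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def master(lines, n_workers=4, chunk_size=2):
    """Master: split the log into chunks, farm them out, merge the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    # Threads stand in for remote workers; a Grid deployment would
    # dispatch each chunk to a different node instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_requests_per_user, chunks):
            total.update(partial)  # adds the partial counts to the total
    return total
```

Because each chunk is processed independently and the partial counts are simply added together, the result does not depend on how the log is split or on how many workers are used.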
In this paper we show, first, the main challenges
to be faced in modelling students’ behaviour in
Web-based learning environments, and, then, how a
Grid-based approach can deal with them. In order to
show the feasibility of our approach, we use the log
data from the internal campus of the Open
University of Catalonia (the UOC is found at:
http://www.uoc.edu), though our approach is generic
and can be applied to reduce the processing time
of log data from web-based applications in general.
Our ultimate objective is to make it possible to
continuously monitor and adapt the learning process
and objects to the actual students’ learning needs as
well as to validate the campus’ usability by
analyzing and evaluating its actual usage.
2 MODELING STUDENTS’
BEHAVIOR IN WEB-BASED
DISTANCE LEARNING
Our real web-based learning context is the Open
University of Catalonia (UOC), which offers
distance education through the Internet in different
languages. As of this writing, about 40,000 students,
lecturers, and tutors from everywhere participate in
some of the 23 official degrees and other PhD and
post-graduate programs, resulting in more than 600
official courses. The campus is completely
virtualized. It is made up of individual and
community areas (e.g. personal electronic mailbox,
virtual classrooms, digital library, on-line bars,
virtual administration offices, etc.), through which
users are continuously browsing in order to fully
satisfy their learning, teaching, administrative and
social needs.
From our experience at the UOC, describing and
predicting our students’ behaviour and navigation
patterns when interacting with the campus is a
primary concern. Indeed, the usability of a
well-designed system is key to stimulating and
satisfying the students’ learning experience. In addition,
the monitoring and evaluation of real, long-term,
complex, problem-solving situations is a must in our
context. The aim is both to adapt the learning
process and objects to the students’ actual learning
needs and to validate the campus’ usability by
monitoring and evaluating its actual usage.
To achieve these goals, it is essential to analyze
the campus activity, and specifically the traces that
users leave while browsing the campus. Collecting
this information in log files and later analyzing and
interpreting it provide the means to model the
users' actual behaviour and activity patterns.
2.1 The Collection of Information from
On-line Learning Activity
The on-line web-based campus of the UOC is made
up of individual and community virtual areas such as
mailbox, agenda, classrooms, library, secretary's
office, and so on. Students and other users (lecturers,
tutors, administrative staff, etc.) continuously
browse these areas, where they request services to
satisfy their particular needs and interests. For
instance, students make heavy use of the email
service to communicate with other students and
lecturers as part of their learning process.
Users' requests are chiefly processed by a
collection of Apache web servers (Apache is found
at: http://httpd.apache.org), together with database
servers and other secondary applications, all of which
provide service to the whole community and thus
handle a large volume of user requests. For
load-balancing purposes, all HTTP traffic is
distributed among the available Apache web servers.
Each web server stores in a log file all the user
requests it receives as well as the information
generated from processing them. Once a day (at
01:00 a.m.), as part of a daily rotation, all web
servers merge their logs, producing a single very
large log file that contains all user interaction with
the campus during the previous 24 hours.
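The daily merge described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the campus' actual procedure: it assumes that each server's log is already in chronological order, that every line carries a bracketed timestamp as in Apache's Common Log Format, and that the merged file should also be chronological. The names timestamp_of and merge_server_logs are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Bracketed timestamp as written in Apache's Common Log Format,
# e.g. [10/Oct/2004:13:55:36 +0200]. The log format is an assumption.
_TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def timestamp_of(line):
    """Return the request time of a log line as a timezone-aware datetime."""
    match = _TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_server_logs(per_server_logs):
    """Lazily merge several chronologically sorted logs into one stream.

    per_server_logs is an iterable of iterables of lines (for example,
    open file objects), one per web server.
    """
    return heapq.merge(*per_server_logs, key=timestamp_of)
```

Since heapq.merge consumes its inputs lazily, the merged stream can be written out line by line, so a log of several gigabytes never has to be held in memory at once.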
A typical daily log file may be up to 10 GB in size.
This large amount of information is first
pre-processed using filtering techniques in order to
remove irrelevant information (e.g. information
coming from automatic control processes, the
uploading of graphical and format elements, etc.).
However, after this pre-processing,