Approximately 14% of New Zealand’s total
resident population are Māori and about 1 in 4 of
these are able to converse in Māori (Te Puni Kōkiri
2003).
Given that 65% of Māori have
never accessed the Internet, the potential users of a
Māori language interface are perhaps 1-2% (40,000-80,000)
of New Zealand’s population. In contrast,
potential English-speaking New Zealand users of the
website are approximately 51% (2,040,000) (Te
Puni Kōkiri 2001). By setting the default language
of the Niupepa Collection to Māori we are clearly
going against the preferences of the majority of
potential users. Research by Jones et al. (2000)
suggests that users of a digital library
system rarely amend the default settings for options
with regard to query types and result displays.
Through log file analysis, this research
asks how user behaviour changes when the
default language of an
interface is alternated between Māori and English.
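The estimate of potential Māori-interface users can be reproduced with a short calculation. The sketch below is illustrative only. It assumes a total population of 4,000,000 (the figure implied by "51% (2,040,000)") and that Māori speakers access the Internet at the same rate as Māori overall; neither assumption is stated in the text. Under these assumptions the result is roughly 1.2% of the population, or about 49,000 people, which falls inside the quoted 1-2% range.

```python
# Figures quoted in the text.
MAORI_SHARE = 0.14          # share of NZ residents who are Māori
SPEAKER_SHARE = 0.25        # about 1 in 4 Māori can converse in Māori
NEVER_ONLINE_SHARE = 0.65   # share of Māori who have never accessed the Internet

# Assumption: total population implied by "51% (2,040,000)".
NZ_POPULATION = round(2_040_000 / 0.51)


def potential_maori_interface_users(population=NZ_POPULATION):
    """Return (share_of_population, head_count) under the stated assumptions."""
    # Assumption: speakers and non-speakers are equally likely to be online.
    share = MAORI_SHARE * SPEAKER_SHARE * (1 - NEVER_ONLINE_SHARE)
    return share, round(share * population)


share, count = potential_maori_interface_users()
print(f"{share:.2%} of the population, about {count:,} people")
```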
2 LIMITATIONS OF LOG FILE ANALYSIS
As a method of gathering data on user actions, log
file analysis has its shortcomings, the primary one
being the effect of web caches. Web caches sit
between web servers and clients and serve
repeated requests to the client without
contacting the original server. This saves time and
reduces network traffic. However,
the original server never receives the request, so
the request is not recorded in its web log file.
There are two main types of web cache: a
browser cache, which is handled by a user’s browser
software, and a proxy cache, which is configured
within a network. Both types of cache will prevent
repeated requests from a single user appearing in
web log files. However, a proxy cache will also
mask repeated requests from different users within
its network.
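The masking effect of the two cache types can be illustrated with a toy model. This is a sketch under simplifying assumptions: the class names, user names, and URL are hypothetical, and real caches also honour expiry times and cache-control headers, which are ignored here.

```python
class OriginServer:
    """Stands in for the web server; every request it sees is logged."""

    def __init__(self):
        self.log = []

    def handle(self, client, url):
        self.log.append((client, url))
        return f"content of {url}"


class Cache:
    """Serves repeated requests itself; only misses reach the upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def handle(self, client, url):
        if url not in self.store:
            self.store[url] = self.upstream.handle(client, url)
        return self.store[url]


def simulate():
    server = OriginServer()
    proxy = Cache(server)  # one proxy cache shared by the whole network
    # One browser cache per user, each sitting in front of the proxy.
    browsers = {user: Cache(proxy) for user in ("alice", "bob")}
    requests = [("alice", "/niupepa"), ("alice", "/niupepa"), ("bob", "/niupepa")]
    for user, url in requests:
        browsers[user].handle(user, url)
    return len(requests), server.log


made, logged = simulate()
print(f"{made} requests made, {len(logged)} recorded in the server log")
```

In this model the second request from the first user is absorbed by her browser cache, and the request from the second user is absorbed by the shared proxy cache, so the server log records only one of the three requests.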
There are other limitations of log file analysis,
including false hits due to web robot activity, false
hits due to server upgrades and maintenance, and the
inability to accurately delimit user sessions.
3 GATHERING THE DATA
The NZDL website (www.nzdl.org) makes available
over 40 different collections in various formats. All
user activity is logged. Every request or ‘hit’ is
recorded along with information such as the page
requested, the language used in the interface, the
time of the request, the type of request, the previous
action, the IP address of the requestor and the
various preferences that are set.
The NZDL site is mirrored with a site located at
the University of Lethbridge in Alberta, Canada. The
New Zealand site, located at the University of
Waikato, serves the collection to Web requests from
within New Zealand. The Lethbridge mirror site is
responsible for serving the collection to Web
requests from outside of New Zealand. The data
collected in this analysis is from the University of
Waikato site only, and thus only reflects usage
within New Zealand.
We chose to analyse a four-week period running
from 8:40am Monday 5 July 2004 to 8:40am
Monday 2 August 2004. In the first week we
changed the Niupepa default language setting to
English. In the second week we changed it back to
Māori. In the third week it was set to English, and in the
fourth week we changed it back to Māori again. The
raw NZDL log file was collected for this time period
and the hits relating to the Niupepa collection were
extracted.
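The alternating schedule can be expressed as a lookup from a hit's timestamp to the default language in force at that time. The sketch below is not the authors' tooling; the function and constant names are hypothetical. It assumes that timestamps are in New Zealand local time and that each week runs from 8:40am Monday up to, but not including, 8:40am the following Monday.

```python
from datetime import datetime, timedelta

STUDY_START = datetime(2004, 7, 5, 8, 40)   # 8:40am Monday 5 July 2004
STUDY_END = datetime(2004, 8, 2, 8, 40)     # 8:40am Monday 2 August 2004
WEEKLY_DEFAULTS = ("English", "Māori", "English", "Māori")


def default_language_at(timestamp):
    """Return the default language in force at `timestamp`.

    Returns None for timestamps outside the four-week study period.
    """
    if not (STUDY_START <= timestamp < STUDY_END):
        return None
    # Whole weeks elapsed since the start of the study (0 to 3).
    week_index = (timestamp - STUDY_START) // timedelta(weeks=1)
    return WEEKLY_DEFAULTS[week_index]


print(default_language_at(datetime(2004, 7, 14, 12, 0)))
```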
The raw Niupepa data was then further filtered to
remove hits of unwanted origin. These included
hits with an incorrect language argument (27), hits with an
undefined requestor IP address (213), web crawler and web
robot hits (33), and hits from the local research team
(18). This left a total of 14,416 hits, of which 7,724
were in the weeks when the default language was set
to English and 6,692 were in the weeks when the
default language was set to Māori (Graph 1).
There were noticeably more hits (15.4%
more) when the default language of the website was
set to the more commonly spoken English language.
It is also clear, and perhaps unsurprising,
that the number of hits in English increases
when the default language is set to English and the
number of hits in Māori increases when the default
language is set to Māori.
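The totals and the 15.4% figure can be checked directly from the reported hit counts. The short sketch below uses only the numbers given in the text; the helper name is hypothetical.

```python
ENGLISH_DEFAULT_HITS = 7724   # hits in the weeks with English as the default
MAORI_DEFAULT_HITS = 6692     # hits in the weeks with Māori as the default


def relative_increase(larger, smaller):
    """Return how much bigger `larger` is than `smaller`, as a fraction."""
    return (larger - smaller) / smaller


total_hits = ENGLISH_DEFAULT_HITS + MAORI_DEFAULT_HITS
increase = relative_increase(ENGLISH_DEFAULT_HITS, MAORI_DEFAULT_HITS)
print(f"total hits: {total_hits:,}")
print(f"increase with English default: {increase:.1%}")
```

Note that the percentage is taken relative to the Māori-default weeks, which is the base that gives the reported 15.4%.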
4 DEFINING THE SESSIONS
To further analyse what the users of the website
were doing, the hits recorded in the log files were
grouped into sessions. Cookies were used to define
these sessions. When users connect to the website a
cookie is created on their machines which holds
information that includes the IP address of the
machine connecting and the time that the cookie was
created. This information is recorded with each hit
as the z argument of the hit, so a session is simply
a group of hits with the same z argument within a
given time frame. Web analysis software usually
defines a session as a series of hits such that the time
WEBIST 2005 - WEB INTERFACES AND APPLICATIONS