FINDING AND REFINDING WEB PAGES IN CONTEXT

A Tree-based Model of Web History

David Briffa and Chris Staff

Department of Intelligent Computer Systems, University of Malta, Tal-Qroqq, Msida MSD 2080, Malta

Keywords: Web history navigation, Revisiting web pages in context, Automatic query generation, Global reconnaissance, Tree-views, User modelling.

Abstract: A modern challenge for the World Wide Web (Web) is not just finding information without getting ‘lost’ in hyperspace, but also re-finding it efficiently. Web Nav is an integrated navigation system that combines history and page recommendations in a single context-based tool. Web Nav’s framework represents a paradigm shift in the presentation of history, from a linear, stack-based system to a hierarchical, tree-based system. Web Nav was evaluated qualitatively and quantitatively by analysing 13 users’ activity over a seven-day period. The results are mixed, but there is sufficient evidence to suggest that tree-based views of history can be beneficial: they allow users to revisit web pages in context, show user sessions as trees, and support the automatic generation of queries, based on session contexts, to recommend web pages.

1 INTRODUCTION

In this paper, we tackle two major Web navigation problems: finding and re-finding information. Modern search engines generally do not take the session context (i.e., a group of web pages related to some task being performed by the user) into account. Two different users searching for the term ‘jaguar’ will be presented with the same result set, even though during the session one user has been visiting web pages about jaguar the animal and the other has been visiting pages about Jaguar the car. We also tackle the problem of re-finding information. Between 58% and 81% of page visits are re-visits (cf. section 2.1), so mechanisms for organizing and easily accessing already-found information are important.

In our opinion, representations of history should preserve the contextual structure in which web pages are visited. Context can also help to automatically construct queries to find more relevant information. Web Nav (Briffa, 2010) incorporates context and global reconnaissance into a tree-based model of history.

2 FINDING INFORMATION

Finding information on the Web can be seen as a combination of searching and browsing (Herder, 2006). Searching involves submitting a query to a search engine, and browsing involves navigating between pages using hyperlinks (Herder, 2004). One of the main problems on the Web is the ‘Lost in Hyperspace’ problem, where the user starts browsing and finds herself ‘lost’ in terms of where she has been and where she intends to go. Adaptive tools can give the user a sense of direction, for example through page recommendations or direct guidance (Brusilovsky, 2001).

In a Web browser, searching is catered for through search engines and embedded widgets that quickly return results for a query. Search can be improved by building user models from visited pages. PowerScout (Lieberman, Fry, & Weitzman, 2001) builds a user model from the pages visited and constructs a query that is submitted to a third-party search engine, which returns page recommendations.
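The internals of PowerScout's user model are not described here. As a minimal sketch, assuming the simplest possible model (a bag of term counts accumulated over visited pages) and a query formed from its most frequent terms, the idea can be expressed in Python as follows. The stop-word list, `tokenize`, `build_user_model` and `construct_query` are hypothetical names, and the call to a third-party search engine is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_user_model(visited_pages: list[str]) -> Counter:
    """Accumulate term counts over the text of every visited page."""
    model: Counter = Counter()
    for page_text in visited_pages:
        model.update(tokenize(page_text))
    return model


def construct_query(model: Counter, k: int = 3) -> str:
    """Use the k most frequent terms in the user model as the query string."""
    return " ".join(term for term, _ in model.most_common(k))


pages = [
    "Jaguar speed and jaguar habitat in the rainforest",
    "The jaguar is a big cat of the rainforest",
]
query = construct_query(build_user_model(pages))  # would be sent to a search engine
```

Because the model is built from the session's pages, a user who has been reading about the animal gets ‘rainforest’ alongside ‘jaguar’ in the query, which is the kind of disambiguation the introduction asks for.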

A Web user must constantly choose between following links and initiating a search for pages. This choice is directly related to Information Foraging Theory (Pirolli & Card, 1999), according to which users try to maximize the ratio of information gained to energy spent. An automated approach may use both neighbouring pages and a search engine to determine which strategy will be more successful.
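Information Foraging Theory compares strategies by their rate of return. A minimal sketch, assuming the gain and cost of browsing and of searching have already been estimated somehow (how to estimate them is outside the scope of the sketch), simply picks the strategy with the higher gain-to-cost ratio. `Strategy`, `choose_strategy` and the numbers used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    expected_gain: float  # estimated amount of relevant information (arbitrary units)
    expected_cost: float  # estimated effort, e.g. in seconds; must be positive

    def rate(self) -> float:
        """Information gained per unit of effort."""
        return self.expected_gain / self.expected_cost


def choose_strategy(options: list[Strategy]) -> Strategy:
    """Return the option with the highest gain-to-cost ratio."""
    if not options:
        raise ValueError("no strategies to choose from")
    return max(options, key=lambda s: s.rate())


# Hypothetical estimates: following neighbouring links vs. issuing a search.
browse = Strategy("browse", expected_gain=4.0, expected_cost=10.0)
search = Strategy("search", expected_gain=6.0, expected_cost=12.0)
best = choose_strategy([browse, search])
```

Here searching wins (0.5 versus 0.4 units of information per second) even though it costs more in absolute terms.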

Briffa D. and Staff C.
FINDING AND REFINDING WEB PAGES IN CONTEXT - A Tree-based Model of Web History.
DOI: 10.5220/0003401004260429
In Proceedings of the 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pages 426-429
ISBN: 978-989-8425-51-5
© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

FollowMyLink (Briffa, 2009) integrates search and browsing by turning user-selected text on a Web page into hyperlinks on-the-fly. The user is taken to the top-ranking page in the result set returned for a query generated from contextual information. FollowMyLink maintains separate user models as a user browses, using page relations and browsing behaviour to decide which user model to update. When a user selects text on a Web page and invokes FollowMyLink, a query is automatically generated from the user model, which is updated after each link traversal. Y!Q (Kraft, Maghoul, & Chang, 2005), by contrast, uses only the context supplied by the text surrounding a user selection. Y!Q may take the user to a page the user has visited recently, whereas FollowMyLink takes the user to a previously unseen relevant page.
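A minimal sketch of the FollowMyLink behaviour described above, assuming the user model is a table of term weights and that a search engine has already returned a ranked list of URLs: the selected text is extended with the strongest context terms, and the user is taken to the top-ranking result that is not yet in their history. This is one plausible reading rather than FollowMyLink's actual code; `generate_query`, `pick_unseen` and the example URLs are hypothetical.

```python
from collections import Counter
from typing import Optional


def generate_query(selection: str, user_model: Counter, extra_terms: int = 2) -> str:
    """Append the strongest context terms that are not already in the selection."""
    selected = selection.lower().split()
    context = [t for t, _ in user_model.most_common() if t not in selected]
    return " ".join(selected + context[:extra_terms])


def pick_unseen(ranked_urls: list[str], history: set[str]) -> Optional[str]:
    """Return the highest-ranked result the user has not visited yet, if any."""
    for url in ranked_urls:
        if url not in history:
            return url
    return None


model = Counter({"animal": 5, "rainforest": 3, "jaguar": 2})
query = generate_query("Jaguar", model)  # "jaguar animal rainforest"
target = pick_unseen(
    ["https://example.org/a", "https://example.org/b"],
    history={"https://example.org/a"},
)
```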

2.1 Re-finding Information

Apart from searching and browsing, a third element of navigation is backtracking, which involves going back to a previously visited page. A browser supports a number of tools for both short-term and long-term backtracking, such as the Back/Forward buttons, history lists, bookmarks, and the History window. Backtracking is important in Web navigation, considering how many page visits on the Web are actually re-visits. Tauscher and Greenberg (1996) give a figure of 58%, while Cockburn and McKenzie (2000) calculated a value of 81%. Herder estimates that 74% of page visits are revisits (Herder, 2006). Reasons for revisiting a page include checking whether its information has changed, authoring it, exploring it further, or passing through it on the path to another revisited page (Tauscher & Greenberg, 1996).
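The studies above differ in method, so their figures are not strictly comparable. As an illustration only, assuming a revisit is any visit to a URL that already appears earlier in the same log, a revisit rate can be computed from a list of visited URLs as follows; `revisit_rate` is a hypothetical helper.

```python
def revisit_rate(visits: list[str]) -> float:
    """Fraction of visits whose URL already occurred earlier in the log.

    This equals (total visits - distinct URLs) / total visits.
    """
    if not visits:
        return 0.0
    seen: set[str] = set()
    revisits = 0
    for url in visits:
        if url in seen:
            revisits += 1
        seen.add(url)
    return revisits / len(visits)


log = ["a.html", "b.html", "a.html", "c.html", "a.html", "b.html"]
rate = revisit_rate(log)  # 3 of the 6 visits are revisits -> 0.5
```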

Of particular interest are the latter two cases. First, the need to explore pages further may result in a page being bookmarked. Although bookmarking is efficient with regard to space and organization, any context gathered from previous pages is lost from one session to the next, and hence, when the user returns, there is no context or user model available for the page in question. Second, the mention of a path raises the question of whether bookmarking one page is enough, or whether the pages the user followed on the way to the bookmarked page should also be persisted. Empirical evidence shows that the path is not only important but should also be a factor in history mechanisms (Teevan, 2004), with ‘waypoints’ such as page titles and descriptions considered as supplementary metadata for the path (Capra & Perez-Quinones, 2003).

Tauscher and Greenberg provide a comprehensive analysis of the types of history mechanisms available, focusing particularly on the stack-based history list as applied to the Back and Forward buttons. A drawback of this approach is that the resulting history list does not contain all visited pages, because all pages above the stack pointer are discarded when the user backtracks and then follows a new link (Cockburn & Jones, 2000). Cockburn and Jones also suggest that users themselves have a skewed understanding of how the history list works.
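The pruning behaviour described by Cockburn and Jones can be reproduced with a simplified model of a stack-based history list: a list of pages plus a pointer to the current one. This is a sketch for illustration, not any particular browser's implementation; `StackHistory` is a hypothetical class.

```python
class StackHistory:
    """Simplified stack-based Back/Forward history: a page list plus a pointer."""

    def __init__(self, start_url: str) -> None:
        self.pages = [start_url]
        self.pointer = 0  # index of the page currently displayed

    def visit(self, url: str) -> None:
        """Follow a link: every page above the pointer is discarded first."""
        del self.pages[self.pointer + 1:]
        self.pages.append(url)
        self.pointer += 1

    def back(self) -> str:
        if self.pointer > 0:
            self.pointer -= 1
        return self.pages[self.pointer]

    def forward(self) -> str:
        if self.pointer < len(self.pages) - 1:
            self.pointer += 1
        return self.pages[self.pointer]


history = StackHistory("A")
history.visit("B")
history.visit("C")
history.back()      # back to B; C is still reachable with Forward
history.visit("D")  # C is now pruned from the history list
```

A tree-based history, by contrast, would keep C as a sibling of D under B.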

Tauscher and Greenberg also refer to context-sensitive subspace history lists, in which the pages in history are grouped into subspaces based on context. Other systems present history visually, in the form of maps or overviews. WebView (Cockburn, Greenberg, McKenzie, Jasonsmith, & Kaasten, 1999) and WebNet (Cockburn & Jones, 2000) generate overviews of the user’s browsing path. As these views are not persisted, they are not useful for long-term backtracking. CZWeb (Fisher, Agelidis, Dill, Tan, Collaud, & Jones, 1997) uses a fish-eye view.

Figure 1: Tree Structure for a hypothetical path.

3 WEB NAV

We built a Mozilla Firefox extension for the persistence and use of context-based paths. As Firefox uses a stack-based paradigm for its tab history, we designed a separate framework so that Web Nav can store its own copy of the navigation history as a tree-based structure in an SQLite database. The overall design rationale is to use this framework as a basis for providing both context-based history information and context-based page recommendations, so that users are never ‘lost’ or unable to backtrack to a specific page.
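Web Nav's actual database schema is not given here. A minimal sketch, assuming one table of contexts and one table of nodes in which each node stores its parent's ID (NULL for a root), shows how a tree of visits can be stored in SQLite and how a path can be recovered with a recursive common table expression. All table, column and function names are hypothetical, and Python's built-in `sqlite3` module stands in for the extension's own database access.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS context (
    context_id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS node (
    node_id    INTEGER PRIMARY KEY,
    context_id INTEGER NOT NULL REFERENCES context(context_id),
    parent_id  INTEGER REFERENCES node(node_id),  -- NULL for the root of a context
    url        TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def new_context(conn: sqlite3.Connection) -> int:
    return conn.execute("INSERT INTO context DEFAULT VALUES").lastrowid


def add_node(conn: sqlite3.Connection, context_id: int,
             parent_id: Optional[int], url: str) -> int:
    cur = conn.execute(
        "INSERT INTO node (context_id, parent_id, url) VALUES (?, ?, ?)",
        (context_id, parent_id, url),
    )
    return cur.lastrowid


def path_to_root(conn: sqlite3.Connection, node_id: int) -> list[str]:
    """Follow parent links upwards and return the URLs from the root down."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(node_id, parent_id, url, depth) AS (
            SELECT node_id, parent_id, url, 0 FROM node WHERE node_id = ?
            UNION ALL
            SELECT n.node_id, n.parent_id, n.url, a.depth + 1
            FROM node AS n JOIN ancestors AS a ON n.node_id = a.parent_id
        )
        SELECT url FROM ancestors ORDER BY depth DESC
        """,
        (node_id,),
    ).fetchall()
    return [url for (url,) in rows]


conn = open_db()
ctx = new_context(conn)
root = add_node(conn, ctx, None, "B")
a = add_node(conn, ctx, root, "A")
b_again = add_node(conn, ctx, a, "B")  # a second instance of page B
```

Storing only a parent pointer per node keeps inserts cheap, at the cost of a recursive query when a full path is needed.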

3.1 Web Nav History

Web Nav tracks visited pages in a Context Model through tab-based storage using the Mozilla Session Store API. We identify the node corresponding to a page by checking the Session Store. Paths are built by making connections between nodes that reflect the type of traversal performed. Figure 1 shows a typical tree structure, where node A is the current page.

The tree shows the page on which a link was clicked to visit page A, as well as the other branches in the path. Paths are acyclic: rather than linking back to the existing root, an instance of page B is added as a child of page A, even though B already exists as the root. A page may therefore be seen in multiple contexts, and as multiple instances within the same context.
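The rule that browsing always adds a new node, even for a page already in the tree, is what keeps paths acyclic: every node has exactly one parent, so following parent links always ends at the root. A minimal in-memory sketch of the Figure 1 situation, using a hypothetical `Node` class:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare by identity: two instances of one page are distinct nodes
class Node:
    url: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)

    def follow_link(self, url: str) -> "Node":
        """Browsing always creates a fresh child, even for a URL already in the tree."""
        child = Node(url, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """URLs from the root of the context down to this node."""
        urls, node = [], self
        while node is not None:
            urls.append(node.url)
            node = node.parent
        return urls[::-1]


root = Node("B")
a = root.follow_link("A")
b_again = a.follow_link("B")  # a new instance of B, not a link back to the root
```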

We distinguish between two forms of movement: browsing, where a new node is created, and moving, where we simply move to a different node in the tree. Moving can be equated to backtracking in the conventional sense. Each action that results in a new page load saves the appropriate information in the Session Store so that the subsequent page-load event handler can process the new page correctly and create the necessary links in the database. If the movement is a backtrack, the nodeID of the target node is stored in the Session Store. Otherwise, the nodeID of the parent node is stored so that it can be used to create the appropriate connection to the newly created node.
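The hand-off between a navigation action and the next page-load handler can be sketched as follows, assuming a per-tab `pending` value that stands in for what Web Nav stores in Session Store. The class and method names are hypothetical, and this is plain Python rather than the Firefox extension API.

```python
from itertools import count
from typing import Optional


class TabTracker:
    """Sketch of browse/move bookkeeping for one tab's context tree."""

    def __init__(self, root_url: str) -> None:
        self._ids = count(1)
        self.url: dict[int, str] = {}
        self.parent: dict[int, Optional[int]] = {}
        self.current = self._create(root_url, None)
        # Stand-in for the value saved in Session Store before a page load.
        self.pending: Optional[tuple[str, int]] = None

    def _create(self, url: str, parent_id: Optional[int]) -> int:
        node_id = next(self._ids)
        self.url[node_id] = url
        self.parent[node_id] = parent_id
        return node_id

    def before_link_click(self) -> None:
        """Browsing: store the parent nodeID so the new node can be attached to it."""
        self.pending = ("browse", self.current)

    def before_backtrack(self, target_id: int) -> None:
        """Moving: store the nodeID of the existing target node."""
        self.pending = ("move", target_id)

    def on_page_load(self, url: str) -> int:
        """Page-load handler: read and clear the stored value, then act on it."""
        # Assumption: a load with nothing stored is treated as browsing from the current node.
        kind, node_id = self.pending or ("browse", self.current)
        self.pending = None
        if kind == "move":
            self.current = node_id  # no new node is created
        else:
            self.current = self._create(url, node_id)  # new child of the stored parent
        return self.current


tab = TabTracker("B")
tab.before_link_click()
a = tab.on_page_load("A")  # browsing: node 2, child of node 1
tab.before_backtrack(1)
tab.on_page_load("B")      # moving: back to node 1, nothing created
tab.before_link_click()
c = tab.on_page_load("C")  # browsing again: node 3, a sibling of A
```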

3.2 Interests and Recommendations

Web Nav saves page interests and uses them to provide recommendations. Each ‘node’ saved in a context can have associated NodeInterests, which are compiled through an indexing procedure. As in FollowMyLink (Briffa, 2009), we collect the relevant text from the corresponding page and, after stemming and a modified TF.IDF calculation, create a set of weighted keywords representing the most relevant terms in the page. Each set of NodeInterests is also used to update a set of ModelInterests for the respective model. We save both NodeInterests (per node) and ModelInterests (for the entire model) because we use two recommendation algorithms. ModelInterests are used by the algorithm that creates a query based on terms from the entire context (tree) (Briffa, 2009). NodeInterests are used by the algorithm that generates a query based on the current branch, effectively creating an on-the-fly merge of the nodes in the current path. We use this second algorithm to provide recommendations that are localized to the path, rather than generalized to the entire tree.
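The two query-generation strategies can be sketched as follows. This is not Web Nav's code: the sketch assumes pages are already tokenised (stemming is omitted), substitutes a standard smoothed TF-IDF for the unspecified ‘modified TF.IDF’, treats ModelInterests as the sum of all NodeInterests in the context, and builds a query from the highest-weighted terms. All function names and the toy pages are hypothetical.

```python
import math
from collections import Counter


def node_interests(doc: list[str], corpus: list[list[str]], top_k: int = 5) -> dict[str, float]:
    """Weight a page's terms with standard smoothed TF-IDF and keep the top_k terms."""
    doc_sets = [set(d) for d in corpus]
    weights = {}
    for term, freq in Counter(doc).items():
        df = sum(1 for d in doc_sets if term in d)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # always positive
        weights[term] = (freq / len(doc)) * idf
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def merge(interests: list[dict[str, float]]) -> Counter:
    """Sum term weights across several sets of NodeInterests."""
    total: Counter = Counter()
    for weights in interests:
        total.update(weights)
    return total


def query_from(weights: dict[str, float], n_terms: int = 3) -> str:
    """Highest-weighted terms first; ties broken alphabetically for determinism."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(term for term, _ in ranked[:n_terms])


# A context tree: root p1 with two branches, p1 -> p2 and p1 -> p3.
p1 = ["jaguar", "cat", "rainforest", "jaguar"]
p2 = ["jaguar", "car", "engine"]
p3 = ["cat", "prey", "rainforest"]
corpus = [p1, p2, p3]
i1, i2, i3 = (node_interests(p, corpus) for p in corpus)

model_interests = merge([i1, i2, i3])          # same result as updating node by node
context_query = query_from(model_interests)    # "jaguar cat rainforest"
branch_query = query_from(merge([i1, i2]))     # path p1 -> p2: "jaguar car engine"
```

The branch query for p1 to p2 drifts towards the car sense of ‘jaguar’, while the whole-context query is pulled towards the animal branch, which illustrates the localization the second algorithm is meant to provide.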

3.3 Adaptations

To support the visual adaptations resulting from the approach described in sections 3.1 and 3.2, we implemented both a tab-based interface and a global interface. For tab-based history and recommendations, we implemented the Web Nav Popup view (figure 2).

Figure 2: Web Nav Popup View showing options at the far left, history information on the left and page recommendations on the right.

The view is split into two sections. The left side shows the history information for the tab graphically, as either a tree or a linearly ordered view. In the tree view, the parent and children of the current page are shown, and the user can navigate through the tree to see the entire history for the tab’s context. In the linear view, a user may choose to show all the nodes in a context sorted by some criterion, such as recency or frequency.
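The ordering used by the linear view can be sketched as below, assuming each visit record carries a URL and a timestamp, and that multiple instances of the same page are grouped by URL before sorting. `Visit` and `linear_view` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    node_id: int
    url: str
    visited_at: float  # e.g. a Unix timestamp


def linear_view(visits: list[Visit], criterion: str = "recency") -> list[str]:
    """List a context's distinct URLs, most recent or most frequent first."""
    if criterion == "recency":
        last_seen: dict[str, float] = {}
        for v in visits:
            last_seen[v.url] = max(v.visited_at, last_seen.get(v.url, v.visited_at))
        return sorted(last_seen, key=lambda u: (-last_seen[u], u))
    if criterion == "frequency":
        counts = Counter(v.url for v in visits)
        return sorted(counts, key=lambda u: (-counts[u], u))  # ties: alphabetical
    raise ValueError(f"unknown criterion: {criterion!r}")


visits = [Visit(1, "B", 10.0), Visit(2, "A", 20.0), Visit(3, "B", 30.0), Visit(4, "C", 25.0)]
by_recency = linear_view(visits, "recency")      # ["B", "C", "A"]
by_frequency = linear_view(visits, "frequency")  # ["B", "A", "C"]
```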

Figure 3: Web Nav Manager showing a list of contexts at the top left, with a preview below. A map of the selected context is shown to the right, where nodes are coloured based on frequency of visits.

A user can return to a previously visited context using the Web Nav Manager (figure 3). The user can see all the persisted contexts, as well as a map view of all the saved paths, and can jump back into a session with all its context information intact.

4 EVALUATION

We conducted a preliminary two-stage evaluation of the general usefulness of Web Nav and the appeal of its paradigm. Fourteen volunteers used the system in their own homes for approximately seven days and submitted a qualitative questionnaire about their experience with Web Nav.

The first stage of the evaluation used empirical data from action logs and database entities to determine which users exhibited low browsing and low backtracking behaviour. One user’s log file was corrupt, and another four users were removed because they did not use the browser enough to yield meaningful data. The second stage was a deeper analysis of the empirical data. It showed that the ‘Up’ and ‘Down’ buttons were used less than the regular ‘Back’ and ‘Forward’ buttons; the qualitative data suggest that the reason might be habit. Surprisingly, although usage of these buttons was low, 5 of the 13 volunteers (38.5%, excluding the one with a corrupt log file) stated a preference for Up/Down.

The Web Nav popup view was used regularly throughout the evaluation period, and the Web Nav bookmarking feature was used occasionally. However, only a few users actually used recommendations, and even fewer re-visited recommended pages. This low uptake may have been caused by a construction bug found after the evaluation had commenced rather than by a lack of interest, since 84.6% of users indicated interest in having recommendations provided and 69.2% said the recommendations generated were ‘somewhat relevant’. Qualitative data suggested that both recommendation methods were equally preferred. The Web Nav Manager, while not used as often as the Web Nav popup view, still showed promise, since in many cases opening the Web Nav Manager resulted in backtracking to a node in a previous context session.

The overall experience of users seems to have been positive, with 100% of users agreeing that the concept of paths shown as trees is useful or interesting. Moreover, 69.3% of users expressed interest in possibly using Web Nav in the future.

5 CONCLUSIONS

We have presented a tree-based navigation system that has yielded fairly promising results, considering the substantial shift required to adopt a new paradigm. Its main contribution lies in its underlying framework for the persistence of tree-based contexts.

REFERENCES

Briffa, D., 2009. FollowMyLink: An Implementation of User Directed Browsing. Unpublished report, Dept. of Intelligent Computer Systems, University of Malta.

Briffa, D., 2010. Web Nav: A Context Based Navigation Assistant for the Web. Unpublished report, Dept. of Intelligent Computer Systems, University of Malta.

Brusilovsky, P., 2001. Adaptive Hypermedia. User Modeling and User-Adapted Interaction, 11, 87-110. Kluwer Academic Publishers.

Capra, R. G., and Perez-Quinones, M. A., 2003. Re-Finding Found Things: An Exploratory Study of How Users Re-Find Information. Technical Report, Virginia Tech.

Cockburn, A., and Jones, S., 2000. Which way now? Analysing and easing inadequacies in WWW navigation. International Journal of Human-Computer Studies, 45, 105-129.

Cockburn, A., and McKenzie, B., 2000. What Do Web Users Do? An Empirical Analysis of Web Use. International Journal of Human-Computer Studies, 903-922.

Cockburn, A., Greenberg, S., McKenzie, B., Jasonsmith, M., and Kaasten, S., 1999. WebView: A Graphical Aid for Revisiting Web Pages. Proceedings of OZCHI'99 Australian Conference on Human Computer Interaction.

Fisher, B., Agelidis, M., Dill, J., Tan, P., Collaud, G., and Jones, C., 1997. CZWeb: Fish-eye views for visualizing the World Wide Web. In M. J. Smith, G. Salvendy, and R. J. Koubek (Eds.), Design of Computing Systems: Social and Ergonomic Considerations, 2, 719-722.

Herder, E., 2006. Forward, Back and Home Again - Analyzing User Behavior on the Web. Doctoral dissertation, University of Twente. Amsterdam: F&N Boekservice.

Herder, E., 2004. Sniffing Around For Providing Navigation Assistance. Proc. of Workshop on Adaptivity and User Modeling in Interactive Systems, Berlin.

Kraft, R., Maghoul, F., and Chang, C. C., 2005. Y!Q: contextual search at the point of inspiration. Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05). ACM, New York, NY, USA, 816-823.

Lieberman, H., Fry, C., and Weitzman, L., 2001. Exploring the Web with Reconnaissance Agents. Communications of the ACM, 44(8), 69-75.

Pirolli, P., and Card, S. K., 1999. Information Foraging. Psychological Review, 106, 643-675.

Tauscher, L., and Greenberg, S., 1996. Design Guidelines for Effective WWW History Mechanisms. Workshop on Designing for the Web: Empirical Studies. Microsoft Corporation, Redmond, WA.

Teevan, J., 2004. How people re-find Information when the Web changes. Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory.
