as well as parallel links and self-loops. Each
category must have links to all its children, but can
also have links to other categories in the Web
directory which are semantically similar, or
otherwise analogous to the category (cross-links,
related links).
We formally designate with C the set of all
categories in a Web directory and with R the set of
all Web resources in it. A category with unique
identification number n is denoted $c_n$. Each
category has its own characteristic URL, url, and a
member level l, where l is a natural number smaller
than or equal to the depth L of the Web directory
(Figure 1). The category $c_n$ must be a member of C.
$C_n$ is the subset of C that belongs to the category
$c_n$, and $R_n$ is the subset of R with the Web
resources that belong to category $c_n$. We formally
describe categories and the structure of Web
directories in Uschold, 2003.
Figure 1: Schematic representation of a single category.
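The notation above can be illustrated with a minimal sketch. It is not the authors' implementation: the class and field names are hypothetical, $C_n$ is read here as the set of child categories of $c_n$, levels are assumed to start at 1 at the root, and the example URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One category c_n of a Web directory (hypothetical model)."""
    n: int                      # unique identification number
    url: str                    # characteristic URL of the category
    level: int                  # member level l, with l <= L
    subcategories: set[int] = field(default_factory=set)  # ids of C_n
    resources: set[str] = field(default_factory=set)      # R_n, as URLs


@dataclass
class WebDirectory:
    depth: int                                                      # L
    categories: dict[int, Category] = field(default_factory=dict)  # C

    def add(self, category: Category) -> None:
        # Assumption: levels are counted from 1 at the root.
        if not 1 <= category.level <= self.depth:
            raise ValueError("level must lie between 1 and the depth L")
        self.categories[category.n] = category

    def all_resources(self) -> set[str]:
        # R is taken here as the union of every R_n.
        return set().union(*(c.resources for c in self.categories.values()))


# Placeholder data: a root category with a single child category.
directory = WebDirectory(depth=2)
directory.add(Category(n=1, url="http://example.org/", level=1,
                       subcategories={2}))
directory.add(Category(n=2, url="http://example.org/arts/", level=2,
                       resources={"http://example.com/gallery"}))
```

Storing category identifiers rather than nested objects keeps every $c_n$ a member of the single set C, which matches the requirement that $C_n$ be a subset of C.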
Mathematically speaking, Web directories are
simple rooted graphs (Sedgewick, 2001). Sometimes
the position of links within a category’s Web page is
prioritized, and in that case we speak of ordered,
rooted simple graphs. The structure of a Web
directory cannot be described as a tree because
more than one path can connect any two of its
categories: apart from the paths that connect parent
and child categories, categories can also be
connected by ad hoc cross-links, as in Figure 2.
Figure 2: Realistic Web directory with possible multiple
paths between two categories.
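A small sketch can show why a single cross-link is enough to break the tree property. The category names and the helper `simple_paths` are hypothetical and serve only as an illustration; links are treated as directed, from a category to the categories it links to.

```python
def simple_paths(links, start, goal, path=None):
    """Return every path from start to goal that visits no category twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in links.get(start, []):
        if nxt not in path:  # skip categories already on the current path
            found.extend(simple_paths(links, nxt, goal, path))
    return found


# Parent/child links only: the directory is a tree.
tree_links = {
    "Top": ["Arts", "Science"],
    "Arts": ["Music"],
    "Science": ["Acoustics"],
}

# The same directory with one ad hoc cross-link, Music -> Acoustics.
cross_linked = {**tree_links, "Music": ["Acoustics"]}

tree_paths = simple_paths(tree_links, "Top", "Acoustics")
cross_paths = simple_paths(cross_linked, "Top", "Acoustics")
```

With parent/child links only, `tree_paths` holds the single path through Science; after the cross-link is added, `cross_paths` also contains the path through Arts and Music, so the structure is no longer a tree.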
Although the categorization of a Web directory
should be defined by a standard and unchanging
policy, this is frequently not the case. Web
directories often allow site owners to submit their
site for inclusion directly, and even to suggest an
appropriate category for it; editors then review the
submissions. The editors must approve a
submission and decide in which category to put the
link. However, the rules that influence the editors’
decisions are not completely objective and are thus
difficult to implement unambiguously. Sometimes a
site will fall in two or even more categories, or
require a new category. Defining a new category is
a very sensitive task because it has to adequately
represent a number of sites and avoid interfering
with the domains of other categories, while the
width and depth of the entire directory’s structure
have to remain balanced. A Web directory with an
elaborate structure at one end and a sparse and
shallow one at the other is confusing for users, and
quality information is difficult to find in it.
Furthermore, after several sites have been added to
a directory it may become apparent that an entirely
new categorization could better represent the
directory’s content. In this case a part of the
directory’s structure, or even all of its levels, has to
be rearranged, which is again a time- and
labor-consuming task.
Therefore, recognizing the challenges implied by
Web directory construction, as well as the overall
importance of Web directories, the paper’s authors are
motivated to design and develop a decision support
system – a computer-based intelligent agent – that
can support decision-making in this construction
process.
3 CONSTRUCTION SCENARIOS
The process of building Web directories has three
actors:
1. Web directory system (WDS)
2. Web directory administrator (WDA)
3. Administrator of a Web site listed in the Web
directory (WSA)
The ontology-based building process involves the
same three actors and represents a subset of the
general building process. This process includes three
main tasks, or actions, that the actors have to
perform in order to construct a Web directory:
1. Semantics identification task (SIT)
2. Semantics assignment task (SAT)
3. Web directory addition task (WDAT)
ICSOFT 2009 - 4th International Conference on Software and Data Technologies