3 WWW AS A FOREST OF TREES
The tree model that we advocate for the Web
depends on the stability of Internet routing. If the
routes on the Internet are stable, the routes used by
clients to access a Web server form a shortest path
tree (or routing tree) rooted at the server. Existing
studies (V. Paxon, 1997) have pointed out that in
practice most routes in the Internet are stable: 80%
of routes change less often than once a day.
(Krishnan et al, 2000) traced the routes from the
Bell Labs Web server to 13,533 destinations and
found that almost 93% of the routes were stable
during their experiments. The stability of Internet
routing is therefore a realistic assumption, and it
reduces the arbitrary topology of the Web to a forest
of trees.
Given the stability of Internet routing (V. Paxon,
1997), an object requested by a client $c$ and located
at server $s$ travels along a path
$s \to r_1 \to r_2 \to \dots \to r_n \to c$, called a
preference path and denoted by $\pi(s, c)$. The
preference path consists of a sequence of nodes with
the corresponding routers. Routes from $s$ to the
various clients form a routing tree along which
requests are propagated. Consequently, for each
server $s$, a tree $T_s$ rooted at $s$ can be
constructed to depict the routing tree, and the entire
Web can be represented as a collection of such routing
trees, each rooted at a given Web server. Formally, a
routing tree is the union of the preference paths.
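As a minimal illustrative sketch (not taken from the studies cited above), the union of preference paths can be assembled into a child-to-parent map. The function name `build_routing_tree`, the node labels, and the sample paths are hypothetical and serve only to make the definition concrete. The sketch assumes each path is listed from the server down to the client.

```python
def build_routing_tree(server, preference_paths):
    """Merge preference paths into a child -> parent map rooted at `server`.

    Each path is a list of node labels starting at the server and ending
    at a client (an assumed input format, chosen for illustration).
    """
    parent = {server: None}
    for path in preference_paths:
        if path[0] != server:
            raise ValueError("every preference path must start at the server")
        for upstream, downstream in zip(path, path[1:]):
            # With stable routes, a node always has the same upstream hop;
            # a second, different upstream hop means the union is not a tree.
            if parent.setdefault(downstream, upstream) != upstream:
                raise ValueError(f"{downstream!r} has two different parents")
    return parent


# Hypothetical preference paths from server "s" to three clients.
paths = [
    ["s", "r1", "r2", "c1"],
    ["s", "r1", "r3", "c2"],
    ["s", "r1", "r2", "c3"],
]
tree = build_routing_tree("s", paths)
```

Storing only the parent of each node is enough here, because requests always travel from a client toward the root.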
Each server $s$ knows the preference path from
itself to any client $c$. This information can be
extracted and periodically refreshed from the routing
database kept by the routers (Anne Benoit et al,
2006). The routing information allows the
comparison of network distances (e.g. number of
hops) among servers within a given platform. A user
issues one request at a time for a Web page, which is
delivered to the user as a single unit.
In the tree $T_s$, when a client $c$ sends a request to
access a server $s$, the request is always sent toward
the root along the preference path. If the server is
replicated and the request meets, on its way, a
replica that holds the requested object, it is served
by that replica. Otherwise it has to travel all the
way to the root, where it is serviced by $s$. Note that
a replica that is closer to the client $c$ but not en
route between $c$ and $s$ is ignored.
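The request-routing rule described here can be sketched as a walk up the routing tree. This is a hedged illustration rather than an implementation from the paper: the child-to-parent map, the node labels, and the function name `serving_node` are assumptions, and every replica is assumed to hold the requested object.

```python
def serving_node(client, parent, replicas):
    """Return the node that serves a request issued by `client`.

    The request climbs the tree toward the root and stops at the first
    replica found on that path; replicas off the path are never visited.
    """
    node = parent[client]
    while parent[node] is not None:
        if node in replicas:
            return node
        node = parent[node]
    return node  # no replica on the way: the root server answers


# Hypothetical tree: s - r1 - r2 - r3, with client c1 and router r4
# attached to r3, and client c2 attached to r4.
parent = {
    "s": None,
    "r1": "s",
    "r2": "r1",
    "r3": "r2",
    "r4": "r3",
    "c1": "r3",
    "c2": "r4",
}
replicas = {"r4"}

served_c1 = serving_node("c1", parent, replicas)
served_c2 = serving_node("c2", parent, replicas)
```

In this example the replica at `r4` is only two hops from `c1`, yet `c1` is served by the root `s`, because `r4` does not lie between `c1` and `s`. Client `c2` is served by `r4`, which is on its own path to the root.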
According to this tree model, the network
topology is represented by a graph $G = (V, E)$,
where $m = |V|$ is the number of nodes and $E$ is the
set of edges, which represent the physical links
connecting these nodes. Nodes are routers, Web
servers or a combination of both (servers provide the
information a client is looking for). Routers are
connected via wide-area links to form the
communication network. Some routers, called
gateways, provide connections to the outside
Internet. These are the gateways through which all
requests enter the system.
4 BENEFITS OF TREE MODELLING
Replication is a technique of storing copies of shared
objects on servers where they are frequently
accessed. It is used to address the scalability
problem of popular sites (Anne Benoit et al, 2006).
Replication improves efficiency by allowing
operations to use local replicas instead of remote
ones (Anne Benoit et al, 2006, F. Tenzakhti et al,
2003).
The replica placement problem covers several
questions: how many replicas are needed in the
replicated system, where to place these replicas, how
to route requests to the appropriate replica, etc.
(F. Tenzakhti et al, 2003, B.Li et al., 1997). In this
study, we are mainly interested in where to place a
given number of replicas. Throughout the study, we
use the word proxy to mean a replica of the whole
site. The proxies discussed in
this paper are transparent proxies. They are located
along the routes from clients to a Web server and are
transparent to the clients. A proper placement of
proxies would lead most client requests to be served
at proxies, without letting them travel further to the
server. Since the access patterns of clients and the
sizes of the trees are different, the allocation and
placement of proxies have a significant impact on the
overall system performance. To formally define our
problem, we introduce the following notation. Let
$d(u, v)$ be the distance between any two vertices $u$
and $v$ in the tree graph $T_s$; $d(u, v)$ is equal to
the length of the shortest path $\pi(u, v)$:

$$d(u, v) = \sum_{(x, y) \in \pi(u, v)} d(x, y) \qquad (1)$$
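A small sketch may help make equation (1) concrete: in a tree, the path between two vertices is unique, so the distance is the sum of the link lengths on that path. The child-to-parent map, the link lengths, and the function names below are hypothetical values chosen for illustration; setting every length to 1 gives the hop count.

```python
def path_to_root(node, parent):
    """List the nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def distance(u, v, parent, length):
    """Sum the link lengths on the unique tree path between u and v."""
    up_u = path_to_root(u, parent)
    up_v = path_to_root(v, parent)
    # Trim shared ancestors until both lists end at the meeting point.
    while len(up_u) > 1 and len(up_v) > 1 and up_u[-2] == up_v[-2]:
        up_u.pop()
        up_v.pop()
    links = list(zip(up_u, up_u[1:])) + list(zip(up_v, up_v[1:]))
    return sum(length[(child, par)] for child, par in links)


# Hypothetical tree and link lengths, keyed as (child, parent).
parent = {"s": None, "r1": "s", "r2": "r1", "r3": "r1", "c1": "r2", "c2": "r3"}
length = {
    ("r1", "s"): 2,
    ("r2", "r1"): 1,
    ("r3", "r1"): 4,
    ("c1", "r2"): 1,
    ("c2", "r3"): 1,
}

d_c1_s = distance("c1", "s", parent, length)    # links c1-r2, r2-r1, r1-s
d_c1_c2 = distance("c1", "c2", parent, length)  # path passes through r1
```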
Let $p(v, s)$ be the first proxy met while traveling