
 
 
3  WWW AS A FOREST OF TREES
The tree model that we advocate for the Web
depends on the stability of Internet routing. If the
routes on the Internet are stable, the routes used by
clients to access a Web server form a shortest path
tree (or routing tree) rooted at the server. Existing
studies (V. Paxon, 1997) have pointed out that in
practice most routes in the Internet are stable: 80% of
routes change less often than once a day.
(Krishnan et al, 2000) traced the routes from
Bell Labs' Web server to 13,533 destinations and
found that almost 93% of the routes were stable
during their experiments. The stability of Internet
routing is therefore a realistic assumption, and it
reduces the arbitrary topology of the Web to a forest
of trees.
Given the stability of Internet routing (V. Paxon,
1997), an object requested by a client $c$ and located
at server $s$ travels through a path
$s \to r_1 \to r_2 \to \dots \to r_n \to c$, called a
preference path and denoted by $\pi(s, c)$. The
preference path consists of a sequence of nodes with
the corresponding routers. Routes from $s$ to the
various clients form a routing tree along which
requests are propagated. Consequently, for each server
$s$, a tree $T_s$ rooted at $s$ can be constructed to
depict the routing tree, and the entire Web can be
represented as a collection of such routing trees, each
rooted at a given Web server. Formally, a routing tree
is the union of the preference paths.
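To illustrate the union of preference paths, the following minimal Python sketch builds a routing tree as a child-to-parent map. The function name `build_routing_tree`, the list-of-nodes path format and the node labels are assumptions made for this illustration and do not come from the paper.

```python
def build_routing_tree(preference_paths):
    """Return a child -> parent map for the tree rooted at the server.

    Each preference path is a list [s, r1, ..., rn, c] that starts at the
    same server s (hypothetical input format, chosen for this sketch).
    """
    parent = {}
    for path in preference_paths:
        for upstream, downstream in zip(path, path[1:]):
            # With stable routing, every node has a single next hop toward s.
            if parent.setdefault(downstream, upstream) != upstream:
                raise ValueError(
                    f"{downstream} has two parents; paths do not form a tree"
                )
    return parent


# Three hypothetical preference paths sharing the server "s".
paths = [
    ["s", "r1", "r2", "c1"],
    ["s", "r1", "r3", "c2"],
    ["s", "r1", "r2", "c3"],
]
tree = build_routing_tree(paths)
```

The root `s` is the only node without an entry in the map. Paths that disagree on a node's upstream neighbour are rejected, since they would violate the stability assumption on which the tree model rests.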
Each server $s$ knows the preference path from
itself to any client $c$. This information can be
extracted and periodically refreshed from the routing
database kept by the routers (Anne Benoit et al,
2006). The routing information allows the
comparison of network distances (e.g. number of
hops) among servers within a given platform. A user
issues one request at a time for a Web page, which is
fetched to the user as a single unit.
In the tree $T_s$, when a client $c$ sends a request to
access a server $s$, the request is always sent toward
the root along the preference path. If the server is
replicated and the request meets a replica on its way
that holds the requested object, it is served by that
replica. Otherwise it has to travel all the way to the
root, where it is serviced by $s$. Note that a replica
that is closer to the client $c$ but not en route
between $c$ and $s$ is ignored.
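The request-servicing rule can be sketched in Python as follows. This is only an illustration: the child-to-parent map, the function name `serving_node` and the node labels are hypothetical, and every replica is assumed to hold the requested object.

```python
def serving_node(client, parent, replicas):
    """Walk from the client toward the root and return the node that
    serves the request: the first replica met en route, else the root."""
    node = client
    while node in parent:
        node = parent[node]
        if node in replicas:
            return node  # first replica on the preference path
    return node  # no replica en route: the root server itself


# Hypothetical tree: s -> r1 -> r2 -> {c1, r3}, and r3 -> c2.
parent = {"r1": "s", "r2": "r1", "c1": "r2", "r3": "r2", "c2": "r3"}
replicas = {"r3"}

served_c1 = serving_node("c1", parent, replicas)
served_c2 = serving_node("c2", parent, replicas)
```

Here the request from `c2` stops at the replica `r3`, while the request from `c1` goes all the way to `s`: `r3` is only two hops from `c1`, but it does not lie on the path from `c1` to the root.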
According to this tree model, the network
topology is thus represented by a graph $G = (V, E)$,
where $m = |V|$ is the number of nodes and $E$ is the
set of edges, which represent the physical links
connecting these nodes. Nodes are routers, Web servers
or a combination of both (servers provide the
information a client is looking for). Routers are
connected via wide-area links to form the
communication network. Some routers, called
gateways, provide connections to the outside
Internet. These are the gateways through which all
requests enter the system.
4  BENEFITS OF TREE MODELLING
Replication is a technique of storing copies of shared 
objects on servers where they are frequently 
accessed. It is used to address the scalability 
problem of popular sites (Anne Benoit et al, 2006). 
Replication improves efficiency by allowing 
operations to use local replicas instead of remote 
ones (Anne Benoit et al, 2006, F. Tenzakhti et al, 
2003). 
The replica placement problem deals with many
issues: how many replicas are needed in the
replicated system, where to place these replicas, how
to route requests to the appropriate replica, and so on
(F. Tenzakhti et al, 2003; B. Li et al., 1997). In this
study, we are mainly interested in where to place a
given number of replicas. Throughout the study, we use
the word proxy to mean a replica of the whole site.
The proxies discussed in this paper are transparent
proxies: they are located along the routes from clients
to a Web server and are transparent to the clients. A
proper placement of proxies would lead most client
requests to be served at proxies, without letting them
travel further to the server. Since the access patterns
of clients and the sizes of the trees are different, the
allocation and placement of proxies have a significant
impact on the
overall system performance. To formally define our
problem, we introduce the following notation. Let
$d(u, v)$ be the distance between any two vertices $u$
and $v$ in the tree graph $T_s$. $d(u, v)$ is equal to
the length of the shortest path $\pi(u, v)$:

$$d(u, v) = \sum_{(x, y) \in \pi(u, v)} d(x, y) \qquad (1)$$
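As a worked illustration of equation (1), the sketch below sums edge lengths along the unique tree path between two nodes. The child-to-parent map, the edge lengths and the function names are hypothetical values chosen for the example. Setting every edge length to 1 would give the hop count.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def distance(u, v, parent, length):
    """Sum the edge lengths on the tree path between u and v.

    `length` maps a (child, parent) pair to the length of that edge.
    Both nodes are assumed to belong to the same tree.
    """
    up_u = path_to_root(u, parent)
    index_on_u_path = {node: i for i, node in enumerate(up_u)}
    total = 0
    node = v
    # Climb from v until reaching a node that lies on u's path to the root.
    while node not in index_on_u_path:
        total += length[(node, parent[node])]
        node = parent[node]
    # Add the edges from u up to that common ancestor.
    for child in up_u[: index_on_u_path[node]]:
        total += length[(child, parent[child])]
    return total


# Hypothetical tree rooted at "s" with assumed edge lengths.
parent = {"r1": "s", "r2": "r1", "c1": "r2", "c2": "r1"}
length = {("r1", "s"): 4, ("r2", "r1"): 2, ("c1", "r2"): 1, ("c2", "r1"): 3}

d_c1_s = distance("c1", "s", parent, length)    # edges c1-r2, r2-r1, r1-s
d_c1_c2 = distance("c1", "c2", parent, length)  # edges c1-r2, r2-r1, r1-c2
```

Because any two nodes of a tree are joined by exactly one path, that path is also the shortest one, so no search is needed.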
Let $p(v, s)$ be the first proxy met while traveling