match the same template are clustered and trails are
left in the system to make these clusters traceable.
In (Tolksdorf and Augustin, 2009) and (Koske, 2009) the ant colony algorithms of SwarmLinda were adapted to realize a distributed storage for RDF triples in which similar triples are clustered. RDF (Resource Description Framework)⁴ is a language for representing information about resources on the Web in a machine-readable way. An RDF triple describes a resource that is identified by a URI. It always has the form (S, P, O), where S is the subject, P the predicate, and O the object of the triple. For example, with the triple (http://birds.org/description/onto.rdf#sparrow, http://animals.org/livesOn, http://plants.org/grains) we can state that sparrows live on grains.
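As a minimal illustration, the example triple can be held in a small immutable record. The `Triple` type and its field names are assumptions made for this sketch only; they are not part of the system described here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF triple (S, P, O). The type name is hypothetical."""
    subject: str
    predicate: str
    obj: str


# The example from the text: sparrows live on grains.
sparrow = Triple(
    subject="http://birds.org/description/onto.rdf#sparrow",
    predicate="http://animals.org/livesOn",
    obj="http://plants.org/grains",
)

# A NamedTuple unpacks in (S, P, O) order.
s, p, o = sparrow
```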
To overcome the scalability issues of storing millions of triples, P2P algorithms were adapted to fit the needs of RDF storage. While Edutella (Nejdl et al., 2001) is based on a Gnutella-like approach, RDFPeers (Cai and Frank, 2004) uses a distributed hash table (DHT) to compute the storage locations of RDF triples. Edutella offers a simple and cost-effective way to create a P2P network for an arbitrarily large number of peers, but its query processing suffers as the number of peers grows, since queries are processed by flooding the network, which creates a large traffic overhead. The DHT approach of RDFPeers, on the other hand, guarantees the routing of queries in O(log n) messages, but DHTs are costly to maintain. In our approach we try to combine the benefits of both approaches without their drawbacks. The network is created as simply as in Edutella, and the routing with pheromones makes it possible to answer a query as fast as RDFPeers by using the optimal path to the location of the triple. Since the diameter d of power law graphs is usually limited to O(log n), a query can be processed with only O(log n) messages. In this work we present an implementation of a triple storage using optimised versions of the algorithms from (Tolksdorf and Augustin, 2009) and provide an evaluation of this system with respect to scalability, comparing the syntactical similarity measure from (Tolksdorf and Augustin, 2009) with a fingerprint-based similarity measure.
3 ALGORITHMS
To store and retrieve RDF triples, the system uses write and read procedures, which are performed by w-ants and r-ants. Ants communicate indirectly by depositing information on the nodes. These so-called pheromones evaporate over time, making the system adaptive to changes in network topology and stored content. All ants behave autonomously and probabilistically and use only local information to compute their decisions. Each node holds a routing table, which is maintained by the passing ants and which they use to return to the node of the request.

⁴ http://www.w3.org/RDF/
3.1 Triple Writing
In the system, three copies of each inserted triple exist, each of which is clustered with regard to the triple's subject, predicate, or object. As a result, each node holds three types of clusters, a subject, a predicate, and an object cluster, which contain the triples that were clustered regarding their subjects, predicates, and objects, respectively. Thus, for a triple to be stored, three w-ants are generated, each of which roams the network to find a suitable cluster for one of the triple's resources. In the following we will refer to this resource as the cluster resource. The behavior of a single w-ant is as follows. The w-ant starts at the node of the request and carries one triple copy and a resource pointer $r \in \{S, P, O\}$, which indicates the triple's cluster resource. In order to decide whether the triple should be dropped on the current node, the ant measures the similarity of its cluster resource $r$ to the cluster resources $r_1, \ldots, r_n$ in the appropriate resource cluster. The normalized sum of the resulting similarity values
$$\mathit{Sim} = \frac{1}{n} \sum_{i=1}^{n} \mathit{sim}(r, r_i) \qquad (1)$$
is used for computing the drop probability.
$$P_{drop} = \left( \frac{\mathit{Sim}}{\mathit{Sim} + c_{drop}} \right)^{2} \qquad (2)$$
In Equation 1, $\mathit{sim}$ is an exchangeable similarity measure that determines the similarity between two resources. In Equation 2, $c_{drop}$ is a constant that modifies the likelihood of dropping the triple on the current node.
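Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `exact_match` placeholder measure, and the handling of an empty cluster and of a zero denominator are hypothetical choices for this sketch, not the similarity measures or edge-case rules of the system.

```python
import random
from typing import Callable, Optional, Sequence


def mean_similarity(r: str, cluster: Sequence[str],
                    sim: Callable[[str, str], float]) -> float:
    """Equation 1: normalized sum of sim(r, r_i) over the cluster."""
    if not cluster:
        return 0.0  # assumption: an empty cluster yields similarity 0
    return sum(sim(r, r_i) for r_i in cluster) / len(cluster)


def drop_probability(similarity: float, c_drop: float) -> float:
    """Equation 2: P_drop = (Sim / (Sim + c_drop))^2."""
    denominator = similarity + c_drop
    if denominator == 0:
        return 0.0  # assumption: avoid division by zero
    return (similarity / denominator) ** 2


def exact_match(a: str, b: str) -> float:
    """Placeholder measure (hypothetical): 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def should_drop(r: str, cluster: Sequence[str], c_drop: float,
                sim: Callable[[str, str], float] = exact_match,
                rng: Optional[random.Random] = None) -> bool:
    """Draw a random number and drop with probability P_drop."""
    rng = rng or random.Random()
    p_drop = drop_probability(mean_similarity(r, cluster, sim), c_drop)
    return rng.random() < p_drop


# Three of the four stored resources match, so Sim = 0.75.
sim_value = mean_similarity("a", ["a", "a", "b", "a"], exact_match)
p = drop_probability(sim_value, c_drop=0.25)  # (0.75 / 1.0)^2 = 0.5625
```

A larger `c_drop` lowers the drop probability for a given similarity, and squaring the ratio further penalizes clusters that are only weakly similar to the carried resource.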
Based on $P_{drop}$, the w-ant decides whether to drop its triple at the current location. If it does, it stores the triple in the appropriate cluster and walks back to its origin to report the success of the writing procedure. On its way back it lays its cluster resource as pheromone on each edge it takes. As long as the ant does not decide to drop the triple, it roams the network, selecting paths that are marked with pheromones of its cluster resource. The probability of moving from the current node $i$ to a node $j$ in its neighborhood $NH(i)$ is given by the following equation.
$$P_{ij} = \frac{Ph_j(cr)}{\sum_{n \in NH(i)} Ph_n(cr)} \qquad (3)$$
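The path selection of Equation 3 amounts to a pheromone-weighted random choice among the neighbors. The sketch below assumes that the pheromone levels for the ant's cluster resource are available as a dictionary keyed by neighbor; the function names and the uniform fallback for the case without any pheromone are assumptions of this sketch, not details given in the text.

```python
import random
from typing import Dict, Optional


def transition_probabilities(pheromones: Dict[str, float]) -> Dict[str, float]:
    """Equation 3: P_ij = Ph_j(cr) / sum of Ph_n(cr) over n in NH(i).

    `pheromones` maps each neighbor j of the current node to Ph_j(cr).
    """
    total = sum(pheromones.values())
    if total == 0:
        # Assumption: without any pheromone, all neighbors are equally likely.
        return {j: 1.0 / len(pheromones) for j in pheromones}
    return {j: ph / total for j, ph in pheromones.items()}


def choose_next_node(pheromones: Dict[str, float],
                     rng: Optional[random.Random] = None) -> str:
    """Pick the next node at random, weighted by Equation 3."""
    if not pheromones:
        raise ValueError("current node has no neighbors")
    rng = rng or random.Random()
    probs = transition_probabilities(pheromones)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]


# Neighbor n1 carries three times as much pheromone as n2.
probs = transition_probabilities({"n1": 3.0, "n2": 1.0})
# probs == {"n1": 0.75, "n2": 0.25}
next_node = choose_next_node({"n1": 3.0, "n2": 1.0})
```

Because the choice is probabilistic, weakly marked paths are still taken occasionally, which fits the autonomous and probabilistic ant behavior described in Section 3.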
WEBIST 2010 - 6th International Conference on Web Information Systems and Technologies