2 USER’S PERSPECTIVE
Following the approach introduced in (Jovanović et al., 2006), a grid service must enable a user to do
the following two tasks:
1. Starting a new search: As a user is given full (read)
access to the raw dataset, he or she should be able to
specify a program that searches the local chunk of
a distributed dataset and extracts the relevant data.
2. Checking the list of performed searches: A user
must be able to check whether any other user has
already performed a search equal to or similar to the
one he or she is about to start.
As the user’s program extracts data from the local
chunk only, it is the responsibility of the grid service
to transfer the user’s program to each dataset server,
to run the program, and to collect the extracted data
from all servers.
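This scatter-gather responsibility of the grid service can be sketched as follows. The sketch is a simplification under stated assumptions: the server names, the in-process `CHUNKS` table standing in for remote dataset servers, and the function names are all hypothetical, and thread-based fan-out stands in for actual remote execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local chunks held by three dataset servers.
CHUNKS = {
    "server-a": [3, 14, 15, 92, 65],
    "server-b": [35, 89, 79, 32],
    "server-c": [38, 46, 26],
}

def run_on_server(server, extract):
    """Run the user's extraction program against one local chunk."""
    return [x for x in CHUNKS[server] if extract(x)]

def distributed_search(extract):
    """Scatter the program to every server, then gather the results."""
    with ThreadPoolExecutor() as pool:
        futures = {s: pool.submit(run_on_server, s, extract) for s in CHUNKS}
        return sorted(x for f in futures.values() for x in f.result())

# Extract all even values from the whole distributed dataset.
print(distributed_search(lambda x: x % 2 == 0))  # [14, 26, 32, 38, 46, 92]
```

Each server applies the user's program only to its own chunk; the service merges the partial results into one answer.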
The first reason for keeping a list of performed
searches is to keep the overall system load low (a) by
not repeating the same search all over again and (b)
by using results from a similar (possibly more gen-
eral) search instead of starting a new one. The second
reason is that the list of performed searches acts
as a virtual blackboard showing what data or relations
among data in the dataset seem interesting to other
users. This might be especially valuable in collabo-
rative scientific research. It follows that the results of
the searches performed in the past should be stored in
one form or another somewhere in a data grid.
Hence, the typical method for performing a search
is best described by the (informal) Algorithm 1. Note
(1) that starting a new search in line 6 does not block
the execution, as the new search is performed on re-
mote servers; and (2) that the search and its results will
eventually appear on the list of performed searches
(even if it yields no results), and thus the loop in lines
7–13 does terminate.
Algorithm 1 Performing a search.
1: check the list of performed searches
2: if the (similar) search has been found then
3: return its results and stop
4: end if
5: provide the extraction program
6: start a new search
7: loop
8: check the list of performed searches
9: if the search has been found then
10: return its results and stop
11: end if
12: sleep for a specified amount of time
13: end loop
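Algorithm 1 can be sketched in code as follows. This is a minimal, single-process illustration: the `performed` dictionary stands in for the distributed index of performed searches, and `start_search` (a hypothetical name) completes instantly against a toy dataset instead of dispatching to remote servers.

```python
import time

# Hypothetical distributed index of performed searches:
# maps a search description to its results.
performed = {}

def perform_search(description, program, poll_interval=0.01):
    """Informal Algorithm 1: reuse an earlier search when possible,
    otherwise start a new one and poll until its results appear."""
    if description in performed:          # lines 1-4: reuse earlier results
        return performed[description]
    start_search(description, program)    # lines 5-6: non-blocking start
    while True:                           # lines 7-13: poll the index
        if description in performed:
            return performed[description]
        time.sleep(poll_interval)

def start_search(description, program):
    # Stand-in for launching the search on remote servers; here the
    # "remote" search completes immediately against a toy dataset.
    performed[description] = [x for x in range(10) if program(x)]
```

A second call with the same description returns the stored results without running the extraction program again, which is exactly the load-reduction behaviour motivated above.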
This method allows different implementations of a
distributed index of performed searches and different
implementations of searching. At present, the
research is focused on an efficient peer-to-peer im-
plementation of a single search (line 6). The task
of maintaining a list of performed searches in a dis-
tributed index, and especially a method for comparing
descriptions of different searches, are left for fu-
ture work.
3 ANT COLONY OPTIMIZATION
FOR SEARCHING RAW
DATASETS
Ant colony optimization is a biologically-inspired op-
timization method (Dorigo and Stützle, 2004). The
basic idea is to use a large number of simple agents
called ants: each ant performs a relatively simple task
but combined together they are able to produce so-
phisticated and effective optimization. Further im-
provements of ACO are based usually on a combina-
tion of the ACO algorithm with other local optimiza-
tion techniques (Dorigo and Stützle, 2004). Ant-based
clustering and sorting have already been studied in the
past (Deneubourg et al., 1991; Handl et al., 2006).
There are two main differences between ant-based
clustering and peer-to-peer searching as described in
this paper. First, a node in the system can possess
a pile of data, not just one datum. Second, other
ant-based clustering algorithms are suited for meshes
or similar regular planar topologies; thus, the dis-
tance function here is modified in order to be efficient
for an arbitrary topology of a distributed environment.
Extraction Ants. Using user-specified programs,
extraction ants extract data from local chunks of
raw datasets. Each extraction program is uniquely
identified. Extraction ants mark their paths with
pheromones associated with the id of the program
they are carrying. In order for extraction ants to dis-
cover as much data as possible they avoid the trails
and prefer the clean paths.
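The trail-avoidance behaviour can be sketched as a weighted next-hop choice. The weighting 1/(1 + τ) below is an illustrative assumption (the paper does not specify the exact function), and the function and parameter names are hypothetical; the point is only that links carrying little pheromone for this program id are preferred.

```python
import random

def choose_next_hop(neighbours, pheromone, program_id):
    """Pick the next server to visit, favouring links that carry little
    pheromone for this program id (i.e. paths not yet explored)."""
    # Illustrative weighting: weight 1/(1 + tau) for pheromone level tau,
    # so clean links (tau = 0) are the most attractive.
    weights = [1.0 / (1.0 + pheromone.get((n, program_id), 0.0))
               for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]

def deposit(pheromone, link, program_id, amount=1.0):
    """Mark the traversed link with pheromone tied to the program id."""
    key = (link, program_id)
    pheromone[key] = pheromone.get(key, 0.0) + amount
```

Because pheromone is keyed by program id, ants carrying different extraction programs do not repel one another; each program's ants spread out over paths their own kind has not yet visited.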
Aggregation Ants. The data extracted by extraction
ants is collected into piles by aggregation ants. Each
aggregation ant is based on Algorithm 2. According to
the pick-up function 1/(1 + x/n)^k, small piles of data
are picked up with a very high probability, while large
piles are most unlikely to be picked up. The distinc-
tion between small and large piles is predefined and
based on the characteristics of the environment, such
as the average connection bandwidth and the expected
amount of data. Note that the distinction between
small and large piles can be simply regulated by
changing n (a measure for the size of a pile) and k
ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES