is stored in the log. We assume that update operations are idempotent, so as to avoid inconsistencies during the recovery process. Before sending the recovery request, R_j restores its volatile state using the information from its local log. Then it sends a recovery request to R_r, containing the sequence number of the last applied update. R_r responds with the information related to updates with a sequence number higher than that of the last update applied at R_j, thus including all forgotten and missed updates.
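The recovery exchange described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation; all class and method names (Update, Replica, recoverFrom, updatesAfter) are assumptions, and the persistent log is modeled as an in-memory sorted map rather than the local database used by the system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical update record; applying it twice leaves the same state
// (idempotence), which makes replay during recovery safe.
class Update {
    final long seqNum;
    final String fileName;
    Update(long seqNum, String fileName) {
        this.seqNum = seqNum;
        this.fileName = fileName;
    }
}

class Replica {
    // Persistent log: sequence number -> update.
    final SortedMap<Long, Update> log = new TreeMap<Long, Update>();
    long lastApplied = 0;

    // R_j: restore volatile state from the local log, then ask the
    // recoverer R_r for every update with a sequence number higher
    // than that of the last update applied locally.
    void recoverFrom(Replica recoverer) {
        for (Update u : log.values()) {
            apply(u);                      // replay the local log
        }
        List<Update> missed = recoverer.updatesAfter(lastApplied);
        for (Update u : missed) {
            log.put(u.seqNum, u);          // log before applying
            apply(u);
        }
    }

    // R_r: answer a recovery request with all updates whose sequence
    // number is strictly greater than the one received.
    List<Update> updatesAfter(long seqNum) {
        return new ArrayList<Update>(log.tailMap(seqNum + 1).values());
    }

    void apply(Update u) {
        // Idempotent state update; only the bookkeeping is shown here.
        lastApplied = Math.max(lastApplied, u.seqNum);
    }
}
```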
5 EVALUATION
Our testing configuration consists of eight computers
connected in a 100 Mbps switched LAN, where each
machine has an Intel Core 2 Duo processor running
at 2.13 GHz, 2 GB of RAM and a 250 GB hard disk
running Linux (version 2.6.22.13-0.3-bigsmp). The
file server initially includes 200 binary files of 10 MB
each. The persistent log for partial recovery is imple-
mented with a local Postgresql 8.3.5 database. Each
machine runs a Java Virtual Machine 1.6.0 execut-
ing the application code. Spread 4.0.0 has been used
as GCS, whereas point-to-point communication has
been implemented via TCP channels. In our experiments we compare the performance of the passive and active replication implementations to determine the influence of a number of parameters on the saturation point of the system. We also assess the cost of the recovery process and compare total and partial recovery, considering the recovery time, the impact of recovery on the system's throughput, and the distribution in time of the main steps of the recovery process.
5.1 Replication Experiments
We have evaluated the behavior of our replication protocols for the file server in a failure-free environment,
depending on the following parameters: number of
replicas (from 2 to 8), replication strategy (active and
passive), percentage of updates (20%, 50% and 80%),
number of clients (1, 5, 10, 20, 30, 40, 50, 60, 80,
100, 125 and 150), number of operations per second
submitted by each client (1, 2, 4, 10 and 20) and oper-
ation size (10, 25, 50 and 100 KB). A dedicated ma-
chine connected to the same network executes client
instances. Each client chooses randomly one of the
replicas and connects to it in order to send requests
(read or write operations over a randomly selected
file) at a given rate during the experiment. Each ex-
periment lasts for 5 minutes.
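A single client instance from the setup above can be sketched as follows. This is an illustrative outline under the stated experiment parameters; the class and method names (ClientLoad, pickReplica, isUpdate, totalRequests) are assumptions, not the actual test harness.

```java
import java.util.Random;

// Hypothetical sketch of one client in the replication experiments:
// it connects to a randomly chosen replica and issues read or write
// requests at a fixed rate for the duration of the experiment.
class ClientLoad {
    // Each client chooses one of the replicas at random and connects
    // to it for the whole experiment.
    static int pickReplica(Random rnd, int numReplicas) {
        return rnd.nextInt(numReplicas);
    }

    // Decide whether the next operation is an update, according to the
    // configured update percentage (20, 50 or 80 in the experiments).
    static boolean isUpdate(Random rnd, int updatePercent) {
        return rnd.nextInt(100) < updatePercent;
    }

    // Total requests issued at `rate` operations per second over an
    // experiment of `seconds` seconds (e.g. 10 ops/s for 300 s).
    static long totalRequests(int rate, int seconds) {
        return (long) rate * seconds;
    }
}
```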
Figure 3(a) shows the system performance ob-
tained with 4 replicas while incrementing the num-
ber of clients, each one sending 10 requests per sec-
ond. There is a proportional increase of the system
throughput as the number of clients grows, until a sat-
uration point is reached. As expected, system per-
formance is inversely proportional to the operation
size (due to the execution cost itself and network la-
tency), and to the update rate (as read requests are lo-
cally processed, whereas updates must be propagated
and sequentially applied at all replicas). In addition,
active and passive replication have almost the same
throughput levels when there is a low rate of updates,
as reads are handled in the same way. In contrast, pas-
sive replication is more costly if there is a high rate
of updates, since the primary acts as a bottleneck. We remark that, since the constraint of uniform delivery accounts for most of the multicast latency, the cost of update multicasts is the same in passive replication, where only FIFO order is needed, as in active replication, which requires total order. In fact,
Spread uses the same level of service for providing
Uniform Multicast, regardless of the ordering guaran-
tees (Stanton, 2009).
Figure 3(b) results from executing the same ex-
periments as in Figure 3(a), but in this case with 8
replicas in the system. From the comparison between
both figures we can conclude that an increase in the
number of replicas improves performance when there
is a low rate of updates, since read requests are handled locally, and therefore having more replicas allows more read requests to be executed. On the contrary, when there is a high rate of updates, performance does not improve, and it even becomes worse if the operation size is small, as the cost of Uniform Multicast increases with the number of replicas. However, if the operation size is large, the cost of Uniform Multicast is masked by the execution costs.
5.2 Recovery Experiments
In the following we present how the recovery experi-
ments were run. The system is started with 4 replicas,
and then one of them is forced to crash. The crashed
replica is kept offline until the desired outdatedness is
reached. At that moment, the crashed replica starts
the recovery protocol. Figure 4 depicts the recovery
time depending on the recovery type. In this case, no
client requests are being issued during recovery. Total recovery has been tested with different initial sizes;
therefore, the recovery process must transfer the ini-
tial data in addition to the outdated data. The results
show that, in total recovery, the recovery time is pro-
portional to the total amount of data to be transferred.
ICSOFT 2009 - 4th International Conference on Software and Data Technologies