wan and Agarwal, 2009), fault tolerant routing tech-
niques can be classified into two main categories: re-
transmission and replication. Retransmission is quite
popular since the packet loss rate in WSNs is higher
than in traditional networks. Two popular replica-
tion mechanisms are multipath routing (Karlof et al.,
2003; Ganesan et al., 2001) and erasure coding (Wang
et al., 2005). In the former approach, multiple copies
of the sensed data are transmitted over multiple routing
paths, so that the data reaches its destination as long
as at least one path is free from node failures. Multipath
routing protocols are thus more resilient to node failures,
at the expense of increased overall traffic. Erasure coding is another
replication approach aiming at enhancing fault tol-
erance in WSNs. The basic idea is to add K parity
fragments to the M data fragments to have a total of
M + K fragments, which are divided into sub-packets
and transmitted over multiple paths. The sink can
reconstruct the original data once at least M of the
M + K fragments have been successfully received.
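The "any M out of M + K" property can be illustrated with the simplest possible code: a single XOR parity fragment (K = 1). This is only a sketch under stated assumptions and not the coding scheme of the cited work. The function names are hypothetical, the fragments are assumed to have equal length, and a practical scheme with K > 1 would need a stronger code such as Reed-Solomon.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_fragments):
    """Return the M data fragments followed by one XOR parity fragment (K = 1)."""
    parity = reduce(xor_bytes, data_fragments)
    return list(data_fragments) + [parity]


def decode(received, m):
    """Rebuild the M data fragments from M + 1 slots, at most one of which is None."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("fewer than M fragments arrived; cannot reconstruct")
    received = list(received)
    if missing:
        # XOR of all surviving fragments equals the single lost fragment.
        present = [frag for frag in received if frag is not None]
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:m]


# Example: M = 3 data fragments, one of them lost on its path to the sink.
fragments = [b"temp", b"=21C", b"@n07"]
coded = encode(fragments)
coded[1] = None
recovered = decode(coded, m=3)
```

With K = 1 the sink tolerates the loss of any one fragment, at the cost of one extra fragment of traffic. Losing two fragments leaves fewer than M and reconstruction fails.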
Most of the existing fault tolerance techniques
in WSNs simply isolate the failed or malfunctioning
nodes in the communication layer and ignore the data
of the failed node, as in (Marti et al., 2000). In such
approaches, fault tolerance is achieved only in the sense
that the network can still fulfill its sensing tasks in the
presence of failures. They do not deal with the recovery
of the failed node: once a node has failed, the data
stored in that node is lost.
Sharing our goal, Chessa and Maestrini (2005) present
a fault recovery mechanism to cope with node failures
in single-hop WSNs. They propose partitioning the
memory of each sensor node into two parts, one for
storing its own sensed data and the other for storing
redundant data used for recovery. By keeping redundant
data of other sensor nodes, this scheme is able to
recover the data lost after a node failure. This
redundancy concept is similar to ours. However, their
mechanism can only deal with a single node failure and
is restricted to single-hop WSNs, which are unrealistic
in practice, whereas our work can be applied to
multi-hop WSNs and can cope with multiple simultaneous
node failures.
3 PROPOSED NODE FAILURE RECOVERY SCHEME
To investigate the performance gain and the energy
consumption overhead under a generic network topology
setting, we first assume that the topology of the
network is flat, i.e., that there are no hierarchical
structures or cluster heads in charge of other nodes
within their domain. All the
nodes are considered as having the same significance
and are equipped with the same amount of storage
space, computation resources, communication capac-
ity and initial energy supply. The sink is considered as
being constantly charged by a reliable energy source.
We assume that sensor nodes are stationary once
deployed, that each node operates within a fixed radio
range and that, while nodes themselves can fail, links
between nodes are reliable. Finally, since our approach
depends on the locations of sensors and the neighborhood
relationships between them, we assume that the sink
stores the whole network topology and that each sensor
node holds a list of all the neighbors that reside within
its communication range; this can be achieved using
inexpensive GPS modules at deployment time.
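As an illustration of the last assumption, the per-node neighbor lists can be derived from node coordinates and the fixed radio range. The sketch below is not part of the proposed scheme. The function name, the 2-D coordinates and the single radio range shared by all nodes are assumptions made for the example.

```python
import math


def build_neighbor_lists(positions, radio_range):
    """Map each node id to the ids of all nodes within radio_range of it.

    positions: dict of node id -> (x, y) coordinates (hypothetical layout).
    """
    neighbors = {node_id: [] for node_id in positions}
    ids = sorted(positions)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Links are symmetric because every node uses the same range.
            if math.dist(positions[a], positions[b]) <= radio_range:
                neighbors[a].append(b)
                neighbors[b].append(a)
    return neighbors


# Example: nodes 1 and 2 are exactly 5 units apart, node 3 is out of range.
topology = build_neighbor_lists({1: (0, 0), 2: (3, 4), 3: (10, 10)}, radio_range=5)
```

The sink could run the same computation over all node positions to obtain the whole topology, while each node only needs its own entry.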
3.1 System Model
We consider a WSN composed of a large number of
sensor nodes with a single sink located at the center
of the deployment field; this can be easily
generalized to other cases. The sensor nodes sense
and collect data continuously and periodically at
their locations. A sensor node can be either in an alive
or a failed state. The transition from an alive state to a
failed state is one-way and irreversible. Our premise
is that when one node fails, the remaining alive sen-
sor nodes should be able to cooperate and recover the
sensed data of the failed node by utilizing the redun-
dant information stored in the alive nodes. There-
fore, the remaining alive sensor nodes can success-
fully handle queries with regard to the failed node, as
if there were no node failure.
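The node model and the premise above can be sketched as follows. To keep the example short, each node simply stores full copies of its neighbors' readings. This plain replication is a stand-in chosen for illustration and is not the redundancy scheme proposed in this paper. All class and function names are hypothetical.

```python
class SensorNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.alive = True
        self.own_data = []    # readings sensed by this node
        self.redundant = {}   # neighbor id -> copies of that neighbor's readings

    def fail(self):
        # The alive -> failed transition is one-way; nothing sets alive back.
        self.alive = False

    def sense(self, reading, neighbors):
        """Store a reading locally and replicate it to alive neighbors."""
        if not self.alive:
            raise RuntimeError("a failed node cannot sense")
        self.own_data.append(reading)
        for neighbor in neighbors:
            if neighbor.alive:
                neighbor.redundant.setdefault(self.node_id, []).append(reading)


def answer_query(target, nodes):
    """Return the readings of target, using a neighbor's copy if target failed."""
    if target.alive:
        return list(target.own_data)
    for node in nodes:
        if node.alive and target.node_id in node.redundant:
            return list(node.redundant[target.node_id])
    return None  # no alive node holds redundant data for target


# Example: node a fails after sensing; its neighbors still answer for it.
a, b, c = SensorNode("a"), SensorNode("b"), SensorNode("c")
a.sense(21.5, neighbors=[b, c])
a.fail()
result = answer_query(a, [a, b, c])
```

Full replication makes the behavior easy to follow but multiplies the storage cost by the number of neighbors, which is the kind of overhead a more economical redundancy scheme aims to reduce.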
At the beginning of each round, the sink generates
a query message specifying the target query area. The
query message has the format [Center, Radius], where
Center represents the coordinates of the query center
and Radius the radius of the query area. The query
message is then flooded to the whole deployment field.¹
Upon receiving the query message, each node determines
whether it is within the query area and responds to the
query only if it is.
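The membership test each node performs on a [Center, Radius] query can be sketched as below. The class and function names are hypothetical, the coordinates are assumed to be planar, and treating a node exactly on the boundary as inside the query area is an assumption of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    center: tuple   # (x, y) coordinates of the query center
    radius: float   # radius of the circular query area


def should_respond(node_position, query):
    """True if the node lies within the query area (boundary included)."""
    return math.dist(node_position, query.center) <= query.radius


# Example: one node inside and one node outside the query area.
q = Query(center=(50.0, 50.0), radius=10.0)
inside = should_respond((55.0, 55.0), q)
outside = should_respond((70.0, 50.0), q)
```

Because every node knows its own location, this check needs no communication beyond receiving the flooded query itself.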
We aim to incorporate in-network data redundancy
to achieve node failure recovery. The main idea is to
preserve the sensed data of a node in its neighboring
nodes so that, if this node fails, the neighboring
nodes still have access to sufficient information for
recovering the data of the failed node. To achieve this
goal, we propose to partition the storage unit of a
sensor node into two separate
¹ There are protocols more efficient than simple flooding,
but their use is orthogonal to our purposes in this paper.
RAID'ing Wireless Sensor Networks - Data Recovery for Node Failures