  • Nodes 1, 2, 3, 10, 17 (4 good, 1 bad)
  • Nodes 4, 5, 6 (all good nodes)
  • Nodes 8, 17 some erroneous data
• Period VI: Tremor detectable by Clusters 3 and 4
  • Nodes 1, 2, 3, 10, 17 (4 good, 1 bad)
  • Nodes 4, 5, 6 (2 good, 1 bad)
  • Clusters 1 and 2 cannot detect tremor and also each have one node with some erroneous data.
  • Nodes 1, 6, 11, 15 erroneous
5.2 Results
For all of the experiments we represented the 16 nodes by their id numbers in the matrix M = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. The weight of each node corresponds to the bandwidth required to transmit the data from that node to the sink node (in number of hops) and was given by weights = [2, 2, 2, 1, 1, 1, 3, 3, 2, 3, 3, 3, 3, 3, 3, 2].
The SeismicDataReliability and SeismicDataPriority values (both resulting from the Bayesian network in Phase I) were each represented in a matrix. In the interest of space we do not display all of the SeismicDataReliability and SeismicDataPriority matrices. As an example, the matrices for Experiment I, Phase III are:

SeismicDataReliability = [3.96, 98, 98, 70.6, 70.6, 31.4, 90.2, 90.2, 98, 11.8, 90.2, 90.2, 90.2, 90.2, 11.8, 98]

SeismicDataPriority = [0, 100, 100, 100, 100, 0, 100, 100, 100, 0, 100, 100, 100, 100, 0, 100]

These two matrices were added together element-wise to obtain the confidence parameter that is used as input to the algorithms:

Confidence parameter = [3.96, 198, 198, 170.6, 170.6, 31.4, 190.2, 190.2, 198, 11.8, 190.2, 190.2, 190.2, 190.2, 11.8, 198]
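As a minimal computational sketch (in Python; the paper does not prescribe an implementation language, and the variable names are ours), the confidence parameter can be reproduced as the element-wise sum of the two Phase I vectors:

    # Node ids and per-node bandwidth cost (hops to the sink), from the text.
    nodes = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
    weights = [2, 2, 2, 1, 1, 1, 3, 3, 2, 3, 3, 3, 3, 3, 3, 2]

    # Phase I outputs for Experiment I, Phase III.
    reliability = [3.96, 98, 98, 70.6, 70.6, 31.4, 90.2, 90.2,
                   98, 11.8, 90.2, 90.2, 90.2, 90.2, 11.8, 98]
    priority = [0, 100, 100, 100, 100, 0, 100, 100,
                100, 0, 100, 100, 100, 100, 0, 100]

    # The confidence parameter is the element-wise sum of the two vectors.
    confidence = [r + p for r, p in zip(reliability, priority)]
    print(confidence)
    # [3.96, 198, 198, 170.6, 170.6, 31.4, 190.2, 190.2, 198, 11.8,
    #  190.2, 190.2, 190.2, 190.2, 11.8, 198]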
We used two measures to evaluate the algorithms: accuracy (explained below) and the percentage of the optimal data selected. It should be noted that in our volcanic monitoring scenario not all data is treated equally. Some of the data can be categorized as good, or error-free, data, while other data is referred to as bad, or erroneous, data. Additionally, the good data must be further divided into high-priority good data, generated by a node physically located in an area of activity, and low-priority good data, generated by a node in a non-active area. We must adhere to these distinctions to correctly measure the accuracy of the algorithms.
We designed a point system to measure the accuracy of the algorithms that reflects the type of data selected by each algorithm. The accuracy of an algorithm is initially set to 0. The algorithm is then assigned points based on the type of each node it includes in the optimal data subset: every erroneous node in the optimal subset adds -1 to the current accuracy, while every good node adds +1 if it has low priority or +2 if it has high priority, as sketched below.
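A minimal sketch of this point system (again in Python; the labels, function name, and example are ours, for illustration only):

    # Points per node type, as defined in the text.
    POINTS = {"erroneous": -1, "good_low": +1, "good_high": +2}

    def accuracy(optimal_subset, label_of):
        """Sum the points over every node in the optimal data subset."""
        score = 0  # accuracy starts at 0
        for node in optimal_subset:
            score += POINTS[label_of[node]]
        return score

    # Hypothetical example: nodes 2 and 3 are good high-priority nodes and
    # node 10 is erroneous, so the accuracy is 2 + 2 - 1 = 3.
    label_of = {2: "good_high", 3: "good_high", 10: "erroneous"}
    print(accuracy([2, 3, 10], label_of))  # -> 3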
Figures 6 – 9 show the accuracy of all the algorithms for time periods I – VI under different network bandwidths. The available bandwidth refers to the space available to transmit the data: high bandwidth indicates that 83% of the data is allowed to be transmitted, while medium-high, medium, and low bandwidths can handle 53%, 26%, and 13% of the data, respectively. In Period I of Figures 6 – 8, all four algorithms performed equally. This was expected, as that time period contained no activity and no erroneous data; rather, it served as a validity test. When the available bandwidth was low (Figure 9), our algorithm performed better due to its optimization of the nodes and their associated bandwidth requirements. In all four figures it can be seen that, relative to the other algorithms, ours showed the most improvement in Periods III, V, and VI, because those were the time periods that contained erroneous data. Additionally, the margin in accuracy points by which our algorithm beats the other algorithms is inversely related to the available bandwidth. Thus, while our algorithm never performed worse than the competition, it displays the most gains when bandwidth resources are restricted and erroneous data is present.
The second metric that we used to evaluate our optimum data selection algorithm was the percentage of the optimal data that was chosen. By optimal data, we refer to all of the data that does not contain errors. To compute this percentage, shown in Figures 10 – 13, we took the ratio of the total number of good nodes selected in the optimum data set to the total number of nodes in the entire sample. From the results it is evident that the percentage of optimal data selected is directly proportional to the available bandwidth; as expected, when the bandwidth is limited, the total number of nodes selected is also reduced. However, we can use the relative gain between algorithms to compare their results.
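As a minimal sketch of this metric (in Python; the function name and example are ours):

    def percent_optimal(selected, good_nodes, sample_size):
        """Good nodes chosen for the optimum data set, as a percentage
        of the total number of nodes in the sample."""
        good_selected = sum(1 for n in selected if n in good_nodes)
        return 100.0 * good_selected / sample_size

    # Hypothetical example: 6 good nodes selected out of a 16-node sample.
    print(percent_optimal({2, 3, 4, 5, 8, 9}, {2, 3, 4, 5, 8, 9, 11}, 16))
    # -> 37.5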
As demonstrated in Figure 10, when the bandwidth is limited, our algorithm outperformed the others in all cases. Again, all of the figures demonstrate that we showed the greatest improvement.