module RadioController
  // RAction = 1 while the radio is on (synchronizing/transmitting), 0 while it is off.
  RAction : [0..1];
  // counter counts the epochs elapsed in the current T_LATENESS period;
  // T_LATENCY and T_LATENESS are constants defined elsewhere in the model.
  counter : [0..T_LATENESS - 1];
  // At the start of a period the controller either keeps the radio off ...
  [tick] counter = 0 -> (RAction' = 0) & (counter' = 1);
  [tick] counter > 0 & counter <= T_LATENCY - 1 & RAction = 0 ->
         (RAction' = 0) & (counter' = counter + 1);
  // ... or turns it on, in which case it stays on for T_LATENCY epochs.
  [tick] counter = 0 -> (RAction' = 1) & (counter' = 1);
  [tick] counter > 0 & counter <= T_LATENCY - 1 & RAction = 1 ->
         (RAction' = 1) & (counter' = counter + 1);
  // After T_LATENCY epochs the radio is kept off until the period ends.
  [tick] counter >= T_LATENCY & counter < T_LATENESS - 1 ->
         (RAction' = 0) & (counter' = counter + 1);
  [tick] counter = T_LATENESS - 1 -> (RAction' = 0) & (counter' = 0);
endmodule
Figure 3: The PRISM radio controller specification with $T_{latency}$-epoch-long actions.
the radio and sensing rewards of the policies only the epochs in which the system does not run out of energy.
3.1 Considering The Multi-epoch Radio Transceiver Actions
So far, we did not take into consideration that turning on the radio transceiver of the sensor node and then transmitting the acquired data are costly operations, which can take longer than one epoch because the node has to synchronize with the network. For the SensorScope platform, which uses the BMAC protocol, the time the radio needs to synchronize to the network and send the data is $T_{latency\,exact} = 2.25$ seconds (Şuşu et al., 2008). For the sake of energy efficiency, since the radio is the most power-consuming component of the sensor node, we do not want to turn it off while it is synchronizing or transmitting. Therefore, under these conditions, the radio power-off command should be issued at least $T_{latency} = \lceil T_{latency\,exact} / \text{time step} \rceil = 3$ epochs after the moment we turned on the radio.
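As a worked instance of this rounding (the epoch length, i.e. the time step, is not stated in this excerpt, so the 1 s value below is an assumption chosen only because it is consistent with the stated result of 3 epochs):

import math

T_LATENCY_EXACT = 2.25                         # seconds, from the text
TIME_STEP = 1.0                                # seconds -- illustrative assumption
print(math.ceil(T_LATENCY_EXACT / TIME_STEP))  # 3 epochs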
In order to handle the optimization with multi-epoch actions, we add a new state variable, counter, to the PRISM model, which keeps track of how many epochs have passed since the beginning of the current RAction = 1 command. We then allow a change to RAction = 0 only if counter ≥ $T_{latency}$. Adding this variable makes it possible to turn on the radio only at the beginning of the $T_{lateness}$ period, for exactly $T_{latency}$ epochs. This implies that the counter variable has to take values between 0 and $T_{lateness} - 1$. The PRISM specification of the radio controller with the variable counter is given in Figure 3. We compose this component with the others of the model in Figure 2. The radio constraint in our formulations (1) and (2), specifying an average radio reward per epoch of at least $T_{latency}/T_{lateness}$, is in accordance with the behavior described above.
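To make the behavior enforced by the counter variable concrete, the following short Python sketch replays the state machine of Figure 3 over one $T_{lateness}$ period. $T_{latency} = 3$ is taken from the text, but $T_{lateness}$ is not given a value in this section, so the value 10 below is an assumption (it is, however, consistent with the 0.30 average radio reward per epoch reported in Section 4).

# Minimal re-implementation of the Figure 3 counter state machine,
# only to visualise the radio schedule it enforces.
T_LATENCY, T_LATENESS = 3, 10        # T_LATENESS = 10 is an assumed value

def step(r_action, counter, turn_on):
    """One [tick]: return the next (RAction, counter) pair."""
    if counter == 0:                     # start of a T_LATENESS period:
        return (1 if turn_on else 0), 1  # the policy may turn the radio on
    if 0 < counter <= T_LATENCY - 1:     # the chosen radio action is held
        return r_action, counter + 1
    if T_LATENCY <= counter < T_LATENESS - 1:
        return 0, counter + 1            # radio forced off for the rest
    return 0, 0                          # counter = T_LATENESS - 1: wrap around

r_action, counter, schedule = 0, 0, []
for epoch in range(T_LATENESS):
    r_action, counter = step(r_action, counter, turn_on=True)
    schedule.append(r_action)

print(schedule)                          # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(sum(schedule) / len(schedule))     # 0.3 = T_LATENCY / T_LATENESS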
4 OPTIMAL SOLUTION AND COMPARISON
We call $\pi_3$ the MDP optimal policy that generates only $T_{latency}$-epoch-long radio actions ($T_{latency} = 3$) and $\pi_1$ the one on which we do not impose this constraint.
We present in Table 1 the simulation results obtained with PRISM for $\pi_3$, together with the ones for the Conservative and the Greedy policies, for which we present only the radio rewards that are $T_{latency}$ epochs (or more) long. As we can see, the $\pi_3$ policy has average rewards per epoch of 0.30 for the radio and 4.53 for sensing. $\pi_1$ attains average per-epoch rewards of 0.30 for the radio (out of which only 49.57% satisfies the $T_{latency}$ constraint) and 4.96 for sensing. To compare $\pi_1$ and $\pi_3$ we use the product between the expected $T_{latency}$-feasible radio reward and the expected sensing reward. This product is 0.744 for the former and 1.359 for the latter, which means that $\pi_3$ performs 83% better than $\pi_1$. Using the same metric, $\pi_3$ is 14% better than the Conservative policy and 154% better than the Greedy one. The Greedy policy is the only one that runs out of energy, about 1.73% of the time.
Table 1: The expected total radio (R) and sensing (SDC) rewards of the solution policies for constrained optimization, for various horizon length (hl) values.

        MDP policy $\pi_3$     Conservative policy     Greedy policy
hl      R        SDC           R        SDC            R        SDC
10^4    3,099    45,587        1,945    61,768         1,031    52,727
10^5    30,822   453,302       19,395   615,398        10,192   524,071
10^6    308,055  4,530,441     193,893  6,151,698      101,809  5,237,512
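As a quick consistency check, the per-epoch averages quoted above follow directly from the Table 1 totals, and the 83% figure follows from the two products; the snippet below uses only numbers taken from the table and the text.

# Per-epoch averages for pi_3 over the 10^6-epoch horizon (Table 1)
radio_total, sensing_total, horizon = 308_055, 4_530_441, 10**6
print(radio_total / horizon)           # 0.308055, i.e. ~0.30 as quoted
print(sensing_total / horizon)         # 4.530441, i.e. 4.53 as quoted

# Comparison metric: product of the expected T_latency-feasible radio
# reward and the expected sensing reward (values quoted in the text)
product_pi3 = 0.30 * 4.53              # 1.359
product_pi1 = 0.744
print(product_pi3 / product_pi1 - 1)   # ~0.83, the 83% improvement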
The MDP has 39,390 reachable states. The associated LP uses six times more variables and takes more than six days to be solved exactly on a standard computer platform. Therefore, to find the solution we use competitive approximation methods, the details of which we omit in this paper; they reduce the search time to a couple of hours.
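Since neither the LP nor the approximation methods are detailed in the paper, the sketch below is only a generic illustration of the standard occupancy-measure LP for a constrained average-reward MDP, written with scipy for a made-up two-state, two-action example; it is not the authors' formulation. It does show why such an LP has one variable per state-action pair, and hence several times more variables than the MDP has reachable states (a factor of six would correspond to six enabled actions per state, which is our inference, not something stated in the text).

# Generic occupancy-measure LP for a constrained average-reward MDP
# (toy example, NOT the paper's LP).  Variables x[s,a] are the
# stationary state-action frequencies of the policy.
import numpy as np
from scipy.optimize import linprog

nS, nA = 2, 2                             # toy sizes; the real MDP has 39,390 states
P = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P[s, a, s']: made-up transitions
              [[0.2, 0.8], [0.1, 0.9]]])
r = np.array([[1.0, 0.2],                 # "sensing" reward, to be maximized
              [4.0, 3.0]])
g = np.array([[0.0, 1.0],                 # "radio" reward, constrained from below
              [0.0, 1.0]])
radio_min = 0.3                           # e.g. T_latency / T_lateness

n = nS * nA
idx = lambda s, a: s * nA + a             # flatten (s, a) into a variable index

# Stationarity: for every s',  sum_a x[s',a] - sum_{s,a} P[s,a,s'] x[s,a] = 0,
# plus the normalisation  sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((nS + 1, n)); b_eq = np.zeros(nS + 1)
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, idx(s, a)] -= P[s, a, sp]
            if s == sp:
                A_eq[sp, idx(s, a)] += 1.0
A_eq[nS, :] = 1.0; b_eq[nS] = 1.0

# Radio constraint  sum_{s,a} x[s,a] g[s,a] >= radio_min  (as <= for linprog).
A_ub = -g.reshape(1, n); b_ub = np.array([-radio_min])

res = linprog(-r.reshape(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
x = res.x.reshape(nS, nA)
print("average sensing reward:", -res.fun)
print("average radio reward  :", (x * g).sum())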
The optimal policy $\pi_3$ prescribes, for each reachable state of the system, the best sensing duty cycle and radio management that maximizes the expected sensing quality, generates only $T_{latency}$-epoch-long radio transmissions, and does not run out of energy, for the given harvester DTMC model.
5 CONCLUSIONS
In this paper we have modeled and improved the functionality of a wireless sensor node with Markov Decision Processes (MDPs). Because of the long time the radio transceiver takes to synchronize with the network, we have introduced MDP actions that take longer than one epoch to complete. Optimizing without taking into consideration these multi-epoch actions results in suboptimal MDP policies. We proposed a method to find an optimal solution and compared the result to various heuristic policies.
Our problem with multi-epoch actions has some similarities with the Semi-Markov Decision Process