• Finally, we assume that job execution time is negligible, so we model neither it nor the queueing effect it may cause on devices. This is a natural assumption for the kind of device infrastructure we are targeting, where devices are far from full utilisation and spend most of their time waiting for jobs.
2.2 Formulation as a Sequential Decision Process
The state of the system at any time is given by a pair $\langle\sigma, z\rangle$, where the control state $\sigma$ is the subset of indices of the devices in ready mode and $z \in Z$ denotes the demand state of the infrastructure. We seek to build a controller which takes as input the stream of job requests, maintains the state of the system and uses it to make decisions at each new job:
• First, the controller must choose the index $k$ of the device to which the job is assigned. We assume here that jobs are assigned immediately upon reception.
• Then, just after the assignment, the controller must choose a family $(\tau_k)_{k \in \sigma}$ of non-negative timeouts, where $\sigma$ is the control state at that time, and each $\tau_k$ denotes the sleep schedule for device $k \in \sigma$, i.e. the time after which device $k$ is to be switched to sleep mode if no job has been received in the meantime.
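The state and the effect of the two decisions above can be sketched as follows. This is a minimal illustration under our own assumptions: the names `State`, `assign` and `next_arrival` are hypothetical, the demand state $z$ is treated as an opaque hashable value, and a device is taken to stay ready only if the next job arrives strictly before its timeout, which is how $\tau_k$ is defined above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable


@dataclass(frozen=True)
class State:
    """Hypothetical encoding of a state <sigma, z>."""
    sigma: FrozenSet[int]  # control state: indices of the devices in ready mode
    z: Hashable            # demand state, treated here as an opaque value


def assign(state: State, k: int) -> State:
    """First decision: the job goes to device k, which is (or is woken up to
    be) in ready mode; the demand state is left unchanged."""
    return State(state.sigma | {k}, state.z)


def next_arrival(state: State, timeouts: Dict[int, float],
                 x: float, z_next: Hashable) -> State:
    """Effect of the second decision when the next job arrives after time x
    in demand state z_next: a ready device k stays ready only if x < tau_k."""
    if set(timeouts) != set(state.sigma):
        raise ValueError("exactly one timeout is needed per ready device")
    if any(tau < 0 for tau in timeouts.values()):
        raise ValueError("timeouts must be non-negative")
    still_ready = frozenset(k for k in state.sigma if x < timeouts[k])
    return State(still_ready, z_next)
```

Representing $\sigma$ as a frozen set keeps states hashable, so they can index cost-to-go tables directly.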
Thus, the problem is formulated as the optimisation of a sequential decision process. We consider the optimisation over an infinite horizon with discount factor $\gamma$. Let $V^{!}\langle\sigma, z\rangle$ and $V\langle\sigma, z\rangle$ be the optimal cost-to-go associated with state $\langle\sigma, z\rangle$, respectively before and after an assignment. The optimality equations, (1) and (2), are given in Figure 1.
• Equation (1) concerns the total cost of assigning, in state $\langle\sigma, z\rangle$, a job with cost menu $c$ to a device $k$: it consists of the client assignment cost $c_k$ specified in the cost menu, plus, if device $k$ is in sleep mode (i.e. $k \notin \sigma$), the wake-up cost to lift it to ready mode, plus the cost-to-go after assignment from the new state $\langle\sigma \cup \{k\}, z\rangle$, in which $k$ is now in ready mode. The demand state is unchanged because we ignore job execution time, so the job is assumed to be completed immediately after assignment.
• Equation (2) concerns the cost of setting timeouts $(\tau_k)_{k \in \sigma}$ for the devices in ready mode in state $\langle\sigma, z\rangle$, when the next job arrives after time $x$ in a demand state $z'$: it consists of the cost $g_k(x, \tau_k)$ of the energy consumed up to time $x$ by each device $k$ in ready mode with timeout $\tau_k$, plus the discounted cost-to-go from the new state $\langle\{k \in \sigma \mid x < \tau_k\}, z'\rangle$, whose control state consists exactly of those devices in $\sigma$ whose timeout was not reached at time $x$ (i.e. $\tau_k > x$).
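A small worked instance may help. All numbers below are hypothetical, and $w_2$ is our own name for the wake-up cost of device 2, which the text does not name. Take $K = 2$, $\sigma = \{1\}$ (device 2 asleep), cost menu $c = (4, 1)$, $w_2 = 5$, and suppose $V\langle\{1\}, z\rangle = 10$ and $V\langle\{1, 2\}, z\rangle = 9$. The two candidate assignments in the objective of Equation (1) then cost

$$
\begin{aligned}
k = 1:&\quad c_1 + V\langle\{1\}, z\rangle = 4 + 10 = 14,\\
k = 2:&\quad c_2 + w_2 + V\langle\{1, 2\}, z\rangle = 1 + 5 + 9 = 15,
\end{aligned}
$$

so the job goes to device 1 despite its higher client cost. For Equation (2), with $\sigma = \{1, 2\}$, timeouts $\tau_1 = 2$, $\tau_2 = 5$ and the next job arriving at $x = 3$, device 1 has already been switched to sleep mode, so the new control state is $\{k \in \sigma \mid 3 < \tau_k\} = \{2\}$.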
If functions $V$ and $V^{!}$ satisfy Equations (1) and (2), then the optimal policy for the controller can be formulated as follows:
• When receiving a job with cost menu $c$ in state $\langle\sigma, z\rangle$, assign it to the device $k$ which minimises the objective of Equation (1).
• Just after assignment in state $\langle\sigma, z\rangle$, set the timeouts $(\tau_k)_{k \in \sigma}$ which minimise the objective of Equation (2).
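For a single decision point, the two rules can be sketched as below, assuming the cost-to-go tables $V$ and $V^{!}$ are already known and stored as dictionaries keyed by $(\sigma, z)$. Everything here is our own illustration: the names are hypothetical, the wake-up costs (`wake_cost`) and the energy function `g(k, x, tau)` are inputs, the next arrival $(x, z')$ is assumed to follow a finite distribution given as `(probability, x, z')` triples, the discount is applied as a single factor $\gamma$ per transition (Figure 1 fixes the exact form), and the timeouts are found by brute force over a finite candidate grid, which is exponential in $|\sigma|$ and is not the method developed in Section 3.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Tuple

Key = Tuple[FrozenSet[int], Hashable]     # a state <sigma, z>
Arrival = Tuple[float, float, Hashable]   # (probability, x, z') of the next job


def choose_device(c: Sequence[float], sigma: FrozenSet[int], z: Hashable,
                  wake_cost: Sequence[float], V_after: Dict[Key, float]) -> int:
    """Assignment rule: the device k minimising the objective of Equation (1)."""
    def objective(k: int) -> float:
        wake = 0.0 if k in sigma else wake_cost[k]  # wake-up cost only if asleep
        return c[k] + wake + V_after[(sigma | {k}, z)]
    return min(range(len(c)), key=objective)


def choose_timeouts(sigma: FrozenSet[int], arrivals: List[Arrival],
                    candidates: Sequence[float],
                    g: Callable[[int, float, float], float], gamma: float,
                    V_before: Dict[Key, float]) -> Dict[int, float]:
    """Timeout rule: brute-force minimiser of the expected Equation (2)
    objective over a finite grid of candidate timeouts (hypothetical)."""
    devices = sorted(sigma)

    def expected_cost(taus: Tuple[float, ...]) -> float:
        total = 0.0
        for p, x, z_next in arrivals:
            energy = sum(g(k, x, t) for k, t in zip(devices, taus))
            still_ready = frozenset(k for k, t in zip(devices, taus) if x < t)
            total += p * (energy + gamma * V_before[(still_ready, z_next)])
        return total

    best = min(product(candidates, repeat=len(devices)), key=expected_cost)
    return dict(zip(devices, best))
```

The brute-force search makes plain why the second rule is the hard one: its cost grows as the number of candidate timeouts raised to the power $|\sigma|$.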
3 SOLVING FOR OPTIMALITY
We are looking for a solution in $V^{!}, V$ to the system of Equations (1) and (2). We follow the general procedure of value iteration, which alternates updates to $V^{!}$ from $V$ using Equation (1) and updates to $V$ from $V^{!}$ using Equation (2). We assume that the demand state space $Z$ is finite, so the overall state space is finite and both $V^{!}$ and $V$ can be represented as finite-dimensional vectors. The update using Equation (1) is straightforward: the minimisation can be done by enumerating the devices, and the integral becomes a sum, assuming distribution $Q$ is discrete. The update using Equation (2) is more involved, as it requires solving an (up to) $K$-dimensional optimisation. Unfortunately, the objective has none of the properties, such as convexity or smoothness, that would make it amenable to standard optimisation techniques. Furthermore, it is important to reach a global optimum and not just a local one. The rest of this section is devoted to solving the optimisation in Equation (2).
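The value-iteration scheme can be sketched on a toy model. The assumptions are all ours: the menu distribution $Q$ is a finite list of `(probability, c)` pairs independent of $z$; the next arrival given $z$ is a finite list of `(probability, x, z')` triples; discounting is a single factor $\gamma < 1$ per transition; and, as a deliberate stand-in for the method developed in the rest of this section, Equation (2) is minimised by brute force over a small grid of candidate timeouts, which only approximates the true minimisation over non-negative reals. The function and parameter names are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Tuple

Key = Tuple[FrozenSet[int], Hashable]      # a state <sigma, z>
Menu = Tuple[float, Sequence[float]]       # (probability, cost menu c): toy Q
Arrival = Tuple[float, float, Hashable]    # (probability, x, z')


def value_iteration(K: int, Z: Sequence[Hashable], menus: List[Menu],
                    arrivals: Dict[Hashable, List[Arrival]],
                    wake_cost: Sequence[float],
                    g: Callable[[int, float, float], float],
                    candidates: Sequence[float], gamma: float,
                    tol: float = 1e-9, max_iter: int = 10_000,
                    ) -> Tuple[Dict[Key, float], Dict[Key, float]]:
    """Alternate Equation (1) updates (V! from V) and Equation (2) updates
    (V from V!) until the sup-norm change in V falls below tol."""
    # Finite state space: every subset sigma of the K devices, for each z.
    sigmas = [frozenset(s) for r in range(K + 1) for s in combinations(range(K), r)]
    states = [(s, z) for s in sigmas for z in Z]

    def update_before(s, z, V_after):
        # Equation (1): the integral over Q becomes a sum over the discrete
        # menus; the minimisation is an enumeration of the devices.
        return sum(q * min(c[k] + (0.0 if k in s else wake_cost[k])
                           + V_after[(s | {k}, z)] for k in range(K))
                   for q, c in menus)

    def update_after(s, z, V_before):
        # Equation (2), minimised by brute force over the timeout grid
        # (a stand-in, NOT the one-dimensional method of Section 3).
        devices = sorted(s)

        def expected(taus):
            total = 0.0
            for p, x, z_next in arrivals[z]:
                energy = sum(g(k, x, t) for k, t in zip(devices, taus))
                still_ready = frozenset(k for k, t in zip(devices, taus) if x < t)
                total += p * (energy + gamma * V_before[(still_ready, z_next)])
            return total

        return min(expected(taus) for taus in product(candidates, repeat=len(devices)))

    V_after = {st: 0.0 for st in states}
    V_before = dict(V_after)
    for _ in range(max_iter):
        V_before = {st: update_before(*st, V_after) for st in states}
        V_new = {st: update_after(*st, V_before) for st in states}
        delta = max(abs(V_new[st] - V_after[st]) for st in states)
        V_after = V_new
        if delta < tol:
            break
    return V_before, V_after
```

As a usage example, take one device, one demand state, a client cost of 1, a wake-up cost of 2, energy `g = lambda k, x, t: min(x, t)` (unit power while ready, none while asleep), a job every time unit and candidate timeouts `[0.0, float("inf")]`. With $\gamma = 0.9$ keeping the device ready is optimal and $V\langle\{0\}, z\rangle$ converges to approximately 19; with $\gamma = 0.4$ switching it off immediately wins and the value is approximately 2.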
3.1 Transformation of the Objective
Although the optimisation in Equation (2) occurs in the (up to) $K$-dimensional space of possible timeouts, it can in fact be turned into a sequence of (up to) $K$ uni-dimensional optimisations. To show this, we introduce two side functions $V^{\circ}\langle t, \sigma, z\rangle$ and $v\langle t, \sigma, z\rangle$, where $t$ is a positive scalar and $\langle\sigma, z\rangle$ is a state. It can then be shown that the solution in $V$ to Equation (2) can be obtained by solving the system of equations in $V^{\circ}, v, V$ shown in Figure 2. In that system, all the optimisations in the space of timeouts are captured by a single operator $\downarrow$, defined for any function $f$ on positive scalars by

$$\downarrow f(\tau) = \min_{t \ge \tau} f(t)$$
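On a discretised domain the operator $\downarrow$ reduces to a suffix minimum. The sketch below assumes $f$ has been sampled on an increasing grid $t_0 < t_1 < \dots < t_n$ (our assumption; the text defines $\downarrow$ on all positive scalars), so it returns $\downarrow f$ of the sampled function at the grid points only. The function name `down` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def down(f_values: Sequence[float]) -> List[float]:
    """Discretised version of (down f)(tau) = min_{t >= tau} f(t).

    f_values[i] is f sampled at t_i on an increasing grid t_0 < ... < t_n;
    entry i of the result is min(f_values[i:]), i.e. a suffix minimum
    computed in a single backward pass."""
    suffix_min = list(accumulate(reversed(f_values), min))
    return suffix_min[::-1]
```

By construction the result is non-increasing and lies pointwise below the samples of $f$, and one backward pass computes it in linear time.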
ICINCO 2012 - 9th International Conference on Informatics in Control, Automation and Robotics