(2) the corresponding “approximating” Markov control processes:

X̃_{n+1} = max{0, X̃_n + a_n − ξ̃_n},  n = 1, 2, ...,   (4)

X̃_{n+1} = min{X̃_n − a_n + ξ̃_n, M},  n = 1, 2, ...,   (5)
where ξ̃_0, ξ̃_1, ... are i.i.d. random variables with the d.f. F̃.
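For concreteness, here is a minimal Python sketch that simulates one trajectory of each approximating process (4) and (5). The exponential distribution for ξ̃_n, the constant action a, and all numerical values are hypothetical choices made only for illustration:

```python
import random

def simulate_queue(x0, a, xi_sampler, n_steps):
    """One trajectory of process (4): X_{n+1} = max{0, X_n + a_n - xi_n}."""
    x, path = x0, [x0]
    for _ in range(n_steps):
        x = max(0.0, x + a - xi_sampler())
        path.append(x)
    return path

def simulate_dam(x0, a, xi_sampler, n_steps, M):
    """One trajectory of process (5): X_{n+1} = min{X_n - a_n + xi_n, M}."""
    x, path = x0, [x0]
    for _ in range(n_steps):
        release = min(a, x)  # keep the action admissible: cannot release more than x
        x = min(x - release + xi_sampler(), M)
        path.append(x)
    return path

# Hypothetical example: i.i.d. exponential inputs with mean 1.
xi = lambda: random.expovariate(1.0)
print(simulate_queue(x0=0.0, a=0.8, xi_sampler=xi, n_steps=10))
print(simulate_dam(x0=5.0, a=0.8, xi_sampler=xi, n_steps=10, M=10.0))
```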
Suppose that, using the same criterion Q, one can find (at least “theoretically”) an optimal control policy π̃* for process (4) (or for process (5)). Nevertheless, we insist that the “original” process is given by equations (1) (or by (2)), and therefore the controller aims to find π̃* only in order to apply it to control process (1) (or (2), respectively). This means that π̃* is used as an available approximation to an unavailable policy π*.
The following stability (continuity) index ∆ (see e.g. (Gordienko and Salem, 2000), (Gordienko et al., 2009)) gives a quantitative measure of the accuracy of such an approximation:

∆ := Q(π̃*) − Q(π*) ≥ 0.   (6)
Here π* is the optimal policy satisfying (3). In other words, ∆ expresses the extra cost paid for using π̃* instead of π*.
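The index ∆ can in principle be estimated by simulation: evaluate the criterion Q for both policies along the original process (1) and subtract. The sketch below does this for an expected total discounted cost; the cost function, the two stationary policies, and the discount factor are hypothetical stand-ins, since neither π* nor π̃* is given in closed form here:

```python
import random

def discounted_cost(policy, cost, x0, xi_sampler, beta, n_steps, n_reps):
    """Monte Carlo estimate of Q(policy) = E[sum_t beta^t c(X_t, a_t)]
    along the original queueing dynamics (1)."""
    total = 0.0
    for _ in range(n_reps):
        x, acc, disc = x0, 0.0, 1.0
        for _ in range(n_steps):  # truncated horizon; beta**n_steps is negligible
            a = policy(x)
            acc += disc * cost(x, a)
            disc *= beta
            x = max(0.0, x + a - xi_sampler())
        total += acc
    return total / n_reps

# Hypothetical stand-ins for c, pi*, pi~*, and the true input distribution F.
cost = lambda x, a: x + 2.0 * a
pi_star = lambda x: 0.5
pi_tilde_star = lambda x: 0.6
xi = lambda: random.expovariate(1.0)

q_tilde = discounted_cost(pi_tilde_star, cost, 0.0, xi, 0.9, 200, 2000)
q_star = discounted_cost(pi_star, cost, 0.0, xi, 0.9, 200, 2000)
print("estimated stability index Delta =", q_tilde - q_star)  # cf. (6)
```

Note that both policies are evaluated under the same (true) d.f. F, exactly as in definition (6).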
There are many examples of unstable problems of control optimization in MCPs (see e.g. (Gordienko and Salem, 2000), (Gordienko et al., 2009)). In these examples the values of ∆ in (6) stay greater than a positive constant (sometimes a large one), in spite of the fact that F̃ approaches F in a certain strong sense (for example, in the total variation metric).
For the above-mentioned applied MCPs we establish two stability inequalities (one for each of two different optimization criteria) of the form

∆ ≤ Kπ(F, F̃),   (7)

where K is an explicitly calculated constant and π is the Prokhorov (sometimes called “Lévy-Prokhorov”) metric on the space of d.f.’s. Convergence in π is equivalent to weak convergence (convergence in distribution). Inequality (7) asserts that if F̃ is a “good” approximation to F (in the sense of closeness in π), then using the policy π̃* costs “almost the same” as applying the optimal policy π*. Note that, in spite of the lack of complete information on F, it is sometimes possible to find an upper bound on π(F, F̃). When F̃ = F̃_m is the empirical distribution function obtained from a sample X_1, X_2, ..., X_m of size m, then

E π(F, F̃_m) ≤ const · m^{−1/3},

provided that ∫_0^∞ x^{3/2} F(dx) < ∞.
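To give a feel for this bound, the following sketch measures the distance between F and the empirical F̃_m for growing m. Since the exact Prokhorov distance is awkward to compute, the code uses a grid approximation of the closely related Lévy metric on d.f.’s (which minorizes the Prokhorov metric on the line); the exponential choice of F, the grid, and the constants are illustrative assumptions, not part of the theory above:

```python
import bisect, math, random

def levy_distance(F, sample, grid):
    """Grid approximation of the Levy metric between the d.f. F and the
    empirical d.f. F_m of `sample`:
        inf{eps > 0 : F(x - eps) - eps <= F_m(x) <= F(x + eps) + eps for all x}."""
    xs = sorted(sample)
    m = len(xs)
    F_m = lambda x: bisect.bisect_right(xs, x) / m
    def holds(eps):
        return all(F(x - eps) - eps <= F_m(x) <= F(x + eps) + eps for x in grid)
    lo, hi = 0.0, 1.0
    for _ in range(30):  # bisection on eps
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if holds(mid) else (mid, hi)
    return hi

# Illustration: F exponential with mean 1, so the moment condition
# integral_0^inf x^{3/2} F(dx) < inf holds.
F = lambda x: 1.0 - math.exp(-max(x, 0.0))
for m in (100, 1000, 10000):
    sample = [random.expovariate(1.0) for _ in range(m)]
    grid = [i * 0.01 for i in range(1001)]  # checks the condition on [0, 10]
    print(m, levy_distance(F, sample, grid), "vs m^(-1/3) =", m ** (-1 / 3))
```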
It is worth noting that in the theory of stochastic processes (controlled or not) the term “stability” has had a variety of meanings (see, for instance, (MacPhee and Müller, 2006), (Meyn and Tweedie, 2009)). We use this term because we did not find anything better. The quantitative estimation of continuity in noncontrolled queues (under perturbation of the “governing” d.f.’s) is a rather well-developed topic (see, e.g., (Zolotarev, 1976), (Kalashnikov, 1983), (Abramov, 2008), (Gordienko and Ruiz de Chavez, 1998)). For controlled Markov processes, “stability” was studied (in another framework), for instance, in (Van Dijk, 1988), (Van Dijk and Sladky, 1999). “Stability inequalities” for MCPs (using probability metrics distinct from the Prokhorov one) were found in several relevant papers (see, for instance, (Gordienko and Salem, 2000), (Gordienko et al., 2009)). Certainly, “stability” of control policies as considered here is in some way connected with sensitivity theory and with the well-developed theory of robust control (which, in particular, has been successfully used in dam models; see e.g. (Barbu and Sritharan, 1998), (Litrico and Georges, 2001)).
The findings in this communication are new theoretical results on the estimation of “stability” of applied control processes (in terms of the Prokhorov metric). Stability estimation in queueing models may play a role in the design of “reasonable” control policies under uncertainty about probability distributions. The water release model considered here is quite elementary, and therefore has restricted applications. However, our approach can be extended to more realistic dam models.
2 STABILITY ESTIMATION WITH RESPECT TO THE EXPECTED TOTAL DISCOUNTED COST
Equations (1) represent the MCP on the state space X = [0, ∞) with the sets of admissible control actions A(x) = [γ, γ̄], x ∈ X (with γ < γ̄ being the assigned extreme values of the service rate). In turn, equations (2) define the MCP on the state space X = [0, M] with the sets of admissible actions A(x) = [0, x], x ∈ X.
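In code, these two constraints can be kept explicit by projecting any proposed action onto A(x) before it is applied; the small sketch below (function names and parameters are our own, chosen only for illustration) shows both cases:

```python
def admissible_queue_action(a, gamma_lo, gamma_hi):
    """Project a proposed service rate onto A(x) = [gamma_lo, gamma_hi]
    for the queueing model (1); this set does not depend on the state x."""
    return min(max(a, gamma_lo), gamma_hi)

def admissible_dam_action(a, x):
    """Project a proposed water release onto A(x) = [0, x] for the dam
    model (2): one cannot release more water than the current content x."""
    return min(max(a, 0.0), x)

# Example: a proposed release of 3.2 is trimmed to the current content 2.5.
print(admissible_dam_action(3.2, x=2.5))       # -> 2.5
print(admissible_queue_action(0.1, 0.3, 0.9))  # -> 0.3
```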
Suppose that for each of the above-mentioned processes a measurable real-valued one-step cost function c(x, a), x ∈ X, a ∈ A(x), is specified. Thus, at stage t the controller “pays” c(x_t, a_t) if the process occupies the state x_t and the control action a_t is selected. For example, for the controlled queue (1), c(x, a) can be increasing in the variable a representing the service rate, while for process (2), c(x, a) can be decreasing as a function of the water consumption a. In fact, we do not need such a detailed specification. We will only suppose throughout the rest of the paper that the function