offers a way to minimize various types of expected costs. Since a process model is available, the paradigm of model predictive control can also be used to derive the desired controls.
As the basic ideas are old, well known, and widespread, relevant literature can be found in many fields and under different keywords: generalized cell-to-cell mapping (Hsu, 1987), qualitative modelling (Lunze et al., 2001), and reinforcement learning (Kaelbling et al., 1996), for example. Much of the terminology in the sections that follow originates from (Hsu, 1987): we refer to mappings between cells as (simple or) generalized cell maps, we use a sink cell, etc.
As pointed out in (Lee and Lee, 2004) (see also (Ikonen, 2004), (Negenborn et al., 2005)), applications of MDPs in process control have been few; instead, the model predictive control paradigm is very popular in the process control community. Whereas not so many years ago the computations associated with finite Markov chains were prohibitive, the computing power available in cheap office PCs now enables the re-exploration of these techniques.
The work described in this paper aims at building a proper basic framework for examining the possibilities of controlled finite Markov chains in nonlinear process control. A majority of the current literature on MDPs examines means to overcome the curse of dimensionality, e.g., by function approximation (neuro-dynamic programming, Q-learning, etc.). The main obstacle in such approaches is that the unknown properties introduced by the function approximation mechanisms void the fundamental benefit of applying finite Markov chains: a straightforward and elegant means to describe and analyse the dynamic characteristics of a stochastic nonlinear system. Recall how linear systems are limited by the extremely severe assumption of linearity (affinity), yet they have turned out to be most useful for control design purposes. In a similar way, finite Markov chains are fundamentally limited by the resolution of the problem representation (discretization of the state-action spaces). The hypothesis of this work is that, keeping this restriction in mind (just as we keep in mind the assumption of linearity), the obtained results can be most useful. The practical validity of this statement is the focus of the research.
In particular, the field of process engineering is our main concern, with applications characterized by: availability of rough process models, slow sampling rates, nonlinearities that are either smooth or appear as discontinuities, expensive experimentation (large-scale systems running in production), and substantial on-site tuning due to the uniqueness of products. Clearly, these requirements differ from those encountered, e.g., in the fields of economics (lack of reliable models), robotics (very precise models are available), consumer electronics (mass production of low-cost products), telecommunications (extensive use of test signals, fast sampling), or academic toy problems (ridiculously complex multimodal test functions).
Due to systematic errors, noise, and lack of accuracy in the measurements of process state variables, among many other reasons, there is an urgent need for extended means of learning and handling uncertainties. Finite Markov chains provide straightforward means for dealing with both of these issues.
This paper is organized as follows: the process models are considered in section 2, control design in section 3, and open and closed loop system analysis in section 4. The MATLAB toolbox and an illustrative example are provided in section 5. A discussion of aspects relevant to learning under uncertainties, and conclusions, are given in the final section.
2 GENERALIZED CELL MAPPING
Let the process under study be described by the following discrete-time dynamic system and measurement equations
$$x(k) = f(x(k-1), u(k-1), w(k-1)) \qquad (1)$$
$$y(k) = h(x(k), v(k)) \qquad (2)$$
where $f: \Re^{n_x} \times \Re^{n_u} \times \Re^{n_w} \rightarrow \Re^{n_x}$ and $h: \Re^{n_x} \times \Re^{n_v} \rightarrow \Re^{n_y}$ are nonlinear functions, $w(k) \in \Re^{n_w}$ and $v(k) \in \Re^{n_v}$ are i.i.d. white noise with probability density functions (pdfs) $p_w$ and $p_v$. The initial condition is known via $p_X(0)$.
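As a concrete illustration of the model class in (1)-(2), the following minimal Python sketch simulates one trajectory. The particular $f$, $h$, feedback law, and Gaussian noise levels are our own assumptions, chosen only for illustration; they are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, w):
    # Hypothetical nonlinear state transition, in the form of eq. (1).
    return 0.9 * x + 0.1 * np.tanh(u) + w

def h(x, v):
    # Hypothetical measurement map, in the form of eq. (2).
    return x + v

x = 0.5                           # initial state; in general drawn from p_X(0)
for k in range(1, 51):
    w = rng.normal(0.0, 0.05)     # process noise with pdf p_w (assumed Gaussian)
    v = rng.normal(0.0, 0.02)     # measurement noise with pdf p_v (assumed Gaussian)
    u = -0.5 * x                  # an arbitrary control, for illustration only
    x = f(x, u, w)                # state update, eq. (1)
    y = h(x, v)                   # measurement, eq. (2)
```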
Let the state space be partitioned into a finite number of sets called state cells, indexed by $s \in \mathcal{S} = \{1, 2, ..., S\}$. The index $s$ is determined from
$$s = \arg\min_{s \in \mathcal{S}} \left\| x - x_s^{\text{ref}} \right\|$$
where the $x_s^{\text{ref}}$ are reference points (e.g., cell centers). In addition, let us define a 'sink cell', $s_{\text{sink}}$; a state is categorized into the sink cell if $\min_{s \in \mathcal{S}} \left\| x - x_s^{\text{ref}} \right\| > x_{\text{lim}}$.
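In code, this nearest-reference-point quantization with a sink cell might look as follows. This is a minimal sketch: the grid of reference points and the threshold value are assumed for illustration, and indices are 0-based rather than the paper's 1-based convention.

```python
import numpy as np

def state_to_cell(x, x_ref, x_lim):
    """Nearest-reference-point quantization; indices 0..S-1 are the
    regular state cells, index S plays the role of the sink cell."""
    d = np.linalg.norm(x_ref - x, axis=1)   # distances to all reference points
    s = int(np.argmin(d))
    return x_ref.shape[0] if d[s] > x_lim else s

# Example: a 10 x 10 grid of cell centers on [0, 1]^2 (assumed layout).
g = np.linspace(0.05, 0.95, 10)
x_ref = np.array([[a, b] for a in g for b in g])
s = state_to_cell(np.array([0.31, 0.72]), x_ref, x_lim=0.2)
```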
Similarly, let the control action and measurement spaces be partitioned into cells indexed by $a \in \mathcal{A} = \{1, 2, ..., A\}$ and $m \in \mathcal{M} = \{1, 2, ..., M\}$, respectively, determined using reference vectors $u_a^{\text{ref}}$ and $y_m^{\text{ref}}$. The partitioning results in $\mathcal{X} = \cup_{s=1}^{S} \mathcal{X}_s$, $\mathcal{U} = \cup_{a=1}^{A} \mathcal{U}_a$ and $\mathcal{Y} = \cup_{m=1}^{M} \mathcal{Y}_m$.
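Before formalizing this, it may help to see how a generalized cell map could be tabulated in practice: for each state cell and a fixed action cell, sample states within the cell, propagate them one step through (1), and count which cells they land in. The sketch below is our own illustration of this idea, not the toolbox of section 5; the uniform within-cell sampling, the noise sampler `sample_w`, and all parameter names are assumptions, and `state_to_cell` is the function from the previous sketch.

```python
import numpy as np

def estimate_cell_map(f, x_ref, u_a, x_lim, cell_radius,
                      sample_w, n_mc=200, rng=None):
    """Monte Carlo estimate of transition probabilities P(s' | s, a)
    for one fixed action reference point u_a; row S is the sink cell."""
    rng = rng or np.random.default_rng(1)
    S, n_x = x_ref.shape
    P = np.zeros((S + 1, S + 1))
    P[S, S] = 1.0                             # sink cell treated as absorbing
    for s in range(S):
        for _ in range(n_mc):
            # Sample a state inside cell s (uniform box, an assumption)
            # and a process noise realization drawn from p_w.
            x = x_ref[s] + rng.uniform(-cell_radius, cell_radius, size=n_x)
            x_next = f(x, u_a, sample_w(rng))
            P[s, state_to_cell(x_next, x_ref, x_lim)] += 1.0
        P[s] /= n_mc                          # normalize row to probabilities
    return P
```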
The evolution of the system can now be approxi-
mated as a finite state controlled Markov chain over