PROCESS CONTROL USING CONTROLLED FINITE MARKOV
CHAINS WITH AN APPLICATION TO A MULTIVARIABLE
HYBRID PLANT
Enso Ikonen
University of Oulu, Department of Process and Environmental Engineering, Systems Engineering Laboratory
P.O. Box 4300, FIN-90014 Oulun yliopisto, Finland
Keywords: Markov decision process, generalized cell-to-cell mapping, qualitative modelling.
Abstract:
Predictive and optimal process control using finite Markov chains is considered. A basic procedure is outlined,
consisting of discretization of the plant input and state spaces; conversion of an (a priori) plant model into a set of
finite state probability transition maps; specification of immediate costs for state-action pairs; computation of
an optimal or a predictive control policy; and analysis of the closed-loop system behavior. An application,
using a MATLAB toolbox developed for MDP-based process control design, illustrates the approach in the
control of a multivariable plant with both discrete and continuous action variables. For problems of practically
significant size (thousands of states), the computations can be performed on a standard office PC. The aim
of the work is to provide a basic framework for the examination of nonlinear control, with emphasis on on-line
learning from uncertain data.
1 INTRODUCTION
For identification and control of stochastic nonlinear
dynamic systems, no general method exists. Devel-
opment of physical models is typically far too time
consuming and knowledge intensive, and results in
models that are not well suited for process control de-
sign. Instead, in industrial practice, linear approxi-
mations have turned out to be most useful, commonly
extended by considering local linear approximations
(cf. gain scheduling, indirect adaptive control, piece-
wise (multi)linear multimodel systems, etc.), where
linear descriptions vary with state and/or time.
For nonlinear plant identification, a multitude of
efficient methods exists (Ikonen and Najim, 2002).
These include polynomial functions, neural nets,
etc. Identification of nonlinear dynamical relations is
more difficult. Time series approaches (NARMAX,
etc.) are a common and straightforward framework
for extending to dynamic nonlinear systems. They
rely on the assumption that mapping past (delayed) signals through
a nonlinear static function will enable capturing the
system dynamics. The properties of these models are,
in general, difficult to analyze, however, which significantly
complicates the control design. A common simplification is that of Wiener and Hammerstein systems: to
consider static process nonlinearities only and equip
the static nonlinear model with separate linear dynam-
ics. This is a powerful approach in that the control
design can be largely based on linear analysis, and in
that knowledge on plant static nonlinearities can be
exploited in the development of plant control. How-
ever, only linear dynamics can be dealt with.
This paper focuses on an approach that can cope
with a large class of nonlinear systems: the finite
Markov chains (Puterman, 1994; Häggström, 2002;
Poznyak et al., 2000). The basic idea is simple.
The system state space is quantized (discretized, par-
titioned, granulated) into a finite set of states (cells),
and the evolution of system state in time is mapped in
a probabilistic (frequentist) manner. With controlled
Markov chains, the mappings are constructed from
each state-action pair. Once equipped with such a
model, a control action for each state can be deduced
by minimizing a cost function defined in a future
horizon, based on specification of immediate costs
for each state-action pair. Specification of immediate
costs allows versatile means for characterising the de-
sired control behavior. Dynamic programming, stud-
ied in the field of Markov decision processes (MDP),
offers a way to solve various types of expected costs.
Since a process model is available, the paradigm of
model predictive control can also be used to derive
the desired controls.
As the basic ideas are old, well-known, and
widespread, relevant literature can be found in
many fields and under different keywords: gener-
alized cell-to-cell mapping (Hsu, 1987), qualitative
modelling (Lunze et al., 2001), and reinforcement
learning (Kaelbling et al., 1996), for example. Much
of the terminology in the sections that follow
originates from (Hsu, 1987): we refer to mappings between
cells as (simple or) generalized cell maps, we use a
sink cell, etc.
As pointed out in (Lee and Lee, 2004) (see also
(Ikonen, 2004; Negenborn et al., 2005)), applications
of MDP in process control have been few; instead,
the model predictive control paradigm is very popu-
lar in the process control community. Whereas not-
so-many years ago the computations associated with
finite Markov chains were prohibitive, the computing
power available in cheap office PCs enables the
re-exploration of these techniques.
The work described in this paper aims at build-
ing a proper basic framework for examining the pos-
sibilities of controlled finite Markov chains in nonlin-
ear process control. A majority of current literature
on MDP examines means to overcome the problem
of curse-of-dimensionality, e.g., by means of func-
tion approximation (neuro-dynamic programming, Q-
learning, etc.). The main obstacle in such approaches
is that the unknown properties introduced by the
mechanisms of function approximation nullify the
fundamental benefit of applying finite Markov chains:
a straightforward and elegant means to describe and
analyse the dynamic characteristics of a stochastic
nonlinear system. Recall how linear systems are lim-
ited by the extremely severe assumption of linearity
(affinity), yet they have turned out to be extremely
useful for control design purposes. In a similar way, the
finite Markov chains are fundamentally limited by the
resolution of the problem presentation (discretization
of state-action spaces). The hypothesis of this work is
that, keeping this restriction in mind (just as we keep
in mind the assumption of linearity), the obtained
results can be most useful. The practical validity of this
statement is the focus of the research.
In particular, the field of process engineering is
our main concern, with applications characterized by:
availability of rough process models, slow sampling
rates, nonlinearities that are either smooth or appear
as discontinuities, expensive experimentation (large-
scale systems running in production), and substantial
on-site tuning due to the uniqueness of products. Clearly,
these requirements differ from those encoun-
tered, e.g., in the field of economics (lack of reli-
able models), robotics (very precise models are avail-
able), consumer electronics (mass production of low
cost products), telecommunication (extensive use of
test signals, fast sampling), or academic toy problems
(ridiculously complex multimodal test functions).
Due to systematic errors, noise, and lack of ac-
curacy in measurements of process state variables,
among many other reasons, there is an urgent need for
extended means of learning and handling of uncer-
tainties. The finite Markov chains provide straight-
forward means for dealing with both of these issues.
This paper is organized as follows: The process
models are considered in section 2, control design in
section 3, open and closed loop system analysis in
section 4. The MATLAB toolbox and an illustrative
example are presented in section 5. Discussion on
aspects relevant to learning under uncertainties, and
conclusions, are given in the final section.
2 GENERALIZED CELL MAPPING
Let the process under study be described by the fol-
lowing discrete-time dynamic system and measure-
ment equations
$$x(k) = f(x(k-1), u(k-1), w(k-1)) \quad (1)$$
$$y(k) = h(x(k), v(k)) \quad (2)$$

where $f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_w} \to \mathbb{R}^{n_x}$ and $h : \mathbb{R}^{n_x} \times \mathbb{R}^{n_v} \to \mathbb{R}^{n_y}$ are nonlinear functions, and $w(k) \in \mathbb{R}^{n_w}$ and $v(k) \in \mathbb{R}^{n_v}$ are i.i.d. white noise with probability density functions (pdfs) $p_w$ and $p_v$. The initial condition is known via $p_X(0)$.
Let the state space be partitioned into a finite number of sets called state cells, indexed by $s \in \mathcal{S} = \{1, 2, \ldots, S\}$. The index $s$ is determined from

$$s = \arg\min_{s \in \mathcal{S}} \left\| x - x^{\mathrm{ref}}_s \right\|$$

where the $x^{\mathrm{ref}}_s$ are reference points (e.g., cell centers). In addition, let us define a 'sink cell', $s_{\mathrm{sink}}$; a state is categorized into the sink cell if $\min_{s \in \mathcal{S}} \| x - x^{\mathrm{ref}}_s \| > x_{\mathrm{lim}}$.

Similarly, let the control action and measurement spaces be partitioned into cells indexed by $a \in \mathcal{A} = \{1, 2, \ldots, A\}$ and $m \in \mathcal{M} = \{1, 2, \ldots, M\}$, respectively, determined using reference vectors $u^{\mathrm{ref}}_a$ and $y^{\mathrm{ref}}_m$. The partitioning results in

$$X = \bigcup_{s=1}^{S} X_s, \quad U = \bigcup_{a=1}^{A} U_a \quad \text{and} \quad Y = \bigcup_{m=1}^{M} Y_m.$$
The evolution of the system can now be approxi-
mated as a finite state controlled Markov chain over
the cell space (however, see (Lunze, 1998)). In sim-
ple cell mapping (SCM), one trajectory is computed
for each cell. Generalized cell mapping (GCM) con-
siders multiple trajectories starting from within each
cell, and can be interpreted in a probabilistic sense as
a finite Markov chain.
2.1 Evolution of States
Let the state pdf be approximated as an $S \times 1$ cell probability vector $p_X(k) = [p_{X,s}(k)]$, where $p_{X,s}(k)$ is the cell probability mass. The evolution of cell probability vectors is described by a Markov chain, represented by a set of linear equations

$$p_X(k+1) = P^{a(k)} p_X(k)$$

or, equivalently,

$$p_{X,s'}(k+1) = \sum_{s \in \mathcal{S}} p^{a}_{s',s} \, p_{X,s}(k)$$

where $P^a$ is the transition probability matrix under action $a$, $P^a = \left[ p^{a}_{s',s} \right]$.
3 CONTROL DESIGN
Using a GCM model of the plant, an optimal control
action for each state can be solved by minimizing a
cost function. In both optimal (Kaelbling et al., 1996)
and predictive control (Ikonen and Najim, 2002) the
cost function is defined in a future horizon, based on
specification of immediate costs for each state-action
pair. Whereas optimal control considers (discounted)
infinite horizons and solves the problem using dy-
namic programming, nonlinear predictive control ap-
proaches rely on computation of future trajectories
(predictions) and exhaustive search.
3.1 Optimal Control
In optimal control, the control task is to find an appropriate mapping (optimal policy or control table) $\pi$ from states ($x$) to control actions ($u$), given the immediate costs $r(x(k), u(k))$. The infinite-horizon discounted model attempts to minimize the geometrically discounted immediate costs

$$J(x) = \sum_{k=0}^{\infty} \gamma^k \, r(x(k), \pi(x(k)))$$

under the initial condition $x(0) = x$. The optimal control policy $\pi^*$ is the one that minimizes $J$. The optimal cost-to-go is given by $J^* = \min_{\pi} J$.
Bellman's principle of optimality states that

$$J^*(x) = \min_{u} \left[ r(x, u) + \gamma J^*(f(x, u)) \right]$$

i.e., the optimal solution (value) for state $x$ is the sum of the immediate cost $r$ and the optimal cost-to-go from the next state, $J^*(f(x, u))$. Application of the Bellman equation leads to the methods of dynamic programming.
3.1.1 Value Iteration
In value iteration, the optimal value function is determined by a simple iterative algorithm derived directly from the Bellman equation. Let the immediate costs be given in a matrix $R = [r^a]$, with column vectors $r^a = [r^a_s]$, and collect the values of the cost-to-go at iteration $i$ into a vector $J(i) = [J_s(i)]$. Given arbitrary initial values $J_s(0)$, the costs are updated for $i = 0, 1, 2, \ldots$:

$$Q^a_s(i) = r^a_s + \gamma \sum_{s' \in \mathcal{S}} p^{a}_{s',s} J_{s'}(i)$$
$$J_s(i+1) = \min_{a \in \mathcal{A}} Q^a_s(i)$$

for all $s$ and $a$, until the values of $J_s(i)$ converge. Denote the converged values by $J^*_s$. The optimal policy is then obtained from

$$\pi^*_s = \arg\min_{a \in \mathcal{A}} \left[ r^a_s + \gamma \sum_{s' \in \mathcal{S}} p^{a}_{s',s} J^*_{s'} \right].$$
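A compact sketch of this iteration (Python/NumPy; the array layout, with P[a][s', s] and R[s, a], is our own convention, not prescribed by the paper or the toolbox):

    import numpy as np

    def value_iteration(P, R, gamma, tol=1e-8):
        # P: (A, S, S) transition matrices, P[a][s_next, s].
        # R: (S, A) immediate costs r_s^a.
        A, S, _ = P.shape
        J = np.zeros(S)
        while True:
            # Q[s, a] = r_s^a + gamma * sum_{s'} p^a_{s',s} J[s']
            Q = R + gamma * np.einsum('aij,i->ja', P, J)
            J_new = Q.min(axis=1)
            if np.max(np.abs(J_new - J)) < tol:
                return J_new, Q.argmin(axis=1)  # J* and the optimal policy
            J = J_new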
3.2 Predictive Control
Given a system model and the associated costs, we can easily set up a predictive control type of problem. In predictive control, the costs are minimized in open loop over a fixed horizon

$$J\left( x(k), \ldots, x(k+H_p), u(k), \ldots, u(k+H_p) \right) = \sum_{h=0}^{H_p} r(x(k+h), u(k+h))$$

under the initial condition $x$. In practice it is useful to introduce a control horizon, where it is assumed that the control action will remain fixed after a given number of steps, $H_c$. Often only one step is allowed, and the optimization problem reduces to the minimization of

$$J\left( x(k), \ldots, x(k+H_p), u(k) \right).$$
Under control action $a$, the costs are given by

$$J^a = \sum_{h=0}^{H_p} [r^a]^T p_X(k+h) = \sum_{h=0}^{H_p} [r^a]^T [P^a]^h p_X(k)$$

where $r^a = [r^a_s]$ is a column vector of immediate costs and $p_X(k)$ is the current state cell pdf. In order to solve the problem, it suffices to evaluate the costs for all $a \in \mathcal{A}$ and select the one minimizing the costs. The prediction horizon $H_p$ is a useful tuning parameter; a long prediction horizon leads to a mean-level type of control.
The control policy mapping $\pi^*$ can be obtained by solving the above problem in each state $s$ and tabulating the results: $\pi^*_s = \arg\min_{a} J^a$.
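A sketch of the one-step control horizon case ($H_c = 1$) in the same Python/NumPy convention as above; it evaluates $J^a$ for every action by propagating the current cell pdf:

    import numpy as np

    def predictive_action(P, R, p_x, Hp):
        # J[a] = sum_{h=0}^{Hp} (r^a)^T (P^a)^h p_X(k)
        A = P.shape[0]
        J = np.zeros(A)
        for a in range(A):
            p = p_x.copy()
            for _ in range(Hp + 1):
                J[a] += R[:, a] @ p    # expected immediate cost at step h
                p = P[a] @ p           # one-step prediction
        return int(np.argmin(J))       # action minimizing the cost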
For many practical cases, a good controller de-
sign can be obtained using either the optimal con-
trol approach, or the predictive control approach with
$H_c = 1$. In some cases, however, an engineer may be
interested in extending the controller design possibil-
ities to larger control horizons. In principle, this is
straightforward to realize in the GCM context: One
simply creates A different sequences of control ac-
tions, simulates the system accordingly, and selects
the sequence that minimizes the cost function.
4 SYSTEM ANALYSIS
The generalized cell-to-cell mapping is a powerful
tool for analysis of nonlinear systems. In what fol-
lows, it is assumed that the system map (Markov chain) is described by transition probabilities $P$. This may correspond to the process output under a fixed (open loop) control action $a$ ($P := P^a$), or to the system's closed loop behavior obtained from the construction of transition probabilities under $u = \pi(x)$: $P := P^{\pi} = \left[ p^{\pi}_{s',s} \right]$.
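Constructing the closed-loop map from the per-action maps amounts to picking, for each state, the transition column of the action prescribed by the policy (a sketch under the conventions above; pi is a length-S array of action indices):

    import numpy as np

    def closed_loop_map(P, pi):
        # Column s of P_pi is column s of P[pi[s]]:
        # transitions under u = pi(x).
        A, S, _ = P.shape
        P_pi = np.zeros((S, S))
        for s in range(S):
            P_pi[:, s] = P[pi[s], :, s]
        return P_pi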
4.1 Characterization of Cells
In GCM, a useful characterization of cells is obtained by studying the long term behavior of the Markov chain. A state is said to be recurrent (Najim et al., 2004) iff, starting at a given state, there will be a return to the state with probability 1. Otherwise the state is said to be transient. If $p_{s,s} = 1$, a state is said to be absorbing.
Decomposing the probability vector into recurrent cells ($i_r \in I_r$) and transient cells ($i_t \in I_t$), the Markov chain can be written as follows:

$$\begin{bmatrix} p_r(k+1) \\ p_t(k+1) \end{bmatrix} = \begin{bmatrix} P_{rr} & P_{rt} \\ 0 & P_{tt} \end{bmatrix} \begin{bmatrix} p_r(k) \\ p_t(k) \end{bmatrix}$$

As $k \to \infty$, the recurrent cells are visited infinitely often, whereas the transient cells are visited only finitely often. Among the recurrent cells, we can further classify the absorbing cells ($i_a \in I_a$): $P_{aa} = I$. The absorbing states are never left, once visited.
The recurrent cells form communicating classes
(closed subsets), where the cells within each com-
municating class (inter)communicate with each other,
i.e., the probability of transition from one state to the
other is nonzero, and do not communicate with other
states. Each absorbing state only communicates with
itself. A closed communicating class constitutes a
sub-Markov chain, which can be studied separately.
A stationary probability distribution satisfies $\bar{p}_X = P \bar{p}_X$ and, consequently, the distribution must be an eigenvector of $P$; for the distribution to be a probability distribution, the eigenvalue must be one. Therefore, the recurrent cells are found by searching for the unit amplitude eigenvalues of $P$; the nonzero elements of the associated eigenvectors $\bar{p}_X$ point to the recurrent cells.
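In code, this eigen-analysis is only a few lines (a sketch; for large $S$, sparse eigensolvers and a proper tolerance analysis would be used in practice):

    import numpy as np

    def stationary_distributions(P, tol=1e-9):
        # Right eigenvectors of P with eigenvalue (approximately) one
        # are stationary distributions; their supports point to the
        # recurrent cells.
        w, v = np.linalg.eig(P)
        dists = []
        for i in np.where(np.abs(w - 1.0) < tol)[0]:
            p = np.real(v[:, i])
            if abs(p.sum()) > tol:
                p = p / p.sum()        # normalize to a probability vector
            dists.append(p)
        return dists

    # Recurrent cells: union of the supports of the stationary
    # distributions, e.g.
    # recurrent = sorted({s for p in stationary_distributions(P)
    #                     for s in np.where(np.abs(p) > 1e-9)[0]})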
4.2 Stability and Size of Basin-of-attraction
Examination of the behavior of transient cells as they
enter the recurrent cells reveals the dynamics of the
nonlinear system. We have that
$$p_r(k+1) = P_{rr}\, p_r(k) + P_{rt}\, p_t(k) = P_{rr}\, p_r(k) + P_{rt} P_{tt}^{k}\, p_t(0)$$

where $P_{rt} P_{tt}^{k}$ represents the conditional probability that a solution starting from a transient cell will pass into a recurrent cell at time $k+1$. The probability that this will eventually happen, $P_{t2r}$, is given by

$$P_{t2r} = \sum_{k=0}^{\infty} P_{rt} P_{tt}^{k} = P_{rt} \left( I - P_{tt} \right)^{-1}$$
Each recurrent cell belongs to a communicating class; for absorbing cells this class consists of a single cell. The probability of transition into a particular communicating class is obtained by summing (column-wise) the entries in $P_{t2r}$.
The sink cell (Hsu, 1987) is an absorbing cell that represents the entire region outside the domain of interest. A nonzero probability of entering the sink cell indicates instability of the system (given the resolution of the model). In the experimental section, the stationary probabilities of entering the sink cell are examined.
High probability cells determine the basin-of-attraction. The size of the 'basin-of-attraction' for each state was characterized by taking the sum of probabilities for entering a recurrent cell (from any transient cell, or from any recurrent cell) and weighting it with the probability of occurrence within a communicating class (i.e., multiplying this with the stationary mapping $P^{\infty}$):

$$B = P^{\infty} \left[ \sum_{i \in I_t} \left[ P_{t2r} \right]_{j,i} + \sum_{i \in I_r} \left[ P_{rr} \right]_{j,i} \right]$$

where $P^{\infty}$ is a mapping to the stationary distribution, $P^{\infty} = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} P_{rr}^{k}$, and $[x]_{a,b}$ denotes the element of $x$ in the $a$'th row and $b$'th column. Elements of $B$ take values in the interval $[0, S]$, with $\sum_{i \in I_r} [B]_i = S$. A large
value in $B$ indicates the presence of the following ingredients: the communicating class to which a recurrent state belongs can be accessed from a large number of states; the probability of entering the class from these states is high; and the stationary probability for the occurrence of a recurrent state (within a communicating class) is high. Recall that when working with large state-action spaces, separate examination of all states is hopeless. In the experimental section we hope to shed some light on control-relevant properties of the (open or closed-loop) system by projecting $B$ onto the dimensions of $x$.
Based on the above analysis, some communicating classes can then be taken under closer examination: simulation of (expected or random) trajectories, examination of cells in basins of attraction, etc. In assessing control performance, examination of the speed of the system (lengths of trajectories converging to communicating classes) is of great interest. The absorption time from the $i$'th state to the $j$'th state ($i \in I_t$, $j \in I_r$), $E\{k\}$, is obtained from:

$$E\{k\} = P_{rt} \sum_{k=0}^{\infty} k P_{tt}^{k} = P_{rt} \left( I - P_{tt} \right)^{-2}.$$
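Both quantities follow from the fundamental matrix $(I - P_{tt})^{-1}$ of the transient part; a direct sketch (feasible when the number of transient cells is moderate), implementing the two formulas above:

    import numpy as np

    def transient_analysis(P_rr, P_rt, P_tt):
        # Fundamental matrix of the transient part.
        M = np.linalg.inv(np.eye(P_tt.shape[0]) - P_tt)
        P_t2r = P_rt @ M        # probabilities of eventual absorption
        E_k = P_rt @ M @ M      # expected absorption times, as above
        return P_t2r, E_k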
5 SIMULATION STUDY
In this section, some numerical results based on simulations are given. The following control design problem set-up was envisioned: a nonlinear state-space model of the plant is available (a set of ordinary differential equations, for example), and a decision on input, state, and output variables has been made. A controller is now sought, such that the desired transitions between plant output set points would be optimal. A typical GCM control design procedure would involve the following (iterative) steps:
- Set the model resolution by specifying the discretization of plant inputs, states, outputs and output set points, and the sampling time.
- Set the control targets by specifying the immediate costs.
- Build a GCM plant model (by successive evaluations of the original model, and counting the occurred state transitions) and analyze its behavior.
- Design a controller (e.g., optimal or predictive), based on the GCM plant model.
- Build a GCM closed-loop model and analyze its behavior.
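Assuming the illustrative helper functions sketched in the earlier sections (build_gcm, value_iteration or predictive_action, closed_loop_map, stationary_distributions — all hypothetical names, not the toolbox API), the procedure can be outlined as:

    # Hypothetical end-to-end outline using the earlier sketches.
    P = build_gcm(f, S, A, n_samples=500,
                  sample_in_cell=sample_in_cell, cell_of=cell_of)
    J, pi = value_iteration(P, R, gamma=0.95)   # or a predictive policy
    P_pi = closed_loop_map(P, pi)               # closed-loop GCM model
    dists = stationary_distributions(P_pi)      # closed-loop analysis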
5.1 Experiment Setup
Let us consider a simple example of a two-tank MIMO system (see (Åkesson et al., 2006) and references therein). In this hybrid system the action space (space of control inputs) consists of both real-valued and discrete-valued variables. The objective is to keep the temperature ($T_2$) in the second tank at its setpoint, while keeping the levels of both tanks ($h_1$, $h_2$) within preset limits. The system is controlled by a valve for the first tank input flow, a pump between the two tanks, a heater in the second tank, and a valve for the second tank output flow. The heater ($u_1$) is constrained to continuous values in the interval between 0 and 560 kW, the pump ($u_2$) has three operational levels {off, medium, high}, and the valves ($u_3$, $u_4$) are binary {on/off}.
The system is described by the mass and energy balances

$$\frac{d h_1}{dt} = \frac{1}{A_1} \left( v_1 u_3 - \alpha u_2 \right)$$
$$\frac{d T_1}{dt} = \frac{1}{A_1 h_1} \left( v_2 - T_1 \right) v_1 u_3$$
$$\frac{d h_2}{dt} = \frac{1}{A_2} \left( \alpha u_2 - v_3 u_4 \right)$$
$$\frac{d T_2}{dt} = \frac{1}{A_2 h_2} \left( \left( T_1 - T_2 \right) \alpha u_2 + \frac{u_1}{c_l \rho_l} \right)$$

where subscript 1 refers to the first tank (buffer) and subscript 2 to the second tank (supply); $v_1$ is the inflow, $v_2$ the inflow temperature, and $v_3$ the outflow; $A$ is the tank area ($A_1 = 3.5$ m$^2$, $A_2 = 2$ m$^2$); $c_l$ and $\rho_l$ are the liquid specific heat capacity and density ($c_l = 4.2$ kJ/(kg K), $\rho_l = 1000$ kg/m$^3$); $\alpha$ is a pump capacity factor ($\alpha = 1$ m$^3$/min).
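A minimal Python sketch of these balances, usable as the plant map $f$ in the GCM construction (variable names ours; note that the time units of the heater power and the flows must be made mutually consistent, e.g., by expressing the power per minute):

    import numpy as np
    from scipy.integrate import solve_ivp

    A1, A2 = 3.5, 2.0        # tank areas [m^2]
    cl, rho = 4.2, 1000.0    # kJ/(kg K), kg/m^3
    alpha = 1.0              # pump capacity factor [m^3/min]

    def two_tank(t, x, u, v):
        h1, T1, h2, T2 = x
        u1, u2, u3, u4 = u   # heater, pump level, valves (0/1)
        v1, v2, v3 = v       # inflow, inflow temperature, outflow
        dh1 = (v1 * u3 - alpha * u2) / A1
        dT1 = (v2 - T1) * v1 * u3 / (A1 * h1)
        dh2 = (alpha * u2 - v3 * u4) / A2
        dT2 = ((T1 - T2) * alpha * u2 + u1 / (cl * rho)) / (A2 * h2)
        return [dh1, dT1, dh2, dT2]

    # One sampling interval (15 s = 0.25 min):
    # sol = solve_ivp(two_tank, (0.0, 0.25), x0, args=(u, v))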
A discrete-time Markov model (1)-(2), $x(k) = f(x(k-1), u(k-1))$, for the system was constructed by forming the system state and controls as follows: $x(k) = [h_1(k), T_1(k), h_2(k), T_2(k), v_1(k), v_2(k), v_3(k)]$ and $u(k) = [u_1(k), u_2(k), u_3(k), u_4(k)]$, $y(k) = x_4(k)$. Since the model is based on another (deterministic) model, we omit the disturbances here, $w \equiv v \equiv 0$ in (1)-(2). The state space was discretized by forming a grid, where tank levels and temperatures were quantized as follows:

$$x^{\mathrm{ref}}_1 = x^{\mathrm{ref}}_3 = \{0, 1, 2, \ldots, 9\} \ \mathrm{[m]}$$
$$x^{\mathrm{ref}}_2 = \{17, 18, 19\}, \quad x^{\mathrm{ref}}_4 = \{17, 18, \ldots, 24\} \ \mathrm{[°C]}$$

the disturbances into one and three values:

$$v_1 = v_3 = \{1\}, \quad v_2 = \{17, 18, 19\},$$

and the heating action into five values:

$$u_1 = \{0, 140, \ldots, 560\} \ \mathrm{[kW]}.$$
Roughly, the above states that deviations of less than 0.5 °C in the supply tank temperature are of no interest to us. Since we also want to place constraints on
high and low tank levels, with the above discretization we can set the switching points from allowed to non-desirable at 0.5 and 8.5 m. The quantization of the disturbances allows known (step-wise) disturbances to be taken into account when deriving the optimal controls. The immediate costs were set based on the Euclidean norm between the desired and reference temperatures, $\| w - y_s \|$, and on deviations from the nominal controls at 1 for $u_2$, $u_3$ and $u_4$, with weights 0.1, 2 and 2, respectively (see (Åkesson et al., 2006)). For the states where reference points exceeded either the upper or lower limit for the tank level, an additional large cost was added (ten times larger than the largest cost so far). A 100 times larger cost was set for the sink cell.
5.2 Computer Simulations
The selected discretization results in a finite state-action space of 7201 states and 60 actions, including the sink cell. For the considered computational platform (a standard office PC: 3 GHz Pentium 4 CPU, 1 GB RAM, MATLAB R12) this posed no problem. In fact, experiments with up to ten times larger state-action spaces were also successful.

A GCM model was built by evaluating the state transitions five hundred times for each possible state-action pair $(s, a)$. The starting state was generated from a uniform random distribution within the state hypercube. This resulted in roughly $2 \times 10^8$ evaluations of the plant model; within each evaluation the ODEs were solved one sampling time (15 seconds) ahead using a standard solver (MATLAB ode23). Most of the computing time was spent on solving the ODEs, and the computations took a couple of hours. Clearly this presented a significant burden in terms of computing power and memory, but by no means an excessive one. As long as the discretization of the state-action spaces is kept fixed during the latter stages of control design, the model need not be re-evaluated, even if other parameters such as the immediate costs ($R$) or the controller design parameters ($H_p$ or $\gamma$) are modified.
Given the GCM model, a predictive controller was designed using $H_p = 5$. Using the obtained control policy $\pi^*$, a closed loop GCM map was constructed. The stability of the closed loop system is revealed by examination of the probabilities of entering the sink cell. The probabilities of entering the sink cell from any other cell were zero. Consequently, the closed loop system was stable for all initial states and set points.
Figure 1 illustrates the sizes of the basins-of-attraction, projected to three different dimensions of $x$: the level of tank 1 ($x_1$), the level of tank 2 ($x_3$), and the temperature of tank 2 ($x_4$).

Figure 1: Size of basin-of-attraction. The probability mass for entering a particular state, projected to dimensions of x. Top plots: tank levels; bottom plot: tank 2 temperature.

The bars in the plots show the size of the basin-of-attraction (as a percentage of the whole state space), i.e., the cumulative number of states that are mapped to a recurrent state, recurrent states being sorted according to their projection to a dimension of $x$. From the top plots in Fig. 1, it can be immediately observed that the constraints on the tank levels are fulfilled: the basin-of-attraction is empty for projections to levels 0 and 9, for both tanks.
The projections to the tank 2 temperature (see the bottom plot in Fig. 1) show the success (controllability) of driving the plant to its set point (in steady state). For set points 19 °C ... 23 °C, almost 100% success is obtained. For low temperatures, the smaller sizes of the basins-of-attraction are explained by the lack of means to cool the incoming feed. Since one third of the states is characterized by an input feed of 19 °C, one third by 18 °C and one third by 17 °C, it is easy to understand that the setpoint of 17 °C is attained only when the input feed is 17 °C, etc.
In a few cases (three initial states), the model predicted transitions that could be judged as impossible using physical arguments. A closer examination of the plant model statistics revealed that these cases were due to the random sampling when building the GCM model, and to the slow dynamics of the system. For example, an input feed of 19 °C with no heating resulted in an 18 °C steady state temperature in both tanks. Examining the plant model, it was straightforward to attribute this to the fact that during the 500 simulations, none of the simulated trajectories had led to another discrete state. Consequently, this state was categorized as absorbing. The remedy for this problem is to increase either the sampling rate or the number of evaluations.
Figure 2: A communicating class consisting of two states. The closed loop system is 'ringing'.
For set points 19 °C ... 23 °C, full 100% sizes of the basin-of-attraction were not obtained. Instead, from the bottom plot in Fig. 1 it can be observed that a small percentage of the probability mass is distributed to the neighboring projections. A closer examination of these cases reveals a typical reason for this. The top plot in Fig. 2 shows the evolution of the state in closed loop from a particular initial state, as state probabilities projected onto the dimension $x_4$ (tank 2 temperature), for the case of a setpoint of 20 °C. As suggested by the plot, and as can be detected by examination of the communicating classes, the stationary distribution is a communicating class formed by two states. A sample trajectory is illustrated in the bottom plot of Fig. 2, showing that a phenomenon of 'ringing' clearly takes place. Most of the time is spent in the desired set point, but occasionally the system crosses the border and visits the state 19 °C.
For a particular communicating class, or a state in it, very precise information can be obtained. For example, for the closed-loop system with a set point of 20 °C there were 182 communicating classes, of which 67 were absorbing. For the 'ringing'-class example, the basin-of-attraction contained 201 states (there was a nonzero probability of entering this class from 201 states). The expected time of absorption within these states ranged from 0 (from the recurrent states) to 8.85 minutes.

Figure 3: Evolution of probability distribution. The expected time of absorption is 9 minutes.

The evolution of the transition probability distribution (projected onto two dimensions of $x$) from the slowest initial state is shown in Fig. 3, confirming the
exactness of this result. Unfortunately, it is not feasi-
ble to examine all states with this much care.
As a final example of system analysis, let us examine whether there were any cases in which a plant shutdown would happen (heating off, pump off, valves off). Examination of the policies (for all setpoints) revealed that only the sink cell resulted in plant shutdown. Another set of interesting control commands could be the one characterized by high pumping and closing of the output valve. It was observed that for each set point, the policy table contained roughly 800 occurrences of this control (i.e., in 800 out of 7201 states, this was the control to apply). These states were characterized by low levels in tank 2. Again, this appeared to make sense from an engineering point of view.
It can be concluded that convenient tools for analysis of the closed loop system were found, including examination of stability and steady state performance. The characterization of system performance in terms of speed of response was more tedious. It is not clear what more could be done, as, for a nonlinear system, the behavior differs from state to state, and computation of expectations and worst case scenarios does not necessarily reveal useful information. A partial remedy is provided by the simple, and extremely commonly used, approach of simulating the closed loop system under typical operating conditions.

Figure 4: Closed-loop simulation. The set point trajectory consists of a series of steps and ramps. At the end of the simulation, an input disturbance affects the system.

Figure 4
illustrates a trajectory following simulation with (a
known) input disturbance. It can be concluded that
the behavior of the closed loop system is adequate.
6 DISCUSSION AND CONCLUSIONS
In this preliminary work, we have focused on using Markov chains and MDPs as a process control design tool, to be used bearing in mind the resolution of the problem set-up (discretization into a finite state space). Continuing in the same direction, the problem of identification is then related to keeping the original model up-to-date (the ODEs, for example), or, at least, to approximating the original model using function approximation techniques, rather than looking for clever tricks to make counting feasible in the finite state space, which is doomed to be huge. If doable, the benefits are clear: physical interpretation of the estimated parameters. In many process engineering problems, this may turn out to be more fruitful than pure machine learning approaches.

The problem of uncertainty in measurements, in turn, can potentially be handled in a very elegant and efficient fashion using finite Markov chains (Ikonen, 2004). Given the finite state probabilistic description of the plant, it is straightforward to construct cost functions taking into account the uncertainty in the predictions (other than discounted conditional expectations). Under the predictive control paradigm, uncertainties in the current state can also be taken into account in the plant predictions (i.e., there is no need to restrict to maximum likelihood estimates, etc.). Conducting a literature review on these topics is a major direction of our future research.
REFERENCES

Åkesson, B. M., Nikus, M. J., and Toivonen, H. T. (2006). Explicit model predictive control of a hybrid system using support vector machines. In Proceedings of the 1st IFAC Workshop on Applications of Large Scale Industrial Systems (ALSIS'06), Helsinki-Stockholm, Finland-Sweden.

Häggström, O. (2002). Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge.

Hsu, C. S. (1987). Cell-to-Cell Mapping - A Method of Global Analysis for Nonlinear Systems. Springer-Verlag, New York.

Ikonen, E. (2004). Learning predictive control using probabilistic models. In IFAC Workshop on Advanced Fuzzy/Neural Control (AFNC'04), Oulu, Finland.

Ikonen, E. and Najim, K. (2002). Advanced Process Identification and Control. Marcel Dekker, New York.

Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285.

Lee, J. M. and Lee, J. H. (2004). Approximate dynamic programming strategies and their applicability for process control: A review and future directions. International Journal of Control, Automation, and Systems, 2(3):263-278.

Lunze, J. (1998). On the Markov property of quantised state measurement sequences. Automatica, 34(11):1439-1444.

Lunze, J., Nixdorf, B., and Richter, H. (2001). Process supervision by means of a hybrid model. Journal of Process Control, 11:89-104.

Najim, K., Ikonen, E., and Ait-Kadi, D. (2004). Stochastic Processes - Estimation, Optimization and Analysis. Kogan Page Science, London.

Negenborn, R. R., De Schutter, B., Wiering, M. A., and Hellendoorn, H. (2005). Learning-based model predictive control for Markov decision processes. In 16th IFAC World Congress.

Poznyak, A. S., Najim, K., and Gómez-Ramírez, E. (2000). Self-Learning Control of Finite Markov Chains. Marcel Dekker, New York.

Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York.