offers a way to minimize various types of expected costs. Since a process model is available, the paradigm of model predictive control can also be used to derive the desired controls.
As the basic ideas are old, well known, and widespread, relevant literature can be found in many fields and under different keywords: generalized cell-to-cell mapping (Hsu, 1987), qualitative modelling (Lunze et al., 2001), and reinforcement learning (Kaelbling et al., 1996), for example. Much of the terminology in the sections that follow originates from (Hsu, 1987): we refer to mappings between cells as (simple or) generalized cell maps, we use a sink cell, etc.
As pointed out in (Lee and Lee, 2004) (see also (Ikonen, 2004), (Negenborn et al., 2005)), applications of MDPs in process control have been few; instead, the model predictive control paradigm is very popular in the process control community. Whereas not so many years ago the computations associated with finite Markov chains were prohibitive, the computing power available in cheap office PCs now enables the re-exploration of these techniques.
The work described in this paper aims at building a proper basic framework for examining the possibilities of controlled finite Markov chains in nonlinear process control. A majority of the current literature on MDPs examines means to overcome the curse of dimensionality, e.g., by function approximation (neuro-dynamic programming, Q-learning, etc.). The main obstacle in such approaches is that the unknown properties introduced by the function approximation mechanisms void the fundamental benefit of applying finite Markov chains: a straightforward and elegant means to describe and analyse the dynamic characteristics of a stochastic nonlinear system. Recall how linear systems are limited by the extremely severe assumption of linearity (affinity), yet they have turned out to be most useful for control design purposes. In a similar way, finite Markov chains are fundamentally limited by the resolution of the problem representation (discretization of the state-action spaces). The hypothesis of this work is that, keeping this restriction in mind (just as we keep in mind the assumption of linearity), the obtained results can be most useful. The practical validity of this statement is the focus of the research.
In particular, the field of process engineering is our main concern, with applications characterized by: availability of rough process models, slow sampling rates, nonlinearities that are either smooth or appear as discontinuities, expensive experimentation (large-scale systems running in production), and substantial on-site tuning due to the uniqueness of products. Clearly, these requirements differ from those encountered, e.g., in the fields of economics (lack of reliable models), robotics (very precise models are available), consumer electronics (mass production of low-cost products), telecommunications (extensive use of test signals, fast sampling), or academic toy problems (ridiculously complex multimodal test functions).
Due to systematic errors, noise, and lack of accuracy in the measurements of process state variables, among many other reasons, there is an urgent need for extended means of learning and handling uncertainties. Finite Markov chains provide straightforward means for dealing with both of these issues.
This paper is organized as follows: the process models are considered in section 2, control design in section 3, and open and closed loop system analysis in section 4. The MATLAB toolbox and an illustrative example are provided in section 5. A discussion of aspects relevant to learning under uncertainties, and conclusions, are given in the final section.
2 GENERALIZED CELL MAPPING
Let the process under study be described by the following discrete-time dynamic system and measurement equations
$$x(k) = f(x(k-1), u(k-1), w(k-1)) \qquad (1)$$
$$y(k) = h(x(k), v(k)) \qquad (2)$$
where $f: \Re^{n_x} \times \Re^{n_u} \times \Re^{n_w} \rightarrow \Re^{n_x}$ and $h: \Re^{n_x} \times \Re^{n_v} \rightarrow \Re^{n_y}$ are nonlinear functions, $w(k) \in \Re^{n_w}$ and $v(k) \in \Re^{n_v}$ are i.i.d. white noise with probability density functions (pdfs) $p_w$ and $p_v$. The initial condition is known via $p_X(0)$.
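As a concrete illustration of the model class in (1)-(2), the following minimal Python sketch simulates one trajectory. The particular $f$, $h$, feedback law, and Gaussian noise levels are our own assumptions, chosen only for illustration; they are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, w):
    # Hypothetical nonlinear state transition, in the form of eq. (1).
    return 0.9 * x + 0.1 * np.tanh(u) + w

def h(x, v):
    # Hypothetical measurement map, in the form of eq. (2).
    return x + v

x = 0.5                           # initial state; in general drawn from p_X(0)
for k in range(1, 51):
    w = rng.normal(0.0, 0.05)     # process noise with pdf p_w (assumed Gaussian)
    v = rng.normal(0.0, 0.02)     # measurement noise with pdf p_v (assumed Gaussian)
    u = -0.5 * x                  # an arbitrary control, for illustration only
    x = f(x, u, w)                # state update, eq. (1)
    y = h(x, v)                   # measurement, eq. (2)
```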
Let the state space be partitioned into a finite number of sets called state cells, indexed by $s \in \mathcal{S} = \{1, 2, ..., S\}$. The index $s$ is determined from
$$s = \arg\min_{s \in \mathcal{S}} \left\| x - x_s^{\text{ref}} \right\|$$
where the $x_s^{\text{ref}}$ are reference points (e.g., cell centers). In addition, let us define a 'sink cell', $s_{\text{sink}}$; a state is categorized into the sink cell if $\min_{s \in \mathcal{S}} \left\| x - x_s^{\text{ref}} \right\| > x_{\text{lim}}$.
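In code, this nearest-reference-point quantization with a sink cell might look as follows. This is a minimal sketch: the grid of reference points and the threshold value are assumed for illustration, and indices are 0-based rather than the paper's 1-based convention.

```python
import numpy as np

def state_to_cell(x, x_ref, x_lim):
    """Nearest-reference-point quantization; indices 0..S-1 are the
    regular state cells, index S plays the role of the sink cell."""
    d = np.linalg.norm(x_ref - x, axis=1)   # distances to all reference points
    s = int(np.argmin(d))
    return x_ref.shape[0] if d[s] > x_lim else s

# Example: a 10 x 10 grid of cell centers on [0, 1]^2 (assumed layout).
g = np.linspace(0.05, 0.95, 10)
x_ref = np.array([[a, b] for a in g for b in g])
s = state_to_cell(np.array([0.31, 0.72]), x_ref, x_lim=0.2)
```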
Similarly, let the control action and measurement spaces be partitioned into cells indexed by $a \in \mathcal{A} = \{1, 2, ..., A\}$ and $m \in \mathcal{M} = \{1, 2, ..., M\}$, respectively, determined using reference vectors $u_a^{\text{ref}}$ and $y_m^{\text{ref}}$. The partitioning results in $\mathcal{X} = \cup_{s=1}^{S} \mathcal{X}_s$, $\mathcal{U} = \cup_{a=1}^{A} \mathcal{U}_a$ and $\mathcal{Y} = \cup_{m=1}^{M} \mathcal{Y}_m$.
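Before formalizing this, it may help to see how a generalized cell map could be tabulated in practice: for each state cell and a fixed action cell, sample states within the cell, propagate them one step through (1), and count which cells they land in. The sketch below is our own illustration of this idea, not the toolbox of section 5; the uniform within-cell sampling, the noise sampler `sample_w`, and all parameter names are assumptions, and `state_to_cell` is the function from the previous sketch.

```python
import numpy as np

def estimate_cell_map(f, x_ref, u_a, x_lim, cell_radius,
                      sample_w, n_mc=200, rng=None):
    """Monte Carlo estimate of transition probabilities P(s' | s, a)
    for one fixed action reference point u_a; row S is the sink cell."""
    rng = rng or np.random.default_rng(1)
    S, n_x = x_ref.shape
    P = np.zeros((S + 1, S + 1))
    P[S, S] = 1.0                             # sink cell treated as absorbing
    for s in range(S):
        for _ in range(n_mc):
            # Sample a state inside cell s (uniform box, an assumption)
            # and a process noise realization drawn from p_w.
            x = x_ref[s] + rng.uniform(-cell_radius, cell_radius, size=n_x)
            x_next = f(x, u_a, sample_w(rng))
            P[s, state_to_cell(x_next, x_ref, x_lim)] += 1.0
        P[s] /= n_mc                          # normalize row to probabilities
    return P
```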
The evolution of the system can now be approxi-
mated as a finite state controlled Markov chain over