A New Inverse Optimal Control Method for Discrete-time Systems
Moayed Almobaied, Ibrahim Eksin and Mujde Guzelkaya
Department of Control and Automation, Istanbul Technical University, Maslak, Istanbul, Turkey
Keywords:
Control Lyapunov Function (CLF), Extended Kalman Filter (EKF), Hamilton-Jacobi-Bellman (HJB) Equation, Inverse Optimal Control.
Abstract:
This paper presents a new approach based on the extended Kalman filter (EKF) to construct a control Lyapunov function (CLF). This function is then used to establish the control law of inverse optimal control for discrete-time nonlinear systems. The main aim of inverse optimal control is to avoid solving the difficult Hamilton-Jacobi-Bellman (HJB) equation that arises in the traditional solution of the nonlinear optimal control problem. The applicability of the proposed scheme is illustrated through MATLAB simulation, and the results show the effectiveness of the proposed method.
1 INTRODUCTION
The design of optimal controllers for nonlinear systems has been an area of intense research interest in control theory. Optimal nonlinear control deals with the problem of finding a stabilizing control law for a given nonlinear system that also achieves a certain optimality criterion. In general, solving the nonlinear optimal control problem leads to the Hamilton-Jacobi-Bellman (HJB) equation. This equation has no exact analytical solution for the general nonlinear case (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011; Freeman and Kokotovic, 1996). The HJB equation reduces to the Riccati equation in the case of the linear quadratic regulator (LQR) (Kalman, 1964).
The inverse optimal control problem, which was initially presented by Kalman for linear systems, deals with the question of whether a given state feedback can be the optimal control with respect to some useful performance index (Kalman, 1964). In the nonlinear case, the inverse optimal control approach circumvents the task of solving a Hamilton-Jacobi-Bellman equation. The main idea behind inverse optimal control is to first construct a stabilizing feedback control law based on a priori knowledge of a control Lyapunov function (CLF), and then to show that this control law optimizes a meaningful cost functional (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011; Freeman and Kokotovic, 1996). This order can be confusing when compared with optimal control theory, where the cost functional must be known before the control law is designed.
In this paper, the inverse optimality approach depends on defining a control Lyapunov function (CLF). Unfortunately, there are no systematic techniques for defining a CLF for general nonlinear systems. It is well known in the literature that the existence of a control Lyapunov function leads to Lyapunov stability of the system (Sanchez and Ornelas-Tellez, 2013; Khalil, 1996). Moreover, any CLF can be considered as a meaningful cost function in optimal control problems (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011). In (Ornelas-Tellez et al., 2011), a quadratic CLF was proposed for the inverse optimal control problem; this function depends on a time-variant parameter, and the speed-gradient (SG) algorithm was proposed to adjust this parameter.
In this research, the same quadratic control Lyapunov function proposed in (Ornelas et al., 2011; Ornelas-Tellez et al., 2011) is used, and all the parameters of this function are adjusted recursively by means of the extended Kalman filter (EKF) algorithm. Researchers in the field of nonlinear estimation have used the EKF algorithm both as a state estimator for nonlinear dynamic systems and as a parameter estimator in many applications, such as induction motor control and fuzzy modeling control problems (Simon, 2002; Yazid et al., 2011).
The novel contribution of this paper is that the EKF algorithm is used as an on-line parameter identifier in order to construct the CLF within the control loop of the inverse optimal control.
Almobaied M., Eksin I. and Guzelkaya M..
A New Inverse Optimal Control Method for Discrete-time Systems.
DOI: 10.5220/0005562902750280
In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2015), pages 275-280
ISBN: 978-989-758-122-9
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
The remainder of this paper is organized as follows: Section 2 briefly describes nonlinear optimal control and the discrete-time HJB equation. Section 3 introduces some mathematical notation and definitions related to inverse optimal control and the control Lyapunov function. In Section 4, the extended Kalman filter algorithm is presented. In Section 5, the proposed design method is explained in depth. Section 6 presents a nonlinear test example and the simulation results. Some conclusions are drawn in Section 7.
2 DISCRETE TIME HAMILTON-JACOBI-BELLMAN EQUATION FOR NONLINEAR OPTIMAL CONTROL
Consider an affine-in-input nonlinear dynamical system of the form:
$$x_{k+1} = f(x_k) + g(x_k) u_k \tag{1}$$
where $x_k \in \mathbb{R}^n$ is the state of the system, $u_k \in \mathbb{R}^m$ is the control input, $f(x_k) \in \mathbb{R}^n$ and $g(x_k) \in \mathbb{R}^{n \times m}$. Without loss of generality, it can be assumed that the origin ($x = 0$) is the equilibrium point of system (1), with $f(0) = 0$ and $g(x_k) \neq 0$ for all $x_k \neq 0$. System (1) is assumed to be stabilizable on a predefined compact set $\Omega \subset \mathbb{R}^n$.
Definition 1: Stabilizable System; A nonlinear dynamical system is said to be stabilizable on a compact set $\Omega \subset \mathbb{R}^n$ if there exists a control input $u_k \in \mathbb{R}^m$ such that, for all initial conditions $x_0 \in \Omega$, the state $x_k \to 0$ as $k \to \infty$ (Khalil, 1996).
It is desired to determine a control law $u_k$ which minimizes the following cost functional:
$$V(x_k) = \sum_{n=k}^{\infty} \left( L(x_n) + u_n^T E u_n \right) \tag{2}$$
where $V : \mathbb{R}^n \to \mathbb{R}^+$ is the cost functional, $L : \mathbb{R}^n \to \mathbb{R}^+$ is a positive semi-definite function, and $E : \mathbb{R}^n \to \mathbb{R}^{m \times m}$ is a real symmetric positive definite weighting matrix which may be a function of the system's states. Equation (2) can be written as:
$$V(x_k) = L(x_k) + u_k^T E u_k + V(x_{k+1}) \tag{3}$$
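The recursion in (3) can be checked numerically on a toy case. The sketch below assumes a hypothetical scalar system $x_{k+1} = 0.5 x_k + u_k$ with $u_k = 0$, $L(x) = x^2$ and $E = 1$; none of these values come from the paper, and the infinite sum in (2) is truncated at 200 terms.

```python
def stage_cost(x, u, E=1.0):
    """Stage cost L(x) + u^T E u, with the assumed choice L(x) = x^2."""
    return x * x + E * u * u


def cost_to_go(x, steps=200):
    """Truncated version of the sum in (2) for x_{k+1} = 0.5 x_k + u_k, u_k = 0."""
    total = 0.0
    for _ in range(steps):
        u = 0.0                      # no control in this toy case
        total += stage_cost(x, u)
        x = 0.5 * x + u              # assumed stable scalar dynamics
    return total


# Recursion (3): V(x_k) = L(x_k) + u_k^T E u_k + V(x_{k+1})
x_k = 2.0
lhs = cost_to_go(x_k)
rhs = stage_cost(x_k, 0.0) + cost_to_go(0.5 * x_k)
print(lhs, rhs)
```

For this toy case the sum is a geometric series, so both sides should agree with $x_k^2 / (1 - 0.25)$ up to rounding.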
It is assumed that the boundary condition $V(0) = 0$ holds, so that $V(x_k)$ can be used as a Lyapunov function in the next section. From Bellman's optimality principle, it is known that the value function $V^*(x_k)$ is time invariant and satisfies the discrete-time (DT) Bellman equation for the infinite horizon optimization case (Sanchez and Ornelas-Tellez, 2013; Nakamura et al., 2007):
$$V^*(x_k) = \min_{u_k} \left\{ L(x_k) + u_k^T E u_k + V^*(x_{k+1}) \right\} \tag{4}$$
The optimal control $u_k^*$ can be calculated by setting the gradient of the right-hand side of (4) with respect to $u_k$ to zero. Therefore, the optimal control $u_k^*$ is:
$$u_k^* = -\frac{1}{2} E^{-1} g^T(x_k) \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}} \tag{5}$$
By substituting the optimal control $u_k^*$ into (4), the DT HJB equation becomes:
$$V^*(x_k) = L(x_k) + V^*(x_{k+1}) + \frac{1}{4} \frac{\partial V^{*T}(x_{k+1})}{\partial x_{k+1}} g(x_k) E^{-1} g^T(x_k) \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}} \tag{6}$$
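The step from (4) to (5) and (6) can be made explicit. The sketch below assumes that $V^*$ is differentiable, that $E$ is symmetric, and that the minimizer is an interior point, so that the first-order condition applies; it uses $\partial x_{k+1} / \partial u_k = g(x_k)$ from (1).

```latex
\begin{align*}
0 &= \frac{\partial}{\partial u_k}\Big\{ L(x_k) + u_k^T E u_k + V^*(x_{k+1}) \Big\}
   = 2 E u_k + g^T(x_k)\,\frac{\partial V^*(x_{k+1})}{\partial x_{k+1}} \\
\Rightarrow\quad u_k^* &= -\tfrac{1}{2} E^{-1} g^T(x_k)\,\frac{\partial V^*(x_{k+1})}{\partial x_{k+1}} \\
u_k^{*T} E u_k^* &= \tfrac{1}{4}\,
   \frac{\partial V^{*T}(x_{k+1})}{\partial x_{k+1}}\, g(x_k) E^{-1} E E^{-1} g^T(x_k)\,
   \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}}
   = \tfrac{1}{4}\,
   \frac{\partial V^{*T}(x_{k+1})}{\partial x_{k+1}}\, g(x_k) E^{-1} g^T(x_k)\,
   \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}}
\end{align*}
```

Inserting the last line for the term $u_k^T E u_k$ in (4) gives the quadratic term of the DT HJB equation.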
3 FUNDAMENTALS OF INVERSE OPTIMAL CONTROL
The inverse optimal control approach proposed in this research depends on the control Lyapunov function (CLF). For this reason, some important properties of Lyapunov functions from the literature are stated here.
3.1 Control Lyapunov Function
Definition 2: A positive definite function $M(x_k)$ satisfying the condition $M(x_k) \to \infty$ as $\|x_k\| \to \infty$ is said to be radially unbounded (Sanchez and Ornelas-Tellez, 2013; Khalil, 1996).
Definition 3: Let $M(x_k)$ be a radially unbounded function, with $M(x_k) > 0$ for all $x_k \neq 0$ and $M(0) = 0$. If for any $x_k \in \mathbb{R}^n$ there exist real values $u_k$ such that $\Delta M(x_k, u_k) < 0$, where the difference $\Delta M(x_k, u_k)$ is defined as:
$$M(x_{k+1}) - M(x_k) = M(f(x_k) + g(x_k) u_k) - M(x_k)$$
then $M(\cdot)$ is said to be a "discrete-time control Lyapunov function" for the system (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011).
Definition 4: The equilibrium point $x_k = 0$ is globally asymptotically stable if there exists a function $M : \mathbb{R}^n \to \mathbb{R}$ such that (i) $M$ is a positive definite function, decrescent and radially unbounded, and (ii) $-\Delta M(x_k, u_k)$ is a positive definite function (Sanchez and Ornelas-Tellez, 2013; LaSalle, 1986).
ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics
276
Definition 5: Suppose that there exist a positive definite function $M$ and constants $c_1, c_2, c_3 > 0$ and $p > 1$ such that:
$$c_1 \|x_k\|^p \le M(x_k) \le c_2 \|x_k\|^p$$
$$\Delta M(x_k) \le -c_3 \|x_k\|^p, \quad \forall k \ge 0, \ \forall x_k \in \mathbb{R}^n$$
Then $x_k = 0$ is an exponentially stable equilibrium for the system (Sanchez and Ornelas-Tellez, 2013; LaSalle, 1986).
3.2 Inverse Optimal Control
In the literature on inverse optimal control, a theorem for discrete-time inverse optimal control has been published, and its proof is given in (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011). This research depends directly on this theorem, so it is stated here without proof.
Theorem 1: The control law $u_k^*$ in (5) is said to be inverse optimal if:
a) It achieves global exponential stability of the equilibrium point $x_k = 0$ for the system.
b) It minimizes the cost functional in (2), for which $L(x_k) := -\overline{M}$, where:
$$\overline{M} := M(x_{k+1}) - M(x_k) + u_k^{*T} E u_k^* \le 0 \tag{7}$$
The inverse optimal control is based on the knowledge of $M(x_k)$. Hence, a CLF $M(x_k)$ is proposed such that (a) and (b) are guaranteed. That is, instead of solving the HJB equation (6) for $V(x_k)$, a candidate quadratic control Lyapunov function $M(x_k)$ is proposed with the form
$$M(x_k) = \frac{1}{2} x_k^T P x_k, \qquad P = P^T > 0 \tag{8}$$
Hence, the CLF $M(x_k)$ is used instead of $V(x_k)$. It is required to select an appropriate matrix $P$ in order to achieve stability. Moreover, the control law $u_k^*$ with the proposed quadratic control Lyapunov function will optimize the cost functional. The state feedback control law can be rewritten as:
$$u_k^* = -\frac{1}{2} \left( E + \frac{1}{2} g^T(x_k) P g(x_k) \right)^{-1} g^T(x_k) P f(x_k) \tag{9}$$
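The closed form (9) follows from (5) once $V^*$ is replaced by the quadratic candidate (8). A short derivation sketch, assuming $P$ is symmetric so that $\partial M(x_{k+1}) / \partial x_{k+1} = P x_{k+1}$:

```latex
\begin{align*}
u_k^* &= -\tfrac{1}{2} E^{-1} g^T(x_k)\, P\, x_{k+1}
       = -\tfrac{1}{2} E^{-1} g^T(x_k)\, P \big( f(x_k) + g(x_k) u_k^* \big) \\
\Rightarrow\quad
\big( E + \tfrac{1}{2} g^T(x_k) P g(x_k) \big)\, u_k^* &= -\tfrac{1}{2} g^T(x_k) P f(x_k) \\
\Rightarrow\quad
u_k^* &= -\tfrac{1}{2} \big( E + \tfrac{1}{2} g^T(x_k) P g(x_k) \big)^{-1} g^T(x_k) P f(x_k)
\end{align*}
```

The inverse exists because $E > 0$ and $P > 0$ make $E + \frac{1}{2} g^T(x_k) P g(x_k)$ positive definite.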
The process of finding an appropriate matrix $P$ that satisfies the required performance remains an active research topic (Sanchez and Ornelas-Tellez, 2013; Ornelas et al., 2011; Ornelas-Tellez et al., 2011). Figure 1 illustrates the distinction between the traditional solution of the nonlinear optimal control problem and the inverse optimal control approach.
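The control law (9) is straightforward to evaluate once $P$ is known. The following is a minimal sketch for the single-input case ($m = 1$), where $E$ is a scalar and the matrix inverse reduces to a division; the function name and the numerical values are illustrative assumptions, not taken from the paper.

```python
def inverse_optimal_control(fx, gx, P, E):
    """Scalar-input form of (9): u = -1/2 (E + 1/2 g^T P g)^-1 g^T P f.

    fx, gx: lists holding f(x_k) and the single column g(x_k)
    P:      symmetric positive definite matrix as a list of rows
    E:      positive scalar weight (assumes m = 1)
    """
    n = len(fx)
    Pf = [sum(P[i][j] * fx[j] for j in range(n)) for i in range(n)]
    Pg = [sum(P[i][j] * gx[j] for j in range(n)) for i in range(n)]
    gPf = sum(gx[i] * Pf[i] for i in range(n))   # g^T P f
    gPg = sum(gx[i] * Pg[i] for i in range(n))   # g^T P g
    return -0.5 * gPf / (E + 0.5 * gPg)


# Hypothetical numbers for illustration only
fx = [1.0, 2.0]
gx = [0.0, 1.0]
P = [[2.0, 0.0], [0.0, 4.0]]
u = inverse_optimal_control(fx, gx, P, E=0.5)
print(u)
```

With these numbers the result also satisfies the implicit form $u_k = -\frac{1}{2} E^{-1} g^T P (f + g u_k)$, which is a quick consistency check between (5) and (9).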
[Figure 1 flowchart. Nonlinear optimal control splits into two branches. HJB Equations Method: start from the cost functional (2), derive the optimal control (5), and solve the DT HJB equation (6). Inverse Optimal Control: select $P = P^T > 0$ in the CLF (8), form $\overline{M} \le 0$ as in (7), select $L(x_k) := -\overline{M}$, and obtain the meaningful cost functional (2).]
Figure 1: The inverse optimal control approach and the traditional solution for optimal control problem.
4 EXTENDED KALMAN FILTER
The Kalman filter (KF) has become a standard technique for optimal estimation and a relatively simple method for estimating the unmeasurable states of linear systems. For nonlinear systems, the extended Kalman filter can be used if the nonlinearity of the system is sufficiently smooth (Simon, 2002).
4.1 Extended Kalman Filter Equations
The nonlinear process model is described as:
$$x_k = g(u_{k-1}, x_{k-1}) + w_{k-1} \tag{10}$$
$$Z_k = h(x_k) + v_k \tag{11}$$
Here, $g$ and $h$ are the nonlinear state transition and measurement functions, respectively, and $w_k$ and $v_k$ are the process and observation noises. These noises are assumed to be zero-mean multivariate Gaussian noises with covariances $Q_k$ and $R_k$, respectively, and $u_k$ is the control input vector. The discrete-time equations of the extended Kalman filter are illustrated in Figure 2, where the matrix $G_k$ is the Jacobian of the state function, defined as the derivatives of each component of $g$ with respect to each component of $x_{k-1}$. Similarly, the matrix $H_k$ is the Jacobian of the measurement function, defined as the derivatives of each component of $h$ with respect to each component of $x_k$.
Recently, the extended Kalman filter has been utilized for parameter estimation in real-time control of nonlinear systems (Simon, 2002; Yazid et al., 2011).
4.2 Stability Analysis in EKF Applications
Since the covariance matrices used in the EKF are approximations and the estimation is based on the linearization of the nonlinear functions $g$ and $h$, there is no guarantee of stability and performance for the system prior to experimental data analysis. Indeed, the approach seems to work well if the nonlinearity is sufficiently smooth and the filter parameters are properly tuned (Raol et al., 2004). Section 5 shows how to modify the EKF equations in order to estimate the parameters in the proposed inverse optimal control law.
Filter's input: $[\Sigma^+_{k-1}, \hat{x}^+_{k-1}, u_k, Z_k]$
Prediction:
$$\hat{x}^-_k = g(u_k, \hat{x}^+_{k-1}), \qquad \Sigma^-_k = G_k \Sigma^+_{k-1} G_k^T + Q_k, \qquad \hat{Z}_k = h(\hat{x}^-_k)$$
Kalman gain:
$$K_k = \Sigma^-_k H_k^T \left[ H_k \Sigma^-_k H_k^T + R_k \right]^{-1}$$
Estimation:
$$\hat{x}^+_k = \hat{x}^-_k + K_k (Z_k - \hat{Z}_k), \qquad \Sigma^+_k = [I - K_k H_k] \Sigma^-_k$$
Jacobian:
$$G_k = \frac{\partial g(\hat{x}^+_{k-1}, u_k)}{\partial x_{k-1}}, \qquad H_k = \frac{\partial h(\hat{x}^-_k)}{\partial x_k}$$
Filter's outputs: $[\hat{x}^+_k, \Sigma^+_k]$
Figure 2: Extended Kalman Filter Equations.
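The equations of Figure 2 can be sketched in code as follows. This is a minimal pure-Python illustration restricted to a scalar measurement, so that the bracketed inverse in the Kalman gain is a division; the function names are hypothetical, and the control input $u_k$ is assumed to be folded into the callable `g`.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ekf_step(x_prev, S_prev, z, g, h, jac_g, jac_h, Q, R):
    """One EKF cycle with a scalar measurement z and scalar noise variance R."""
    n = len(x_prev)
    # Prediction
    x_pred = g(x_prev)
    G = jac_g(x_prev)
    S_pred = mat_mul(mat_mul(G, S_prev), transpose(G))
    S_pred = [[S_pred[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    z_pred = h(x_pred)
    # Kalman gain: K = S H^T (H S H^T + R)^-1, with H a single row
    H = jac_h(x_pred)
    SHt = [sum(S_pred[i][j] * H[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(H[i] * SHt[i] for i in range(n)) + R
    K = [s / innovation_var for s in SHt]
    # Estimation
    x_new = [x_pred[i] + K[i] * (z - z_pred) for i in range(n)]
    HS = [sum(H[i] * S_pred[i][j] for i in range(n)) for j in range(n)]
    S_new = [[S_pred[i][j] - K[i] * HS[j] for j in range(n)] for i in range(n)]
    return x_new, S_new


# One-dimensional illustration: identity state model, direct measurement
x_new, S_new = ekf_step(
    x_prev=[0.0], S_prev=[[1.0]], z=2.0,
    g=lambda x: list(x), h=lambda x: x[0],
    jac_g=lambda x: [[1.0]], jac_h=lambda x: [1.0],
    Q=[[0.0]], R=1.0,
)
print(x_new, S_new)
```

In this one-dimensional case the prior variance and the measurement variance are equal, so the gain is 0.5 and the estimate moves halfway towards the measurement.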
5 EKF FOR INVERSE OPTIMAL CONTROL
In the proposed approach, the EKF equations are used to estimate the parameters of the matrix $P$. This matrix is used to establish the quadratic control Lyapunov function as follows:
$$M(x_k) = \frac{1}{2} x_k^T P x_k, \qquad P = P^T > 0 \tag{12}$$
where $\hat{x}^+_{k-1} = [P_1 \; P_2 \; \dots \; P_n]^T$, and $P_1, P_2, \dots, P_n$ are the elements of the matrix $P$ to be estimated. The state function can be defined as a one-to-one mapping of those parameters:
$$\hat{x}^-_k = g(u_k, \hat{x}^+_{k-1}) = \hat{x}^+_{k-1} = \begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{bmatrix} \tag{13}$$
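The state Jacobian matrix $G_k$ is equal to the identity matrix:
$$G_k = \begin{bmatrix} \frac{\partial P_1}{\partial P_1} & \frac{\partial P_1}{\partial P_2} & \cdots & \frac{\partial P_1}{\partial P_n} \\ \frac{\partial P_2}{\partial P_1} & \frac{\partial P_2}{\partial P_2} & \cdots & \frac{\partial P_2}{\partial P_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial P_n}{\partial P_1} & \frac{\partial P_n}{\partial P_2} & \cdots & \frac{\partial P_n}{\partial P_n} \end{bmatrix} = I \tag{14}$$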
For simplicity, it can be assumed that $Q_k$ is constant during the process:
$$Q_k = q_0 \times I \tag{15}$$
$$\hat{x}^+_k = S_0 \times I \tag{16}$$
For the following estimator equation in the EKF:
$$\hat{x}^+_k = \hat{x}^-_k + K_k (Z_k - \hat{Z}_k) \tag{17}$$
The term $(Z_k - \hat{Z}_k)$ is the difference between the measured value and the estimated one. This term can be adapted to the proposed approach as follows: it is used as an error indicator, and the measurement $Z_k$ is set to zero so that the filter acts to minimize the total error $(Z_k - \hat{Z}_k)$. Moreover, the root mean square error (RMSE) of all state outputs is used as the error observer in place of the predicted measurement $\hat{Z}_k$ (i.e. $\hat{Z}_k = \text{RMSE}$), which equals $h(\hat{x}^-_k)$ as shown in the prediction equation in Figure 2.
$$\hat{Z}_k = h(\hat{x}^-_k) = \text{RMSE} = \sqrt{\frac{(X_1 - X_{1ref})^2 + (X_2 - X_{2ref})^2 + \dots + (X_n - X_{nref})^2}{n}} \tag{18}$$
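To calculate the Jacobian $H_k$, it is required to define $h(\hat{x}^-_k)$ as a function of the parameters $[P_1 \; P_2 \; \dots \; P_n]$. The Jacobian matrix is then given by:
$$H_k = \begin{bmatrix} \frac{\partial h(\hat{x}^-_k)}{\partial P_1} & \frac{\partial h(\hat{x}^-_k)}{\partial P_2} & \cdots & \frac{\partial h(\hat{x}^-_k)}{\partial P_n} \end{bmatrix} \tag{19}$$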
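The RMSE in (18) and a row Jacobian of the form (19) can be evaluated as sketched below. The text does not spell out how $h$ depends on the parameters $P_i$, so the Jacobian here is approximated by central finite differences on any user-supplied callable; this numerical approach and the function names are assumptions for illustration, not the authors' stated procedure.

```python
import math


def rmse(x, x_ref):
    """Root mean square error between the state vector and its reference, as in (18)."""
    n = len(x)
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, x_ref)) / n)


def numerical_jacobian(h, params, eps=1e-6):
    """Central-difference approximation of the row [dh/dP_1 ... dh/dP_n]."""
    row = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        row.append((h(up) - h(down)) / (2.0 * eps))
    return row


# Illustration with made-up values
error = rmse([3.0, 4.0], [0.0, 0.0])
H_row = numerical_jacobian(lambda p: p[0] ** 2 + 3.0 * p[1], [1.0, 2.0])
print(error, H_row)
```

When an analytic expression for $h$ in terms of $P$ is available, its exact partial derivatives would replace the finite-difference step.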
Figure 3 shows the block diagram of the proposed method.
[Figure 3 block diagram. The EKF block outputs the parameters $P_1, P_2, \dots, P_n$ to the Lyapunov function block $M(x_k) = \frac{1}{2} x_k^T P x_k$, which feeds the control law block $u_k^* = -\frac{1}{2} E^{-1} g^T(x_k) \frac{\partial M(x_{k+1})}{\partial x_{k+1}}$ driving the nonlinear system $x_{k+1} = f(x_k) + g(x_k) u_k$. The state penalty term $L(x_k) = -[M(x_{k+1}) - M(x_k) + u_k^T E u_k]$ enters the meaningful cost functional $V(x_k) = \sum_{n=k}^{\infty} (L(x_n) + u_n^T E u_n)$, and the RMSE value $\hat{Z}_k$ computed from the system states is fed back to the EKF.]
Figure 3: EKF based Inverse Optimal Control for Discrete-Time Nonlinear System.
The steps of the proposed approach are as follows:
1. Find suitable initial values for the parameters of the matrix $P$.
2. Choose suitable values for the covariance matrices $Q_k$ and $R_k$.
3. Calculate the error observer (RMSE) from the current states, which equals $h(\hat{x}^-_k)$.
4. Apply the proposed EKF equations to obtain the estimated values of the parameters.
5. Construct the control Lyapunov function (CLF), and then establish the control law of the inverse optimal control.
6. Calculate the penalty term $L(x_k)$ for the meaningful cost functional.
7. Test the system's states and stop if the states have reached the target.
8. Otherwise, return to step 3 with the new states.
9. If needed, change the initial parameter values and the covariance matrices to obtain better performance.
In summary, the EKF algorithm estimates a new matrix $P$ at each step. This new matrix should reduce the RMSE value if the filter is well tuned. As mentioned in Section 4.2, the stability and the performance of the nonlinear system can be adjusted by tuning the filter's parameters.
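The loop described in this section can be sketched on a toy problem. The code below is only an illustration under strong assumptions: a hypothetical scalar system $x_{k+1} = a x_k + u_k$ (so $P$ is a single parameter $p$ and $g = 1$), a zero reference, and an error observer taken to be the one-step-ahead $|x_{k+1}|$ predicted under the candidate $p$, which gives $h$ and $H_k$ in closed form. The way $h$ is tied to $p$ is an assumption made here; it is not specified in the text.

```python
def run_scalar_sketch(a=2.0, E=0.5, x0=2.0, p0=1.0, S0=100.0,
                      Q=100.0, R=0.01, steps=20, tol=1e-9):
    """EKF-tuned quadratic CLF for the assumed toy system x_{k+1} = a x_k + u_k."""
    x, p, S = x0, p0, S0          # steps 1-2: initial parameter and covariances
    cost = 0.0
    history = []
    for _ in range(steps):
        if abs(x) < tol:          # step 7: stop once the target is reached
            break
        fx = a * x
        # Step 3: predicted error, assumed to be |x_{k+1}| under the candidate p
        S_pred = S + Q            # G_k = 1 for the identity parameter model
        z_pred = abs(fx) * 2.0 * E / (2.0 * E + p)
        H = -abs(fx) * 2.0 * E / (2.0 * E + p) ** 2   # dh/dp
        # Step 4: EKF update with the measurement Z_k set to zero
        K = S_pred * H / (H * S_pred * H + R)
        p = p + K * (0.0 - z_pred)
        S = (1.0 - K * H) * S_pred
        # Step 5: CLF M(x) = 0.5 p x^2 and control law (9) with g = 1
        u = -0.5 * p * fx / (E + 0.5 * p)
        x_next = fx + u
        # Step 6: penalty term L(x_k) = -(M(x_{k+1}) - M(x_k) + E u^2)
        L = -(0.5 * p * x_next ** 2 - 0.5 * p * x ** 2 + E * u * u)
        cost += L + E * u * u
        history.append((x, u, p))
        x = x_next                # step 8: continue with the new state
    return x, p, cost, history


x_final, p_final, total_cost, history = run_scalar_sketch()
print(x_final, p_final, total_cost)
```

In this toy setting the update can only increase $p$, which strengthens the feedback and contracts the state; no such monotonic behaviour is claimed for the matrix case in the paper.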
6 EXAMPLE AND SIMULATION RESULTS
The performance of the proposed method is illustrated with the following nonlinear example:
$$f(x_k) = \begin{bmatrix} x_{1,k} x_{2,k} - 0.8 x_{2,k} \\ x_{1,k}^2 + 1.8 x_{2,k} \end{bmatrix}, \qquad g(x_k) = \begin{bmatrix} 0 \\ -2 + \cos(x_{2,k}) \end{bmatrix}$$
The stabilizing optimal control law is calculated according to (9), with the matrix $P$ estimated by the proposed method as described in Section 5, and $E = 0.5$ is the constant weighting in the cost functional. The initial condition for the states is $X_0 = [2 \; 2]$. The EKF algorithm constants are selected as $Q_0 = 100$, $R_0 = 0.01$ and $P_0 = 100$. The phase portraits of the unstable nonlinear system and of the stabilized nonlinear system are shown in Figure 4. The state responses and the control law of the stabilized nonlinear system with respect to the time step are shown in Figure 5. Figure 6 displays the evaluation of the cost functional $V(x_k)$.
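For reference, the example system and one step of the control law (9) can be written out as below. This is a sketch under stated assumptions: the minus signs in $f$ and $g$ follow the reading of the example adopted here and should be checked against the cited sources, and $P$ is a hypothetical fixed identity matrix rather than an EKF estimate. The sign of $g$ does not affect the closed-loop step, because $g$ enters (9) and the product $g u_k$ quadratically.

```python
import math

E = 0.5  # weighting constant stated for the example


def f(x):
    """Drift term of the example (signs as assumed in the lead-in)."""
    x1, x2 = x
    return [x1 * x2 - 0.8 * x2, x1 ** 2 + 1.8 * x2]


def g(x):
    """Input vector of the example; only the second state is actuated."""
    return [0.0, -2.0 + math.cos(x[1])]


def control(x, P):
    """Control law (9) for this single-input system."""
    fx, gx = f(x), g(x)
    g2 = gx[1]
    gPf = g2 * (P[1][0] * fx[0] + P[1][1] * fx[1])   # g^T P f
    gPg = g2 * P[1][1] * g2                          # g^T P g
    return -0.5 * gPf / (E + 0.5 * gPg)


def step(x, P):
    """One closed-loop step x_{k+1} = f(x_k) + g(x_k) u_k."""
    u = control(x, P)
    fx, gx = f(x), g(x)
    return [fx[0] + gx[0] * u, fx[1] + gx[1] * u], u


P_fixed = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical choice, not an EKF result
x_next, u0 = step([2.0, 2.0], P_fixed)
print(x_next, u0)
```

Because the first component of $g$ is zero, the first state is influenced by the control only indirectly, through the second state at later steps.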
Figure 4: The phase portrait for the unstable system (a), and for the stabilized nonlinear system (b).
Figure 5: States response and the control law.
Figure 6: Cost function evaluation.
The previous example was used by the authors of (Sanchez and Ornelas-Tellez, 2013) to test the effectiveness of the main theorem, which is the theorem used in this research (Theorem 1 in Section 3). Moreover, the same example was used in (Ornelas-Tellez et al., 2011) to test the performance of the speed-gradient algorithm for inverse optimal control. In (Ornelas-Tellez et al., 2011), a quadratic function of the form $V(x_k) = \frac{1}{2} x_k^T P x_k$ was proposed as a CLF for the inverse optimal control problem. This CLF depends on only one time-variant parameter $P_k$, where $P = P_k P_0$ and $P_0$ is a predefined matrix. The parameter $P_k$ is then adjusted by means of the speed-gradient (SG) algorithm. In this research, the simulation results indicate that the proposed method performs better than these existing methods, as shown in Table 1.
Table 1: A comparison between the EKF-based approach and other approaches.
Methods          $X_1 \to 0$   $X_2 \to 0$   Cost functional
Main theorem     10 steps      8 steps       40
Speed Gradient   8 steps       7 steps       10
EKF Based        3 steps       2 steps       4
7 CONCLUSIONS
In this paper, a new approach to the inverse optimal control problem for discrete-time nonlinear systems is proposed. With the inverse optimal control technique, there is no need to solve the Hamilton-Jacobi-Bellman (HJB) equation that arises in the traditional solution of nonlinear optimal control. For this new approach, a discrete-time control Lyapunov function (CLF) in quadratic form is proposed, whose parameters are determined by using the extended Kalman filter (EKF) algorithm. This CLF is used to establish the inverse optimal control law. The proposed method is validated through MATLAB simulation. The results illustrate that the proposed controller stabilizes the nonlinear test system and minimizes a cost functional.
REFERENCES
Freeman, R. A. and Kokotovic, P. V. (1996). Inverse optimality in robust stabilization. SIAM Journal on Control and Optimization, 34(4):1365–1391.
Kalman, R. E. (1964). When is a linear control system optimal? Journal of Fluids Engineering, 86(1):51–60.
Khalil, H. K. (1996). Nonlinear Systems. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition.
LaSalle, J. P. (1986). The Stability and Control of Discrete Processes. Springer-Verlag New York, Inc., New York, NY, USA.
Nakamura, N., Nakamura, H., Yamashita, Y., and Nishitani, H. (2007). Inverse optimal control for nonlinear systems with input constraints. In Control Conference (ECC), 2007 European, pages 5376–5382.
Ornelas, F., Sanchez, E. N., and Loukianov, A. G. (2011). Discrete-time nonlinear systems inverse optimal control: A control Lyapunov function approach. In Control Applications (CCA), 2011 IEEE International Conference on, pages 1431–1436.
Ornelas-Tellez, F., Sanchez, E. N., Loukianov, A., and Navarro-Lopez, E. (2011). Speed-gradient inverse optimal control for discrete-time nonlinear systems. In Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pages 290–295.
Raol, J., Girija, G., and Singh, J. (2004). Modelling and Parameter Estimation of Dynamic Systems. IEE Control Engineering Series. Institution of Engineering and Technology.
Sanchez, E. N. and Ornelas-Tellez, F. (2013). Discrete-Time Inverse Optimal Control for Nonlinear Systems. CRC Press, Inc., Boca Raton, FL, USA.
Simon, D. (2002). Training fuzzy systems with the extended Kalman filter. Fuzzy Sets and Systems, 132:189–199.
Yazid, K., Bouhoune, K., Menaa, M., and Larabi, A. (2011). Application of EKF to parameters estimation for speed sensorless vector control of two-phase induction motor. In Electrical Machines and Power Electronics and 2011 Electromotion Joint Conference (ACEMP), 2011 International Aegean Conference on, pages 357–361.
ICINCO2015-12thInternationalConferenceonInformaticsinControl,AutomationandRobotics
280