Nonlinear Second Cumulant/H-infinity Control with Multiple Decision
Makers
Chukwuemeka Aduba
Arris Group Inc., Horsham, PA 19044, U.S.A.
Keywords:
Cumulant Game Control, Nash Equilibrium, Nonlinear System, Optimization, Statistical Game Control.
Abstract:
This paper studies a second cumulant/H-infinity control problem with multiple players for a nonlinear stochastic system over a finite horizon. The second cumulant/H-infinity control problem, which is a generalization of the higher-order multi-objective control problem, involves a control method with multiple performance indices. The necessary condition for the existence of Nash equilibrium strategies for the second cumulant/H-infinity control problem is given by the coupled Hamilton-Jacobi-Bellman (HJB) equations. In addition, a three-player Nash strategy is derived for the second cumulant/H-infinity control problem. A simulation example is given to illustrate the application of the proposed theoretical formulations.
1 INTRODUCTION
Higher-order control problems (Won et al., 2010) for stochastic systems have been investigated in recent years and related to multi-objective control theoretical game formulations (Lee et al., 2010). In multi-objective control problems, the control method must concern itself with multiple performance indices. A typical multi-objective control problem for both stochastic and deterministic systems can be formulated as mixed $H_2/H_\infty$ control, where the controller seeks to minimize an $H_2$ norm while keeping the $H_\infty$ norm constrained. In fact, the $H_2/H_\infty$ control problem is a robust control method which requires a controller to minimize the $H_2$ performance while attenuating the worst case external disturbance. This approach was investigated in (Bernstein and Hassas, 1989), while the Nash game approach to the problem was given in (Limebeer et al., 1994). In (Basar and Olsder, 1999), a two-player game involving control and disturbance was analyzed, where both players wished to optimize their respective performance indices when the other player plays their equilibrium strategy.
In this paper, the mixed second cumulant/H-infinity (second cumulant/$H_\infty$) control problem with multiple players is investigated for a nonlinear stochastic system. Why second cumulant/$H_\infty$ rather than first cumulant/$H_\infty$ (that is, $H_2/H_\infty$)? Earlier studies in (Won et al., 2010) have shown that higher-order cumulants offer the control engineer additional degrees of freedom to improve system performance through the shaping of the cost function distribution. As a result, there is a need to investigate how higher-order cumulant control interacts with worst case disturbance effects on dynamic systems. The second cumulant/H-infinity control problem involves simultaneous optimization of the higher-order statistical properties of each individual player's cost function distribution through cumulants while keeping the $H_\infty$ norm constrained. The optimization of the cost function distribution through cost cumulants was initiated by Sain (Sain, 1966), (Sain and Liberty, 1971). A linear quadratic statistical game with related applications such as satellite systems was investigated in (Lee et al., 2010), while an output feedback approach to the higher-order statistical game was studied in (Aduba and Won, 2015).
As an extension of the foregoing studies in (Lee et al., 2010), (Aduba and Won, 2015) and the references therein, a nonlinear system of three players with quadratic cost functions, which is a nontrivial extension, is considered. Typical multi-objective control problem applications are in large-scale systems such as computer communication networks, electric power grid networks and manufacturing plant networks (Bauso et al., 2008), (Charilas and Panagopoulos, 2010), while the higher-order multi-objective control application has been reported for a satellite network (Lee et al., 2010). The rest of this paper is organized as follows. Section 2 gives the mathematical preliminaries and the second cumulant/H-infinity control problem for a completely observed nonlinear system with multiple players, which is formulated as a nonzero-sum differential game problem. Section 3 states and proves the necessary condition for the existence of Nash equilibrium strategies, while Section 4 derives the optimal players strategy based on solving coupled Hamilton-Jacobi-Bellman equations, which is the main result of this paper. Section 5 gives the numerical approximate method for solving the coupled Nash game Hamilton-Jacobi-Bellman equations, while a numerical example is demonstrated in Section 6. Finally, the conclusions are drawn in Section 7.

Aduba, C.
Nonlinear Second Cumulant/H-infinity Control with Multiple Decision Makers.
DOI: 10.5220/0005955400310037
In Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2016) - Volume 1, pages 31-37
ISBN: 978-989-758-198-4
Copyright © 2016 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
2 PROBLEM FORMULATION
Consider a 3-player nonlinear stochastic state dynamics given by the following Itô-type differential equation:
$$
\begin{aligned}
dx(t) &= f(t, x(t), u_1(t), u_2(t), v(t))\,dt + \sigma(x(t))\,dw(t),\\
z(t) &= C x(t) + D_1 u_1(t) + D_2 u_2(t),
\end{aligned} \tag{1}
$$
where $t \in [t_0, t_F] = T$, $x(t) \in \mathbb{R}^n$ is the state with $x(t_0) = x_0$, $u_k(t) \in U_k \subset \mathbb{R}^m$ is the k-th player strategy, $k = 1,2$, $v(t) \in V \subset \mathbb{R}^m$ is the external disturbance player, and $dw(t)$ is a Gaussian random process of dimension $d$ with zero mean and covariance $W(t)\,dt$. Let $Q_0 = [t_0, t_F) \times \mathbb{R}^n$ and let $\bar{Q}_0$ be the closure of $Q_0$. The functions $f$ and $\sigma$ are Borel measurable with $f \in C^1(\bar{Q}_0 \times U_1 \times U_2 \times V)$ and $\sigma \in C^1(\bar{Q}_0)$. In addition, $f$ and $\sigma$ satisfy Lipschitz and linear growth conditions (Arnold, 1974), while $z(t)$ is the regulated output of the stochastic system. Let $u_k(t) = \mu_k(t,x)$, $v(t) = \nu(t,x)$, $t \in T$, be memoryless state feedback strategies with $\mu_k(t,x)$, $\nu(t,x)$ satisfying Lipschitz and linear growth conditions, so that they are admissible strategies. It is shown in (Fleming and Rishel, 1975) that a process $x(t)$ from (1) having admissible strategies together with a polynomial growth condition ensures that $E\|x(t)\|^2$ is finite.
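The backward evolution operator $O(\mu_1,\mu_2,\nu)$ (Sain et al., 2000), with $O = O_1 + O_2$, is introduced as
$$
\begin{aligned}
O_1(\mu_1,\mu_2,\nu) &= \frac{\partial}{\partial t} + f^{\top}(t,x,\mu_1,\mu_2,\nu)\frac{\partial}{\partial x},\\
O_2(\mu_1,\mu_2,\nu) &= \frac{1}{2}\,\mathrm{tr}\!\left(\sigma W \sigma^{\top}\frac{\partial^2}{\partial x^2}\right),
\end{aligned} \tag{2}
$$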
where tr is the trace operator. The cost function $J_k$ for the k-th player is given as:
$$
J_k(t,x,\mu_1,\mu_2,\nu) = \int_t^{t_F} L_k(s,x(s),\mu_1,\mu_2,\nu)\,ds + \psi_k(x(t_F))
$$
or
$$
J_k(t,x,\mu_1,\mu_2,\nu) = \int_t^{t_F} z_k^{\top}(s)\,z_k(s)\,ds + \psi_k(x(t_F)), \tag{3}
$$
where $k = 1,2$, $L_k$ is the running cost and $\psi_k$ is the terminal cost, with both $(L_k,\psi_k)$ satisfying a polynomial growth condition. The integrand in (3) is defined as $z_k^{\top}(t)z_k(t) = x^{\top}(t)Q(t)x(t) + u_k^{\top}(t)R_k u_k(t)$, with $Q(t) = Q^{\top}(t) \ge 0$ and $R_k = R_k^{\top} > 0$.
The cost function $J$ for $\nu$ is given as:
$$
J(t,x,\mu_1,\mu_2,\nu) = \int_t^{t_F} L(s,x(s),\mu_1,\mu_2,\nu)\,ds + \psi(x(t_F))
$$
or
$$
J(t,x,\mu_1,\mu_2,\nu) = \int_t^{t_F}\left[\rho^2\nu^{\top}(s)\nu(s) - z^{\top}(s)z(s)\right]ds + \psi(x(t_F)), \tag{4}
$$
where $L$ is the running cost and $\psi$ is the terminal cost, with both $(L,\psi)$ satisfying a polynomial growth condition. Also, $\rho > 0$ is the constraint on the $H_\infty$ norm of the system.
To study the cumulant game of the cost function, the m-th moment of the cost function, $M_m^k$, of the k-th player is defined as:
$$
M_m^k(t,x,\mu_1,\mu_2) = E\left\{ (J_k)^m(t,x,\mu_1,\mu_2) \mid x(t) = x \right\}, \tag{5}
$$
where $m = 1,2$. The m-th cost cumulant function $V_m^k(t,x)$ of the k-th player is defined by (Smith, 1995),
$$
V_m^k(t,x) = M_m^k - \sum_{i=0}^{m-2}\frac{(m-1)!}{i!\,(m-1-i)!}\,M_{m-1-i}^k\,V_{i+1}^k, \tag{6}
$$
where $t \in T = [t_0,t_F]$, $x(t_0) = x_0$, $x(t) \in \mathbb{R}^n$. Next, the following definitions are given:
Definition 2.1: A function $M_i^k, V_i^k : Q_0 \to \mathbb{R}^{+}$ is an admissible i-th moment cost function if there exists a strategy $\mu_k$ such that
$$
\begin{aligned}
M_i^k(t,x) &= M_i^k(t,x;\mu_1,\mu_2,\nu),\\
V_i^k(t,x) &= V_i^k(t,x;\mu_1,\mu_2,\nu),
\end{aligned} \tag{7}
$$
for $t \in T$, $x \in \mathbb{R}^n$, $i = 1,2$.
Definition 2.2: The players equilibrium strategy $\mu_1^*,\mu_2^*$ is such that
$$
\begin{aligned}
M_i^1(t,x) &= M_i^1(t,x,\mu_1^*,\mu_2^*,\nu^*) \le M_i^1(t,x,\mu_1,\mu_2^*,\nu^*),\\
V_i^1(t,x) &= V_i^1(t,x,\mu_1^*,\mu_2^*,\nu^*) \le V_i^1(t,x,\mu_1,\mu_2^*,\nu^*),\\
M_i^2(t,x) &= M_i^2(t,x,\mu_1^*,\mu_2^*,\nu^*) \le M_i^2(t,x,\mu_1^*,\mu_2,\nu^*),\\
V_i^2(t,x) &= V_i^2(t,x,\mu_1^*,\mu_2^*,\nu^*) \le V_i^2(t,x,\mu_1^*,\mu_2,\nu^*).
\end{aligned} \tag{8}
$$
The moment (5), the moment-cumulant relationship (6), Definition 2.1 (7) and Definition 2.2 (8) all hold for the external disturbance player ($\nu$) as well.
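The moment-cumulant recursion (6) is easy to evaluate numerically. The following is a minimal Python sketch of that recursion for scalar raw moments; the function name `cumulants_from_moments` and the list layout (entry `m-1` holds the m-th moment) are illustrative assumptions, not notation from the paper.

```python
from math import comb


def cumulants_from_moments(moments):
    """Apply recursion (6): moments[m-1] is the m-th raw moment M_m.

    Returns a list whose entry m-1 is the m-th cumulant V_m.
    """
    cumulants = []
    for m in range(1, len(moments) + 1):
        # Sum over i = 0..m-2 of C(m-1, i) * M_{m-1-i} * V_{i+1}
        correction = sum(
            comb(m - 1, i) * moments[m - 2 - i] * cumulants[i]
            for i in range(m - 1)
        )
        cumulants.append(moments[m - 1] - correction)
    return cumulants


# Raw moments of a unit-rate exponential random variable are m! (1, 2, 6, 24).
print(cumulants_from_moments([1, 2, 6, 24]))
```

For $m = 2$ the recursion reduces to $V_2 = M_2 - M_1^2$, the variance of the cost, which is the quantity minimized in the second cumulant problem.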
Problem Definition: Consider an open set $Q \subset Q_0$ and let the k-th player and disturbance cost cumulant functions $V_1^k(t,x), \bar{V}_1(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ be admissible cumulant functions. Assume the existence of optimal players strategies $\mu_1^*,\mu_2^*,\nu^*$ and optimal players value functions $V_2^{k*}(t,x), \bar{V}_2^{*}(t,x)$. Then the multi-player second cumulant/$H_\infty$ control problem is to find the Nash strategies $\mu_1^*,\mu_2^*,\nu^*$ which result in the minimal second value functions $V_2^{k*}(t,x), \bar{V}_2^{*}(t,x)$ while satisfying the system $H_\infty$ constraint. Thus, $\mu_k^*$ is the second cumulant/$H_\infty$ optimal strategy and $\nu^*$ is the external disturbance strategy.
Remark: To find the Nash strategies $\mu_1^*,\mu_2^*,\nu^*$, we constrain the candidates of the optimal players strategy to $U_{M_1}, U_{M_2}, U_{\bar{M}}$, and the optimal value functions $V_2^{k*}(t,x), \bar{V}_2^{*}(t,x)$ are found with the assumption that the lower order cumulants $V_1^k, \bar{V}_1$ are admissible.
3 SECOND CUMULANT HJB
EQUATION
Theorem 3.1: Let $M_j^k(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ be the admissible moment cost function. If there exists an optimal k-th player strategy $\mu_k^*$ such that $M_j^k(t,x) = M_j^k(t,x,\mu_1^*,\mu_2^*,\nu^*)$, $t \in T = [t_0,t_F]$, then
$$
O\left[M_j^k(t,x)\right] + j\,M_{j-1}^k(t,x)\,L_k(t,x,\mu_1,\mu_2,\nu) = 0, \tag{9}
$$
where $M_j^k(t_F,x) = \psi_k^j(x(t_F))$, $j = 1,2$ and $k = 1,2$.
Remark: This theorem is an extension of Theo-
rem 3.1 in (Won et al., 2010) which considered only a
single player in a statistical optimal control problem.
This theorem is applied in this multi-player game.
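For concreteness, the two cases of (9) that are used later can be written out. This is only a worked restatement, not an additional result, and it assumes the convention $M_0^k = 1$ (the zeroth moment of any cost), which the text does not state explicitly.

```latex
\begin{aligned}
j = 1:&\quad O\left[M_1^k(t,x)\right] + L_k(t,x,\mu_1,\mu_2,\nu) = 0,\\
j = 2:&\quad O\left[M_2^k(t,x)\right] + 2\,M_1^k(t,x)\,L_k(t,x,\mu_1,\mu_2,\nu) = 0.
\end{aligned}
```

Because the first moment and the first cumulant coincide, $M_1^k = V_1^k$, the $j = 1$ line is also the first cumulant equation.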
Theorem 3.2: The necessary condition for Nash equilibrium using the k-th player, $k = 1,2$, as reference is stated and proven. A similar proof holds with the disturbance $\nu$ as reference. Consider a 3-player nonlinear system (1) with cost functionals (3), (4) of fixed duration $[t_0,t_F]$. Let $V_1^k(t,x), V_2^k(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ be admissible value functions for the k-th player. Similarly, let $\bar{V}_1(t,x), \bar{V}_2(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ be admissible value functions for the disturbance player. Assume the existence of an optimal player strategy $\mu_k^*$ and an optimal value function $V_2^{k*}(t,x)$. Then, the minimal 2nd value function $V_2^{k*}(t,x)$ satisfies in compact form the following HJB equation for the k-th player:
$$
0 = \min_{\mu_k \in U_{M_k}}\left\{ O(\mu_1,\mu_2,\nu^*)\left[V_2^{k*}(t,x)\right] + \left(\frac{\partial V_1^k(t,x)}{\partial x}\right)^{\top}\sigma(t,x)W(t)\sigma^{\top}(t,x)\,\frac{\partial V_1^k(t,x)}{\partial x} \right\}, \tag{10}
$$
with $V_j^k(t_F,x_F) = 0$, $j = 1,2$, $x(t) \in \mathbb{R}^n$.
Proof: Let $V_2^k$ be of class $C_p^{1,2}(Q) \cap C(\bar{Q})$, where the arguments of the cumulant and moment functions are suppressed. From (2), (6), the second cost cumulant $V_2^k$ satisfies
$$
O\left[V_2^k\right] = O\left[M_2^k\right] - O\left[(V_1^k)^2\right]. \tag{11}
$$
From (9), the function $M_2^k$ and running cost $L_k$ satisfy
$$
O\left[M_2^k\right] + 2M_1^k L_k(t,x,\mu_1,\mu_2,\nu) = 0. \tag{12}
$$
Using (12) in (11) gives
$$
O\left[V_2^k\right] + O\left[(V_1^k)^2\right] + 2M_1^k L_k(t,x,\mu_1,\mu_2,\nu) = 0. \tag{13}
$$
Replacing $M_1^k$ with $V_1^k$ in (13), since the first moment equals the first cumulant, gives
$$
O\left[V_2^k\right] + O\left[(V_1^k)^2\right] + 2V_1^k L_k(t,x,\mu_1,\mu_2,\nu) = 0. \tag{14}
$$
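Further expansion of (14) gives
$$
O\left[V_2^k\right] + V_1^k\,O\left[V_1^k\right] + V_1^k\,O\left[V_1^k\right] + \left(\frac{\partial V_1^k}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^k}{\partial x} + 2V_1^k L_k(t,x,\mu_1,\mu_2,\nu) = 0. \tag{15}
$$
Then, applying (9) to (15) gives
$$
O\left[V_2^k\right] - 2V_1^k L_k(t,x,\mu_1,\mu_2,\nu) + \left(\frac{\partial V_1^k}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^k}{\partial x} + 2V_1^k L_k(t,x,\mu_1,\mu_2,\nu) = 0. \tag{16}
$$
Rearranging and eliminating terms in (16) gives
$$
0 = \min_{\mu_k \in U_{M_k}}\left\{ O(\mu_1,\mu_2,\nu^*)\left[V_2^{k*}(t,x)\right] + \left(\frac{\partial V_1^k(t,x)}{\partial x}\right)^{\top}\sigma(t,x)W(t)\sigma^{\top}(t,x)\,\frac{\partial V_1^k(t,x)}{\partial x} \right\}. \tag{17}
$$
The theorem is proved.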
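The step from (14) to (15) rests on the product rule $O[(V_1^k)^2] = 2V_1^k\,O[V_1^k] + (\partial V_1^k/\partial x)^{\top}\sigma W\sigma^{\top}(\partial V_1^k/\partial x)$. The sketch below checks this identity numerically for a scalar state using central finite differences. The test functions `V1`, `f`, `sigma` and the evaluation point are hypothetical choices made only for the check; they are not taken from the paper.

```python
def backward_operator(V, f, sigma, W, t, x, h=1e-3):
    """Scalar form of (2): d/dt + f d/dx + 0.5 * sigma^2 * W * d^2/dx^2."""
    V_t = (V(t + h, x) - V(t - h, x)) / (2 * h)
    V_x = (V(t, x + h) - V(t, x - h)) / (2 * h)
    V_xx = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2
    return V_t + f(t, x) * V_x + 0.5 * sigma(t, x) ** 2 * W * V_xx


def gradient_x(V, t, x, h=1e-3):
    """Central-difference approximation of dV/dx."""
    return (V(t, x + h) - V(t, x - h)) / (2 * h)


# Hypothetical smooth test data (assumptions for illustration only).
V1 = lambda t, x: (2.0 - t) * x**2 + 0.3 * x
f = lambda t, x: -x + 0.5 * x**3
sigma = lambda t, x: x
W = 0.01
t0, x0 = 0.4, 0.7

V1_squared = lambda t, x: V1(t, x) ** 2
lhs = backward_operator(V1_squared, f, sigma, W, t0, x0)
rhs = (2 * V1(t0, x0) * backward_operator(V1, f, sigma, W, t0, x0)
       + gradient_x(V1, t0, x0) ** 2 * sigma(t0, x0) ** 2 * W)
print(abs(lhs - rhs))  # small, limited by the finite-difference error
```

The extra gradient term is the part that survives in (17) after the $2V_1^k L_k$ terms cancel in (16).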
Remark: The HJB equation (17) provides a necessary condition for the existence of an equilibrium solution of the 3-player 2nd cost cumulant game. The equilibrium solution is achieved under the constraint that $V_1^1, V_1^2 \in C_p^{1,2}(Q) \cap C(\bar{Q})$ are admissible value functions.
4 3-PLAYER NASH STRATEGY
Theorem 4: Let $V_1^k(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ be admissible value functions for the k-th player, $k = 1,2$. Also, $\bar{V}_1(t,x) \in C_p^{1,2}(Q) \cap C(\bar{Q})$ is the admissible value function for the external disturbance ($\nu$). The players full state-feedback Nash strategies are given as
$$
\begin{aligned}
\mu_k^*(t,x) &= -\frac{1}{2}R_k^{-1}B_k^{\top}\left[\frac{\partial V_1^k}{\partial x} + \gamma_2^k(t)\frac{\partial V_2^k}{\partial x}\right],\\
\nu^*(t,x) &= -\frac{1}{2\rho^2}B_3^{\top}\left[\frac{\partial \bar{V}_1}{\partial x} + \gamma(t)\frac{\partial \bar{V}_2}{\partial x}\right],
\end{aligned} \tag{18}
$$
with $V_j^k(t_F,x_F) = \bar{V}_j(t_F,x_F) = 0$ where $j = 1,2$, $\rho > 0$ and $\gamma_2^k(t)$, $\gamma(t)$ are the Lagrange multipliers. From (1), $f(\cdot) = g(x(t)) + B_1(x)u_1(t) + B_2(x)u_2(t) + B_3(x)v(t)$ and from (3), $L_k = x^{\top}(t)Q(t)x(t) + \mu_k^{\top}(x)R_k(t)\mu_k(x)$, where $g : \bar{Q}_0 \to \mathbb{R}^n$ is $C^1(\bar{Q}_0)$, $B_i(x(t))$, $i = 1,2,3$, are continuous real matrices and $R_k(t) > 0$ are symmetric matrices.
In addition, $L = \rho^2\nu^{\top}(t)\nu(t) - z^{\top}(t)z(t)$ in (4) with $z(t)$ given in (1), and the matrices $C, D_1, D_2$ are continuous real matrices of appropriate dimensions with $C^{\top}C = D_1^{\top}D_1$, $D_2^{\top}D_2 = I$ and $D_1^{\top}C = D_2^{\top}C = D_2^{\top}D_1 = 0$.
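To see what the orthogonality conditions accomplish, the regulated-output term in $L$ can be expanded using $z(t)$ from (1). The expansion below is a worked step under the stated assumption $D_1^{\top}C = D_2^{\top}C = D_2^{\top}D_1 = 0$, which removes every cross term; time arguments are suppressed.

```latex
\begin{aligned}
z^{\top}z &= (Cx + D_1u_1 + D_2u_2)^{\top}(Cx + D_1u_1 + D_2u_2)\\
&= x^{\top}C^{\top}Cx + u_1^{\top}D_1^{\top}D_1u_1 + u_2^{\top}D_2^{\top}D_2u_2\\
&\quad + 2x^{\top}C^{\top}D_1u_1 + 2x^{\top}C^{\top}D_2u_2 + 2u_1^{\top}D_1^{\top}D_2u_2\\
&= x^{\top}C^{\top}Cx + u_1^{\top}D_1^{\top}D_1u_1 + u_2^{\top}D_2^{\top}D_2u_2.
\end{aligned}
```

The disturbance running cost in (4) therefore weighs the state and each control separately, with no coupling between them.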
Proof: The minimal 3-player 2nd value functions $V_2^{1*}(t,x), V_2^{2*}(t,x), \bar{V}_2^{*}(t,x)$ satisfy (10) with the constraint condition that $V_1^1(t,x), V_1^2(t,x), \bar{V}_1(t,x)$ are admissible value functions. Then, the value functions $V_1^1, V_2^1$ satisfy the following coupled partial differential equations for the first player $\mu_1$:
$$
\begin{aligned}
O(\mu_1,\mu_2,\nu)\left[V_1^1(t,x)\right] + L_1(t,x,\mu_1,\mu_2,\nu) &= 0,\\
O(\mu_1,\mu_2,\nu)\left[V_2^1(t,x)\right] + \left(\frac{\partial V_1^1}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^1}{\partial x} &= 0,
\end{aligned} \tag{19}
$$
with $V_1^1(t_F,x_F) = V_2^1(t_F,x_F) = 0$. Similarly, the value functions $V_1^2, V_2^2$ satisfy the following coupled partial differential equations for the second player $\mu_2$:
$$
\begin{aligned}
O(\mu_1,\mu_2,\nu)\left[V_1^2(t,x)\right] + L_2(t,x,\mu_1,\mu_2,\nu) &= 0,\\
O(\mu_1,\mu_2,\nu)\left[V_2^2(t,x)\right] + \left(\frac{\partial V_1^2}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^2}{\partial x} &= 0,
\end{aligned} \tag{20}
$$
with $V_1^2(t_F,x_F) = V_2^2(t_F,x_F) = 0$. Similarly, the value functions $\bar{V}_1, \bar{V}_2$ satisfy the following coupled partial differential equations for the disturbance player $\nu$:
$$
\begin{aligned}
O(\mu_1,\mu_2,\nu)\left[\bar{V}_1(t,x)\right] + L(t,x,\mu_1,\mu_2,\nu) &= 0,\\
O(\mu_1,\mu_2,\nu)\left[\bar{V}_2(t,x)\right] + \left(\frac{\partial \bar{V}_1}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial \bar{V}_1}{\partial x} &= 0,
\end{aligned} \tag{21}
$$
with $\bar{V}_1(t_F,x_F) = \bar{V}_2(t_F,x_F) = 0$.
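Applying the Lagrange multiplier method, let $G_1(\mu_1,\mu_2,\nu)$ be formulated by converting the constrained coupled HJB equations (19) to unconstrained coupled HJB equations as follows:
$$
G_1(\mu_1,\mu_2,\nu) = O\left[V_2^1\right] + \left(\frac{\partial V_1^1}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^1}{\partial x} + \lambda_1^1(t)\left\{ O\left[V_1^1\right] + L_1(t,x,\mu_1,\mu_2,\nu) \right\}, \tag{22}
$$
where $\lambda_1^1(t)$ is a time-varying Lagrange multiplier. Similarly, let $G_2(\mu_1,\mu_2,\nu)$ be formulated by converting the constrained coupled HJB equations (20) to unconstrained coupled HJB equations as follows:
$$
G_2(\mu_1,\mu_2,\nu) = O\left[V_2^2\right] + \left(\frac{\partial V_1^2}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^2}{\partial x} + \lambda_1^2(t)\left\{ O\left[V_1^2\right] + L_2(t,x,\mu_1,\mu_2,\nu) \right\}, \tag{23}
$$
where $\lambda_1^2(t)$ is a time-varying Lagrange multiplier. Similarly, let $G(\mu_1,\mu_2,\nu)$ be formulated by converting the constrained coupled HJB equations (21) to unconstrained coupled HJB equations as follows:
$$
G(\mu_1,\mu_2,\nu) = O\left[\bar{V}_2\right] + \left(\frac{\partial \bar{V}_1}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial \bar{V}_1}{\partial x} + \lambda(t)\left\{ O\left[\bar{V}_1\right] + L(t,x,\mu_1,\mu_2,\nu) \right\}, \tag{24}
$$
where $\lambda(t)$ is a time-varying Lagrange multiplier.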
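At the equilibrium, the stationarity conditions are obtained by setting to zero the partial derivatives of $G_1(\mu_1,\mu_2,\nu)$, $G_2(\mu_1,\mu_2,\nu)$, $G(\mu_1,\mu_2,\nu)$ in (22), (23), (24) with respect to $\mu_1, \lambda_1^1(t)$, $\mu_2, \lambda_1^2(t)$, $\nu, \lambda(t)$. Thus, the full-state feedback Nash strategies $\mu_1^*,\mu_2^*,\nu^*$ become
$$
\begin{aligned}
\mu_1^*(t,x) &= -\frac{1}{2}R_1^{-1}B_1^{\top}\left[\frac{\partial V_1^1}{\partial x} + \frac{1}{\lambda_1^1(t)}\frac{\partial V_2^1}{\partial x}\right],\\
\mu_2^*(t,x) &= -\frac{1}{2}R_2^{-1}B_2^{\top}\left[\frac{\partial V_1^2}{\partial x} + \frac{1}{\lambda_1^2(t)}\frac{\partial V_2^2}{\partial x}\right],\\
\nu^*(t,x) &= -\frac{1}{2\rho^2}B_3^{\top}\left[\frac{\partial \bar{V}_1}{\partial x} + \frac{1}{\lambda(t)}\frac{\partial \bar{V}_2}{\partial x}\right].
\end{aligned} \tag{25}
$$
Now, let the Lagrange multipliers in (25) be defined as
$$
\gamma_2^1(t) = \frac{1}{\lambda_1^1(t)}, \quad \gamma_2^2(t) = \frac{1}{\lambda_1^2(t)}, \quad \gamma(t) = \frac{1}{\lambda(t)}. \tag{26}
$$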
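Then, substituting (26) in (25) gives
$$
\begin{aligned}
\mu_1^*(t,x) &= -\frac{1}{2}R_1^{-1}B_1^{\top}\left[\frac{\partial V_1^1}{\partial x} + \gamma_2^1(t)\frac{\partial V_2^1}{\partial x}\right],\\
\mu_2^*(t,x) &= -\frac{1}{2}R_2^{-1}B_2^{\top}\left[\frac{\partial V_1^2}{\partial x} + \gamma_2^2(t)\frac{\partial V_2^2}{\partial x}\right],\\
\nu^*(t,x) &= -\frac{1}{2\rho^2}B_3^{\top}\left[\frac{\partial \bar{V}_1}{\partial x} + \gamma(t)\frac{\partial \bar{V}_2}{\partial x}\right].
\end{aligned} \tag{27}
$$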
Thus, substituting $\mu_1^*,\mu_2^*,\nu^*$ into the 3-player 2nd cost cumulant HJB equations (10) gives the closed-loop system form of the second cumulant/$H_\infty$ control. The theorem is proved.
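The stationarity argument behind (25) can be checked in the scalar case. The sketch below evaluates the player strategy from (18) and confirms numerically that it zeroes the derivative of the strategy-dependent part of the Lagrangian (22). The helper names and all numerical values are hypothetical, chosen only to exercise the formulas.

```python
def nash_feedback(B, R, V1_x, V2_x, gamma):
    """Scalar form of (18): mu = -0.5 * R^{-1} * B * (V1_x + gamma * V2_x)."""
    return -0.5 * B / R * (V1_x + gamma * V2_x)


def lagrangian_mu_part(mu, B, R, V1_x, V2_x, lam):
    """Terms of the scalar Lagrangian (22) that depend on the strategy mu.

    O[V2] contributes B*mu*V2_x; lam*(O[V1] + L) contributes
    lam*(B*mu*V1_x + R*mu**2). All other terms are constant in mu.
    """
    return B * mu * V2_x + lam * (B * mu * V1_x + R * mu**2)


# Hypothetical numbers (assumptions for illustration only).
B, R, V1_x, V2_x, lam = 3.0, 1.0, 0.8, 0.2, 2.0
mu_star = nash_feedback(B, R, V1_x, V2_x, gamma=1.0 / lam)

# Central-difference slope of the Lagrangian at mu_star; it should vanish.
eps = 1e-6
slope = (lagrangian_mu_part(mu_star + eps, B, R, V1_x, V2_x, lam)
         - lagrangian_mu_part(mu_star - eps, B, R, V1_x, V2_x, lam)) / (2 * eps)
print(mu_star, slope)
```

The choice `gamma = 1.0 / lam` mirrors the substitution (26); with a positive multiplier and $R_k > 0$ the stationary point is a minimum of the quadratic in $\mu_k$.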
Remark: The coupled cost cumulant HJB equation (10) provides the necessary condition for the Nash equilibrium solution of the 3-player second cumulant/$H_\infty$ control.
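However, substituting for $\mu_k^*$ in (19) or (20) for the first cumulant HJB equation (first line of (19) or (20)) gives
$$
\begin{aligned}
&\frac{\partial V_1^k}{\partial t} + g^{\top}(x)\frac{\partial V_1^k}{\partial x}
+ \frac{1}{4}\left(\frac{\partial V_1^k}{\partial x}\right)^{\top}B_kR_k^{-1}B_k^{\top}\frac{\partial V_1^k}{\partial x}\\
&\quad - \frac{1}{2}\left(\frac{\partial V_1^1}{\partial x} + \gamma_2^1\frac{\partial V_2^1}{\partial x}\right)^{\top}B_1R_1^{-1}B_1^{\top}\frac{\partial V_1^k}{\partial x}
- \frac{1}{2}\left(\frac{\partial V_1^2}{\partial x} + \gamma_2^2\frac{\partial V_2^2}{\partial x}\right)^{\top}B_2R_2^{-1}B_2^{\top}\frac{\partial V_1^k}{\partial x}\\
&\quad - \frac{1}{2\rho^2}\left(\frac{\partial \bar{V}_1}{\partial x}\right)^{\top}B_3B_3^{\top}\frac{\partial V_1^k}{\partial x}
- \frac{\gamma}{2\rho^2}\left(\frac{\partial \bar{V}_2}{\partial x}\right)^{\top}B_3B_3^{\top}\frac{\partial V_1^k}{\partial x}\\
&\quad + \frac{(\gamma_2^k)^2}{4}\left(\frac{\partial V_2^k}{\partial x}\right)^{\top}B_kR_k^{-1}B_k^{\top}\frac{\partial V_2^k}{\partial x}
+ \frac{\gamma_2^k}{2}\left(\frac{\partial V_1^k}{\partial x}\right)^{\top}B_kR_k^{-1}B_k^{\top}\frac{\partial V_2^k}{\partial x}\\
&\quad + x^{\top}Qx + \frac{1}{2}\,\mathrm{tr}\!\left(\sigma W\sigma^{\top}\frac{\partial^2 V_1^k}{\partial x^2}\right) = 0.
\end{aligned} \tag{28}
$$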
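Also, substituting for $\mu_k^*$ in (19) or (20) for the second cumulant HJB equation (second line of (19) or (20)) gives
$$
\begin{aligned}
&\frac{\partial V_2^k}{\partial t} + g^{\top}(x)\frac{\partial V_2^k}{\partial x}
- \frac{1}{2\rho^2}\left(\frac{\partial \bar{V}_1}{\partial x}\right)^{\top}B_3B_3^{\top}\frac{\partial V_2^k}{\partial x}
- \frac{\gamma}{2\rho^2}\left(\frac{\partial \bar{V}_2}{\partial x}\right)^{\top}B_3B_3^{\top}\frac{\partial V_2^k}{\partial x}\\
&\quad - \frac{1}{2}\left(\frac{\partial V_1^1}{\partial x} + \gamma_2^1\frac{\partial V_2^1}{\partial x}\right)^{\top}B_1R_1^{-1}B_1^{\top}\frac{\partial V_2^k}{\partial x}
- \frac{1}{2}\left(\frac{\partial V_1^2}{\partial x} + \gamma_2^2\frac{\partial V_2^2}{\partial x}\right)^{\top}B_2R_2^{-1}B_2^{\top}\frac{\partial V_2^k}{\partial x}\\
&\quad - \frac{\gamma_2^k}{2}\left(\frac{\partial V_2^k}{\partial x}\right)^{\top}B_kR_k^{-1}B_k^{\top}\frac{\partial V_2^k}{\partial x}
+ \left(\frac{\partial V_1^k}{\partial x}\right)^{\top}\sigma W\sigma^{\top}\frac{\partial V_1^k}{\partial x}
+ \frac{1}{2}\,\mathrm{tr}\!\left(\sigma W\sigma^{\top}\frac{\partial^2 V_2^k}{\partial x^2}\right) = 0.
\end{aligned} \tag{29}
$$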
Similarly, substituting $\nu^*$ in (21) for the first and second cumulant HJB equations yields closed-loop equations of the same form as (28) and (29). Thus, the resulting six coupled HJB equations are solved for the value functions $V_1^k, V_2^k, \bar{V}_1, \bar{V}_2$.
Remark: The minimal second cumulant strategies are found under a constrained first cumulant at a constrained worst case disturbance, related by the cost function (4).
5 APPROXIMATE SOLUTION
The analytical solutions of the HJB equations (19), (20), (21) are difficult to find for nonlinear systems. Several approximate methods such as power series, spectral and pseudo-spectral, wavelength, path integral and neural network methods have been utilized to solve coupled HJB equations (Al’brekht, 1961), (Beard et al., 1998), (Song and Dyke, 2011), (Kappen, 2005), (Chen et al., 2007). In this paper, a neural network approximation method is applied to solve the HJB equations. A polynomial series function is utilized to approximate the value function using the method of least squares on a pre-defined region. The value functions $V_i^k, \bar{V}_i$ in (19), (20), (21) can be approximated as
$$
V_i^k(t,x) \approx V_{iL}^k(t,x) = w_{iL}^{\top}(t)\Lambda_{iL}(x) = \sum_{j=1}^{L} w_j(t)\gamma_j(x)
$$
for $t \in T$ over a compact subset of $\mathbb{R}^n$. Using the approximated value functions $V_{iL}^k(t,x)$ in the HJB equations results in residual error equations. The weighted residual method (Finlayson, 1972) is then applied to minimize the residual error equations and to solve numerically for the least-squares weights $w_{iL}(t)$. See (Chen et al., 2007) for details.
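The least-squares idea can be illustrated on a much simpler problem than the coupled HJB system. The sketch below fits the weights of a polynomial basis to samples of a known function on the region $-1 \le x \le 1$ by solving the normal equations. It is a static fit only, not the time-dependent weighted-residual scheme of (Chen et al., 2007); the basis, the target function and the helper names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_weights(basis, xs, targets):
    """Least-squares weights for sum_j w_j * basis[j](x) on the samples xs."""
    gram = [[sum(bi(x) * bj(x) for x in xs) for bj in basis] for bi in basis]
    rhs = [sum(bi(x) * y for x, y in zip(xs, targets)) for bi in basis]
    return solve_linear(gram, rhs)


# Hypothetical even polynomial basis and target function (illustration only).
basis = [lambda x: x**2, lambda x: x**4, lambda x: x**6]
xs = [-1.0 + 0.05 * i for i in range(41)]
targets = [0.7 * x**2 - 0.2 * x**4 for x in xs]
weights = fit_weights(basis, xs, targets)
print(weights)
```

In the HJB setting the target is not known in advance; the weights are instead chosen so that the residual of the equation, projected onto the basis, is minimized at each time.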
6 SIMULATION RESULTS
Consider a 3-player nonlinear stochastic system with full-state feedback information. The stochastic system is represented as
$$
dx(t) = \left[5x(t) + x^3(t) + 3u_1(t) + 2u_2(t) + 1.5v(t)\right]dt + x(t)\,dw(t), \tag{30}
$$
with the state variable defined as $x(t)$. The three players are $u_1(t)$, $u_2(t)$, $v(t)$, where $u_1(t)$, $u_2(t)$ are the controls while $v(t)$ is the external disturbance. The initial state condition is given as $x(0) = 0.5$ and $dw(t)$ in (30) is a Gaussian process with mean $E\{dw(t)\} = 0$ and covariance $E\{dw(t)dw^{\top}(t)\} = 0.01$. The first player cost function $J_1$ is
$$
J_1(t_0,x(t),u_1(t)) = \int_{t_0}^{t_F}\left[x^2(t) + u_1^2(t)\right]dt + \psi_1(x(t_F)), \tag{31}
$$
where $\psi_1(x(t_F)) = 0$ is the terminal cost, and the second player cost function $J_2$ is
$$
J_2(t_0,x(t),u_2(t)) = \int_{t_0}^{t_F}\left[x^2(t) + u_2^2(t)\right]dt + \psi_2(x(t_F)), \tag{32}
$$
where $\psi_2(x(t_F)) = 0$ is the terminal cost, and the third player cost function $J$ is
$$
J(t_0,x(t),u_1(t),u_2(t),v(t)) = \int_{t_0}^{t_F}\left[\rho^2v^2(t) - \left(x^2(t) + u_1^2(t) + u_2^2(t)\right)\right]dt + \psi(x(t_F)), \tag{33}
$$
where $\psi(x(t_F)) = 0$ is the terminal cost. The attenuation level is set at $\rho = 1$. In the simulation, the asymptotic stability region for the state was arbitrarily chosen as $-1 \le x \le 1$. The final time $t_F$ was 5 seconds and the external disturbance was $v(t) = 0.5\cos(t)\exp(-t)$.

Figure 1: 3-Player State Trajectory and Optimal Input Strategies. (a) State trajectory: state $x_1$ versus time (sec), with $\gamma = 10$, $\gamma_2^1 = \gamma_2^2 = 1$. (b) Input trajectory: control and disturbance inputs (first player $u_1$, second player $u_2$, third player $v$) versus time (sec), with the same parameters.
Fig. 1(a) shows the state trajectory under noise with variance $\sigma^2 = 0.01$ for the 2nd cumulant/$H_\infty$ game control. The state is bounded and converges to a value close to the origin. It should be noted from Fig. 1(b) that the Nash equilibrium controls for the two players are obtained by selecting $\gamma$, $\gamma_2^1$ and $\gamma_2^2$ where the value functions are minimum, which in our case were $\gamma = 10$, $\gamma_2^1 = 1$ and $\gamma_2^2 = 1$. In addition, there is design freedom in the selection of the $\gamma_2^k$ and $\gamma$ values to enhance system performance at the chosen attenuation level.
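The first two cost cumulants that the game shapes, the mean and the variance of a player's cost, can be estimated by Monte Carlo simulation. The sketch below is only illustrative: it integrates (30) with an Euler-Maruyama scheme using the drift coefficients as printed, takes the noise covariance as 0.01 per unit time, and replaces the Nash strategies (27) with hypothetical linear feedbacks $u_k = -k_k x$. The gains, step size and number of paths are assumptions, so the numbers it prints are not the paper's results.

```python
import math
import random


def simulate_cost(k1, k2, rng, t_final=5.0, dt=0.01, x0=0.5, noise_var=0.01):
    """One Euler-Maruyama path of (30); returns cost (31) with zero terminal cost."""
    x, cost = x0, 0.0
    for step in range(int(round(t_final / dt))):
        t = step * dt
        u1, u2 = -k1 * x, -k2 * x                 # hypothetical linear feedback
        v = 0.5 * math.cos(t) * math.exp(-t)      # disturbance signal from the text
        cost += (x**2 + u1**2) * dt               # running cost of player 1
        dw = rng.gauss(0.0, math.sqrt(noise_var * dt))
        drift = 5.0 * x + x**3 + 3.0 * u1 + 2.0 * u2 + 1.5 * v
        x += drift * dt + x * dw
    return cost


def cost_cumulants(k1, k2, n_paths=200, seed=0):
    """Sample mean and variance of the cost: estimates of its first two cumulants."""
    rng = random.Random(seed)
    costs = [simulate_cost(k1, k2, rng) for _ in range(n_paths)]
    mean = sum(costs) / n_paths
    variance = sum((c - mean) ** 2 for c in costs) / (n_paths - 1)
    return mean, variance


mean_cost, var_cost = cost_cumulants(k1=2.0, k2=2.0)
print(mean_cost, var_cost)
```

Repeating the estimate for different gains shows the trade-off that motivates cumulant control: two feedbacks with similar mean cost can differ in cost variance.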
Remark: The second cumulant Nash strategy is found within all admissible first cumulant strategies. A closer look at the state trajectory in Fig. 1(a) and the players' trajectories in Fig. 1(b) shows that convergence to the origin is gradual. Additional investigation is required to verify the convergence rate at different attenuation levels.
7 CONCLUSION
In this paper, finite-time higher-order control with multiple players was investigated for a nonlinear stochastic system. The second cumulant/$H_\infty$ control problem, which is a generalization of the higher-order multi-objective control problem, was analyzed and the necessary condition for the existence of a Nash equilibrium solution was given. A 3-player optimal strategy was derived where a Nash game approach was taken to minimize the different orders of the cost cumulants of the players. A nonlinear example problem was solved to evaluate the theoretical concepts. As future work, a more practical system example and improved numerical approaches for fast convergence will be explored.
REFERENCES
Aduba, C. and Won, C.-H. (2015). Two-Player Ad Hoc Output-Feedback Cumulant Game Control. In Proceedings of the 12th International Conference On Informatics in Control, Automation and Robotics, pages 53–59, INSTICC, IFAC, Colmar, Alsace, France.
Al’brekht, E. G. (1961). On the Optimal Stabilization of
Nonlinear Systems. Journal of Applied Mathematics
and Mechanics, 25(5):836–844.
Arnold, L. (1974). Stochastic Differential Equations: The-
ory and Applications. John Wiley & Sons Inc., New
York, NY.
Basar, T. and Olsder, G. J. (1999). Dynamic Noncooperative
Game Theory. SIAM, Philadelphia, PA.
Bauso, D., Giarré, L., and Pesenti, R. (2008). Consensus in non-cooperative dynamic games: A multiretailer inventory application. IEEE Transactions on Automatic Control, 53(4):998–1003.
Beard, R. W., Saridis, G. N., and Wen, J. T. (1998). Approximate Solutions to the Time-Invariant Hamilton-Jacobi-Bellman Equation. Journal of Optimization Theory and Applications, 96(3):589–626.
Bernstein, D. S. and Hassas, W. M. (1989). LQG Control with an $H_\infty$ Performance Bound: A Riccati Equation Approach. IEEE Transactions on Automatic Control, 34(3):293–305.
Charilas, D. E. and Panagopoulos, A. D. (2010). A sur-
vey on game theory applications in wireless networks.
Computer Networks, 54(18):3421–3430.
Chen, T., Lewis, F. L., and Abu-Khalaf, M. (2007). A
Neural Network Solution for Fixed-Final Time Op-
timal Control of Nonlinear Systems. Automatica,
43(3):482–490.
Finlayson, B. A. (1972). The Method of Weighted Residu-
als and Variational Principles. Academic Press, New
York, NY.
Fleming, W. H. and Rishel, R. W. (1975). Determinis-
tic and Stochastic Optimal Control. Springer-Verlag,
New York, NY.
Kappen, H. J. (2005). A Linear Theory for Control of Non-
linear Stochastic Systems. Physical Review Letters,
95(20).
Lee, J., Won, C., and Diersing, R. (2010). Two Player Sta-
tistical Game with Higher Order Cumulants. In Proc.
of the American Control Conference, pages 4857–
4862, Baltimore, MD.
Limebeer, D. J. N., Anderson, B. D. O., and Hendel, D. (1994). A Nash Game Approach to Mixed $H_2/H_\infty$ control. IEEE Transactions on Automatic Control, 39(1):69–82.
Sain, M. K. (1966). Control of Linear Systems According
to the Minimal Variance Criterion—A New Approach
to the Disturbance Problem. IEEE Transactions on
Automatic Control, AC-11(1):118–122.
Sain, M. K. and Liberty, S. R. (1971). Performance Measure
Densities for a Class of LQG Control Systems. IEEE
Transactions on Automatic Control, AC-16(5):431–
439.
Sain, M. K., Won, C.-H., Spencer, Jr., B. F., and Liberty,
S. R. (2000). Cumulants and risk-sensitive control:
A cost mean and variance theory with application to
seismic protection of structures. In Filar, J., Gaitsgory,
V., and Mizukami, K., editors, Advances in Dynamic
Games and Applications, volume 5 of Annals of the
International Society of Dynamic Games, pages 427–
459. Birkhäuser Boston.
Smith, P. J. (1995). A Recursive Formulation of the Old
Problem of Obtaining Moments from Cumulants and
Vice Versa. The American Statistician, (49):217–219.
Song, W. and Dyke, S. J. (2011). Application of Pseu-
dospectral Method in Stochastic Optimal Control of
Nonlinear Structural Systems. In Proc. of the Ameri-
can Control Conference, pages 4857–4862, San Fran-
cisco, CA.
Won, C.-H., Diersing, R. W., and Kang, B. (2010). Sta-
tistical Control of Control-Affine Nonlinear Systems
with Nonquadratic Cost Function: HJB and Verifica-
tion Theorems. Automatica, 46(10):1636–1645.