A FORWARD-BACKWARD ALGORITHM FOR STOCHASTIC CONTROL PROBLEMS
Using the Stochastic Maximum Principle as an Alternative to Dynamic Programming
Stephan E. Ludwig¹, Justin A. Sirignano², Ruojun Huang³ and George Papanicolaou⁴
¹Department of Mathematics, Heidelberg University, INF 288, Heidelberg, Germany
²Department of Management Science and Engineering, Stanford University, Stanford, U.S.A.
³Department of Statistics, Stanford University, Stanford, U.S.A.
⁴Department of Mathematics, Stanford University, 380 Sloan Hall, Stanford, U.S.A.
Keywords:
Optimal stochastic control, Stochastic maximum principle, Forward-backward stochastic differential equations.
Abstract:
An algorithm for solving continuous-time stochastic optimal control problems is presented. The numerical scheme is based on the stochastic maximum principle (SMP) as an alternative to the widely studied dynamic programming principle (DPP). By using the SMP, (Peng, 1990) obtained a system of coupled forward-backward stochastic differential equations (FBSDE) with an external optimality condition. We extend the numerical scheme of (Delarue and Menozzi, 2006) by a Newton-Raphson method to solve the FBSDE system and the optimality condition simultaneously. As far as the authors are aware, this is the first fully explicit numerical scheme for the solution of optimal control problems through the solution of the corresponding extended FBSDE system. We discuss possible numerical advantages over the DPP approach and consider an optimal investment-consumption problem as an example.
1 INTRODUCTION
We consider continuous-time stochastic control problems where the state variable is a controlled stochastic process of Markovian type and the objective function depends on the state and on the control. These types of problems typically appear in mathematical finance and economics. The most common method to solve them is the dynamic programming principle (DPP), which leads to the well-known Hamilton-Jacobi-Bellman (HJB) equation. Various numerical schemes take advantage of the DPP's discrete version by performing a backward algorithm or by directly solving the HJB partial differential equation with a finite difference scheme.
In this paper, we consider an alternative approach based on the stochastic maximum principle (SMP), which leads to a system of coupled forward-backward stochastic differential equations (FBSDE) plus an external optimality condition. This was first studied by (Peng, 1990). It is well known that a quasilinear PDE has an FBSDE representation, which extends the well-known Feynman-Kac formula. However, this representation cannot be applied directly to the HJB equation unless the optimal control is known as an explicit function. Instead, by using the SMP, we obtain a coupled FBSDE system for the adjoint equations, where the coupling arises through the additional optimality condition only.
In addition to briefly reviewing the connection between stochastic control problems and FBSDE systems, the main objective of this paper is to present a complete numerical algorithm that produces approximate solutions for a certain class of optimal control problems. Advanced numerical methods are needed because the FBSDE is coupled: the state process depends on the control, which in turn is determined through the controlled objective function. We therefore take advantage of an existing numerical scheme for coupled FBSDEs, initially proposed by (Delarue and Menozzi, 2006), and extend it to satisfy the optimality condition.
The paper's outline is as follows.
After the problem definition in section 2, we briefly derive the corresponding FBSDE representation for the adjoint equations and state a verification theorem in section 3. In section 4 we discretize the time-continuous problem and present the numerical scheme, which uses a Markov chain approximation and a Newton-Raphson method for the optimization. Section 5 gives a brief comparison of the computational costs of our method and the standard dynamic programming approach, together with first results from an application to an investment-consumption problem. The paper concludes with an outlook on further developments.
2 PROBLEM STATEMENT
Throughout the paper we assume a given probability space $(\Omega, \mathcal{F}, \mathbb{P})$, endowed with a $d$-dimensional Brownian motion $(W_t)_{t \ge 0}$, whose natural filtration is denoted by $\{\mathcal{F}_t\}_{t \ge 0}$.
Consider the following problem. The dynamics of a controlled diffusion process $X_t$, which represents the state of our system, are given by:
$$dX_t = b(X_t, \pi_t)\,dt + \sigma(X_t, \pi_t)\,dW_t, \qquad X_0 = x \in \mathbb{R}^n, \qquad (1)$$
and the goal is to maximize a given objective function with finite time horizon $[0, T]$ over admissible controls $\tilde\pi = \{\pi_t\}_{t \in [0,T]} \in \mathcal{A}$:
$$J(t, x, \tilde\pi) := \mathbb{E}^{\tilde\pi}_t\Big[ \int_t^T f(s, X_s, \pi_s)\,ds + g(X_T) \,\Big|\, X_t = x \Big]. \qquad (2)$$
Here, $\mathcal{A}$ is the set of all progressively $\mathcal{F}_t$-measurable controls taking values $\pi_t$ in a compact set $A \subset \mathbb{R}^r$. If it exists, we will denote the optimal control by:
$$\tilde\pi^* := \arg\max_{\tilde\pi \in \mathcal{A}} J(t, x, \tilde\pi), \qquad (t, x) \in [0, T] \times \mathbb{R}^n, \qquad (3)$$
and the value function by:
$$v(t, x) := \sup_{\tilde\pi \in \mathcal{A}} J(t, x, \tilde\pi), \qquad (t, x) \in [0, T] \times \mathbb{R}^n. \qquad (4)$$
2.1 General Conditions
For each fixed $\pi \in A$, we ensure the existence of a unique solution to the controlled forward SDE (1) by the following assumptions:
$b(\cdot, \pi)$, $\sigma(\cdot, \pi)$ are uniformly Lipschitz continuous with respect to $x$:
$$\forall \pi \in A, \quad |b(x_1, \pi) - b(x_2, \pi)| + |\sigma(x_1, \pi) - \sigma(x_2, \pi)| \le C\,|x_1 - x_2|, \qquad (5)$$
$b(\cdot, \pi)$, $\sigma(\cdot, \pi)$ satisfy a linear growth condition with respect to $x$:
$$\forall \pi \in A, \quad |b(x, \pi)| + |\sigma(x, \pi)| \le C\,(1 + |x|). \qquad (6)$$
To ensure the boundedness of the objective function (2) we further assume:
$f(t, \cdot, \pi)$, $g(\cdot)$ satisfy a quadratic growth condition:
$$\forall t \in [0, T], \ \pi \in A, \quad |g(x)| + |f(t, x, \pi)| \le C\,(1 + |x|^2), \quad x \in \mathbb{R}^n. \qquad (7)$$
Both proofs can be found in (Pham, 2009), chapter 3.2. For what follows we also assume that:
$b$, $\sigma$, $f$, $g$ are twice continuously differentiable with respect to $x$ and $\pi$:
$$\forall t \in [0, T], \quad (b, \sigma, f, g)(t, \cdot, \cdot) \in C^{1,2}(\mathbb{R}^n, A), \qquad (8)$$
$f$, $g$ are uniformly concave with respect to $x$ and $\pi$.
To be explicit, we use a Markov chain approximation in section 4.1. In order to calculate the Brownian increments for this approximation in (36), $\sigma$ needs to be invertible, which requires $d = n$. Otherwise, we can use quantization methods or Monte Carlo simulation to calculate the expectations instead; this makes no difference to the general scheme.
3 THE STOCHASTIC MAXIMUM PRINCIPLE
3.1 Derivation of the FBSDE
Following (Pham, 2009), let us suppose there exists a unique solution $v \in C^{1,3}([0, T) \times \mathbb{R}^n) \cap C^0([0, T] \times \mathbb{R}^n)$ to (4) and an optimal control $\tilde\pi^* \in \mathcal{A}$ as described in (3), with associated controlled diffusion $\hat X_t$ satisfying (1).
The adjoint equations can be derived in two basic steps. 1) Differentiate the HJB equation at the optimal control with respect to $x$:
$$\partial_x\Big[\partial_t v(t, \hat X_t) + G\big(t, \hat X_t, \pi^*_t, \partial_x v(t, \hat X_t), \partial^2_x v(t, \hat X_t)\big)\Big] = 0, \qquad (9)$$
where $G : [0, T] \times \mathbb{R}^n \times A \times \mathbb{R}^n \times \mathbb{R}^{n \times n} \to \mathbb{R}$ is given by:
$$G(t, x, \pi, p, M) := b(x, \pi)^\top p + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma\sigma^\top(x, \pi)\,M\big) + f(t, x, \pi). \qquad (10)$$
2) Apply Itô's formula to $\partial_x v(t, \hat X_t)$ and plug in the above relation (9). After a few calculations, the adjoint equations come out as:
$$-\,d\,\partial_x v(t, \hat X_t) = \partial_x H\big(t, \hat X_t, \pi^*_t, \partial_x v(t, \hat X_t), \partial^2_x v(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)\big)\,dt - \partial^2_x v(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)\,dW_t,$$
$$\partial_x v(T, \hat X_T) = \partial_x g(\hat X_T),$$
where the so-called Hamiltonian $H : [0, T] \times \mathbb{R}^n \times A \times \mathbb{R}^n \times \mathbb{R}^{n \times d} \to \mathbb{R}$ is defined by:
$$H(t, x, \pi, y, z) := b(x, \pi)^\top y + \mathrm{tr}\big[\sigma^\top(x, \pi)\,z\big] + f(t, x, \pi). \qquad (11)$$
Furthermore, since $G$ is continuously differentiable with respect to $\pi$, we get:
$$0 = \partial_\pi G\big(t, \hat X_t, \pi^*_t, \partial_x v(t, \hat X_t), \partial^2_x v(t, \hat X_t)\big) = \partial_\pi H\big(t, \hat X_t, \pi^*_t, \partial_x v(t, \hat X_t), \partial^2_x v(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)\big).$$
Assuming concavity of $H$ with respect to $\pi$, $H$ must attain a maximum at $\pi^*_t$.
Let us summarize the above results. Under the above assumptions, and with $H$ concave with respect to $\pi$, the triple:
$$(\hat X_t, \hat Y_t, \hat Z_t) := \big(\hat X_t,\ \partial_x v(t, \hat X_t),\ \partial^2_x v(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)\big)$$
is the unique solution to the coupled FBSDE system:
$$X_t = x + \int_0^t b(X_s, \pi^*_s)\,ds + \int_0^t \sigma(X_s, \pi^*_s)\,dW_s,$$
$$Y_t = \partial_x g(X_T) + \int_t^T \partial_x H(s, X_s, \pi^*_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad (12)$$
such that the following optimality condition holds:
$$\pi^*_t = \arg\max_{\pi_t \in A} H(t, \hat X_t, \pi_t, \hat Y_t, \hat Z_t). \qquad (13)$$
3.2 Verification Theorem
Theorem 1. Suppose that there exists a unique solution $v \in C^{1,3}([0, T] \times \mathbb{R}^n, \mathbb{R})$ of the value function (4). Let $(\hat X_t, \hat Y_t, \hat Z_t)$ and the control $\tilde\pi^* := \{\pi^*_t\}_{t \in [0,T]}$ be associated solutions to the FBSDE system (12) such that the optimality condition (13) holds. Additionally assume that $g(\cdot)$ and $H(t, \cdot, \cdot, \hat Y_t, \hat Z_t)$ are uniformly concave in $(x, \pi)$. Then:
(i) $\tilde\pi^*$ is the optimal control of the stochastic control problem (4) and $\hat X_t$ is the solution of the associated controlled state process (1);
(ii) $(\hat Y_t, \hat Z_t) = \big(\partial_x v(t, \hat X_t),\ \partial^2_x v(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)\big)$.
Proof. The proof of the first statement is given in (Pham, 2009), Theorem 6.5.4.
For the second statement we define a function $u(t, \hat X_t)$ via its first derivative $\partial_x u(t, \hat X_t) := \hat Y_t$ and its terminal value $u(T, \hat X_T) := g(\hat X_T)$, and apply Itô's formula to $\partial_x u$. Comparing the diffusion term with the backward SDE for $\hat Y_t$ we get $\hat Z_t = \partial^2_{xx} u(t, \hat X_t)\,\sigma(\hat X_t, \pi^*_t)$. Comparing the drift terms we get a third-order PDE which is exactly the derivative of the HJB equation with respect to $x$. Since the solution of this PDE is unique, the verification is completed by using the verification theorem for the HJB equation.
The concavity of $H$ is an important condition for connecting the optimal control problem to the optimality condition for the FBSDE; it is mainly this condition that specifies the problem class for applications. Recall that we already assumed $f$, $g$ to be concave in section 2.1.
4 A NUMERICAL SCHEME FOR SOLVING THE FBSDE
Let us state the complete (coupled) FBSDE problem one more time:
$$X_t = X_0 + \int_0^t b(X_s, \pi^*_s)\,ds + \int_0^t \sigma(X_s, \pi^*_s)\,dW_s,$$
$$Y_t = Y_T + \int_t^T \partial_x H(s, X_s, \pi^*_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s,$$
$$\pi^*_t = \arg\max_{\pi_t \in A} H(t, X_t, \pi_t, Y_t, Z_t), \qquad (14)$$
where $X_0 = x$ and $Y_T = \partial_x g(X_T)$.
4.1 The Discrete Problem
Following (Kushner and Dupuis, 1992), let us define a fixed, scalar approximation parameter $h > 0$. In the following, the superscript $h$ denotes the dependency on this approximation parameter. Let $\Delta t^h_k > 0$, for $k = 0, \dots, N-1$, $N < \infty$, discretize the time interval $[0, T]$ by defining:
$$t_0 := 0, \quad t_1 := \Delta t^h_0, \quad t_k := \sum_{i=0}^{k-1} \Delta t^h_i, \quad t_N := T.$$
Suppose that $\Delta t^h_k \to 0$ as $h \to 0$. Let:
$$C_k := (\xi_j)_{j \in I_k} \subset \mathbb{R}^n, \quad I_k \subset \mathbb{N}, \qquad k = 0, \dots, N, \qquad (15)$$
be a spatial grid satisfying $C_j \subseteq C_i$ for all $j < i$. Furthermore, we denote the set of admissible, piecewise constant controls by $\mathcal{A}^h = \{\tilde\pi \in \mathcal{A} \mid \pi_t \text{ is constant over } [t_k, t_{k+1}),\ k \le N-1\}$.
To calculate the propagation of the state process from time $t_k$ to $t_{k+1}$, we choose the Markov chain approximation method here; as mentioned above, other methods such as quantization can be used as well. The Markov chain is defined by its transition probabilities:
$$\forall k = 0, \dots, N-1,\ \xi_j \in C_k,\ \xi_l \in C_{k+1}: \quad P\big(\xi_j, \xi_l \mid \pi_k(\xi_j)\big) = p^{jl}_k\big(\pi_k(\xi_j)\big). \qquad (16)$$
(16)
A FORWARD-BACKWARD ALGORITHM FOR STOCHASTIC CONTROL PROBLEMS - Using the Stochastic
Maximum Principle as an Alternative to Dynamic Programming
85
We denote $\Delta\xi_k = \xi_{k+1} - \xi_k$. The discrete Markov chain approximation $\xi_k$ converges to the real state process (1) as $h \to 0$ if the following local consistency conditions hold:
$$\mathbb{E}_k[\Delta\xi_k] = b(\xi_k, \pi_k)\,\Delta t^h_k(\xi_k, \pi_k) + o\big(\Delta t^h_k(\xi_k, \pi_k)\big),$$
$$\mathrm{Var}_k[\Delta\xi_k] = [\sigma\sigma^\top](\xi_k, \pi_k)\,\Delta t^h_k(\xi_k, \pi_k) + o\big(\Delta t^h_k(\xi_k, \pi_k)\big),$$
$$\sup_{k,\omega} |\Delta\xi_k| \to 0, \quad \text{as } h \to 0, \qquad (17)$$
where $\mathbb{E}_k$ is the conditional expectation given all information up to time $k$ and $\mathrm{Var}_k$ is the corresponding variance. For methods to derive proper transition probabilities see (Kushner and Dupuis, 1992).
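To make the consistency conditions (17) concrete, the following one-dimensional sketch constructs upwind transition probabilities in the spirit of (Kushner and Dupuis, 1992) and verifies (17) numerically. The construction and helper names are illustrative; they are not necessarily the exact choice used in our experiments.

```python
import numpy as np

def upwind_probabilities(x, pi, b, sigma, h):
    """Kushner-Dupuis-style 1-d construction: jump to x+h or x-h with
    probabilities chosen so that the local consistency conditions (17)
    hold with the interpolation interval dt = h^2 / Q."""
    drift = b(x, pi)
    diff2 = sigma(x, pi) ** 2
    Q = diff2 + h * abs(drift)                    # normalizer
    p_up = (0.5 * diff2 + h * max(drift, 0.0)) / Q
    p_down = (0.5 * diff2 + h * max(-drift, 0.0)) / Q
    dt = h ** 2 / Q                               # state/control dependent step
    return p_up, p_down, dt

# consistency check for b(x) = mu*x, sigma(x) = s*x at a sample point
mu, s, h = 0.05, 0.2, 1e-2
p_up, p_down, dt = upwind_probabilities(1.0, None,
                                        lambda x, _: mu * x,
                                        lambda x, _: s * x, h)
mean = h * (p_up - p_down)                        # equals b*dt exactly here
var = h ** 2 * (p_up + p_down) - mean ** 2        # equals sigma^2*dt + o(dt)
assert abs(mean - mu * dt) < 1e-12
assert abs(var - s ** 2 * dt) < h * dt            # the o(dt) slack
```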
For any control $\tilde\pi^h \in \mathcal{A}^h$, we define the following approximation of the objective function (2):
$$J^h(t_k, x, \tilde\pi^h) = \mathbb{E}^{\tilde\pi^h}_k\Big[ \sum_{i=k}^{N-1} f(t_i, \xi_i, \pi^h_i)\,\Delta t^h_i + g(\xi_N) \,\Big|\, \xi_k = x \Big], \qquad (18)$$
and the approximation of the value function (4) by:
$$V^h_k(x) = \max_{\tilde\pi^h \in \mathcal{A}^h} J^h(t_k, x, \tilde\pi^h). \qquad (19)$$
4.2 A Controlled Forward-Backward Algorithm
Starting from the final time $T$ and going backwards, let us suppose we have already calculated the approximations $Y^h_{k+1}(\cdot)$, $Z^h_{k+1}(\cdot)$ for (14). Now, let us use these functions as natural predictors for the still unknown $Y^h_k(\cdot)$ and $Z^h_k(\cdot)$, respectively. Then, for all $\xi_j \in C_k$, we calculate:
$$\pi^*_{t_k}(\xi_j) = \arg\max_{\pi \in A} H\big(t_k, \xi_j, \pi, Y^h_{k+1}(\xi_j), Z^h_{k+1}(\xi_j)\big). \qquad (20)$$
For notational simplicity we denote $\pi^*_{t_k}(\xi_j)$ by $\pi^j_k$. If we think of the control $\pi_k$ in (14) as being a function of $t, X_t, Y_t, Z_t$:
$$\pi_k(t, X_t, Y_t, Z_t) = \arg\max_{\pi \in A} H(t_k, X_t, \pi, Y_t, Z_t), \qquad (21)$$
we can replace the control variable by this function in the FBSDE system (14) and obtain a fully coupled FBSDE system without control.
To solve the coupled FBSDE system we make use of existing numerical schemes. In particular, we choose the algorithm for coupled FBSDE systems proposed by (Delarue and Menozzi, 2006), using the pre-calculated controls $\pi^j_k$ from (20). The algorithm then reads as follows:
$$V^h_k(\xi_j) = f(t_k, \xi_j, \pi^j_k)\,\Delta t^h_k + \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,V^h_{k+1}(\xi_l), \qquad (22)$$
$$Y^h_k(\xi_j) = \partial_x H\big(t_k, \xi_j, \pi^j_k, Y^h_{k+1}(\xi_j), Z^h_{k+1}(\xi_j)\big)\,\Delta t^h_k + \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,Y^h_{k+1}(\xi_l), \qquad (23)$$
$$Z^h_k(\xi_j) = \frac{1}{\Delta t^h_k} \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,Y^h_{k+1}(\xi_l)\,\big(\Delta W^{jl}_k\big)^\top, \qquad (24)$$
where $\Delta W^{jl}_k$ is calculated via the Euler relationship:
$$\Delta W^{jl}_k = \sigma^{-1}(\xi_j, \pi^j_k)\big(\xi_l - \xi_j - b(\xi_j, \pi^j_k)\,\Delta t^h_k\big). \qquad (25)$$
The main contribution of our paper is the explicit pre-calculation of $\pi^*$ via (20). This is the essential step in our scheme for solving an optimal control problem through an FBSDE representation. The significance is that the optimization is performed externally to the backward calculations for $V_t, Y_t, Z_t$ in (22), (23), (24) and does not involve the calculation of expectations. This will be outlined more precisely in the following sections.
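As a schematic illustration, one backward step of (20) and (22)-(24) can be organized as follows in the scalar case. All arguments (trans_prob, dW, maximize_H, and so on) are placeholders for the components defined in this paper, not a fixed interface.

```python
import numpy as np

def backward_step(grid_k, grid_k1, V1, Y1, Z1, dt, f, dH_dx,
                  trans_prob, dW, maximize_H):
    """One step of the extended scheme, eqs. (20) and (22)-(24), scalar case.
    V1, Y1, Z1 map a grid point to V^h_{k+1}, Y^h_{k+1}, Z^h_{k+1};
    trans_prob(xi_j, xi_l, pi) returns p^{jl}_k(pi); dW(xi_j, xi_l, pi)
    returns the Euler increment (25); maximize_H is the Newton solver
    of section 4.3."""
    V, Y, Z, PI = {}, {}, {}, {}
    for xi in grid_k:
        # predictor: time-(k+1) values of Y, Z at the same grid point
        pi = maximize_H(xi, Y1[xi], Z1[xi])                    # eq. (20)
        p = np.array([trans_prob(xi, xl, pi) for xl in grid_k1])
        v1 = np.array([V1[xl] for xl in grid_k1])
        y1 = np.array([Y1[xl] for xl in grid_k1])
        w = np.array([dW(xi, xl, pi) for xl in grid_k1])
        V[xi] = f(xi, pi) * dt + p @ v1                        # eq. (22)
        Y[xi] = dH_dx(xi, pi, Y1[xi], Z1[xi]) * dt + p @ y1    # eq. (23)
        Z[xi] = (p * y1) @ w / dt                              # eq. (24)
        PI[xi] = pi
    return V, Y, Z, PI
```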
4.3 Optimization
At first appearance, the difference between the above method and dynamic programming is that in the former the optimization does not have to be performed over an expectation operator. Instead, optimization is performed over the known explicit function in (20), where $H$ is given by:
$$b(\xi_j, \pi)^\top Y^j_{k+1} + \mathrm{tr}\big[\sigma^\top(\xi_j, \pi)\,Z^j_{k+1}\big] + f(t_k, \xi_j, \pi), \qquad (26)$$
where $Y^j_{k+1} := Y^h_{k+1}(\xi_j)$. To be explicit, we consider the Newton method in a line search algorithm; see (Nocedal and Wright, 2006) for details. For fixed points in space and time $k, j$ we guess a starting point $\pi_0 \in A$ and perform the following iteration:
$$\pi_{i+1} = \pi_i + p_i, \qquad \text{for } i = 1, \dots, m. \qquad (27)$$
A careful reader might notice that the step length is 1 here, which is sufficient in Newton's method to satisfy the Wolfe conditions. In the exact Newton method, the search directions $p_i \in \mathbb{R}^r$ are calculated by solving the linear system:
$$\nabla^2_\pi \mathcal{H}(\pi_i)\,p_i = -\nabla_\pi \mathcal{H}(\pi_i). \qquad (28)$$
Since $\mathcal{H}$ is smooth enough and its Hessian $\nabla^2_\pi \mathcal{H}(\pi_i)$ is positive definite (after the sign change in (29) below; see the above assumptions), the method converges to a local minimum and the rate of convergence of $\{\pi_i\}$ is quadratic. For a proof see (Nocedal and Wright, 2006), Theorem 3.2 and Theorem 3.5.
In detail, to solve the optimization in (20) for one point $(t_k, \xi_j)$ we denote the negated Hamiltonian by:
$$\mathcal{H}(\pi) := -H\big(t_k, \xi_j, \pi, Y^h_{k+1}(\xi_j), Z^h_{k+1}(\xi_j)\big) = -b(\xi_j, \pi)^\top Y^j_{k+1} - \mathrm{tr}\big[\sigma^\top(\xi_j, \pi)\,Z^j_{k+1}\big] - f(t_k, \xi_j, \pi), \qquad (29)$$
and calculate:
$$\mathcal{H}_\pi = -b_\pi Y^h_{k+1} - \sum_{i,j} \sigma^{ij}_\pi Z^{ij}_{k+1} - f_\pi,$$
$$\mathcal{H}_{\pi\pi} = -b_{\pi\pi} Y^h_{k+1} - \sum_{i,j} \sigma^{ij}_{\pi\pi} Z^{ij}_{k+1} - f_{\pi\pi},$$
where we denote the gradient of a function $g = b, \sigma, f$ by $g_\pi = \nabla_\pi g(\pi)$ and the Hessian by $g_{\pi\pi}$ accordingly. Note that $b_\pi, b_{\pi\pi}, \sigma^{ij}_\pi, \sigma^{ij}_{\pi\pi}, f_\pi, f_{\pi\pi}$ are known continuous functions, given by the problem definition. Then we solve the system:
$$\mathcal{H}_{\pi\pi}(\pi_i)\,p_i = -\mathcal{H}_\pi(\pi_i),$$
and repeat this procedure until the error is smaller than a certain specified $\varepsilon^h$. If $b, \sigma, f$ are too complex to derive the first or second derivatives by hand, one can use methods of 'automatic differentiation' to calculate them. An introduction to these methods can be found, for example, in (Nocedal and Wright, 2006).
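A minimal sketch of the resulting Newton iteration (27)-(28) might look as follows; grad_H and hess_H stand for the closed-form derivatives $\mathcal{H}_\pi$ and $\mathcal{H}_{\pi\pi}$ above and are assumed to be supplied by the caller.

```python
import numpy as np

def maximize_hamiltonian(grad_H, hess_H, pi0, eps, max_iter=50):
    """Newton iteration (27)-(28) for the first-order condition.
    grad_H, hess_H: callables returning the gradient (shape (r,)) and
    Hessian (shape (r, r)) of the negated Hamiltonian (29), so that the
    Hessian is positive definite under the concavity assumptions.
    Stops when the Newton step is smaller than eps (the eps^h above)."""
    pi = np.atleast_1d(np.asarray(pi0, dtype=float))
    for _ in range(max_iter):
        step = np.linalg.solve(hess_H(pi), -grad_H(pi))   # eq. (28)
        pi = pi + step                                    # eq. (27), step length 1
        if np.linalg.norm(step) < eps:
            break
    return pi
```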
4.4 Full Algorithm
Recall that the space grid at time point $k$ is indexed by $I_k$ in (15) and that $G$ is defined by (10). The full algorithm reads as follows:
For all $j \in I_N$:
$$\pi^j_N = \arg\max_{\pi \in A} G\big(T, \xi_j, \pi, \partial_x g(\xi_j), \partial^2_{xx} g(\xi_j)\big), \qquad (30)$$
$$V^h_N(\xi_j) = g(\xi_j), \quad Y^h_N(\xi_j) = \partial_x g(\xi_j), \quad Z^h_N(\xi_j) = \partial^2_x g(\xi_j)\,\sigma(\xi_j, \pi^j_N). \qquad (31)$$
For $k = N-1, \dots, 0$ and all $j \in I_k$: set $\pi^{j,0}_k = \pi^j_{k+1}$ and, for $i = 1, \dots, i_{\max}$, solve:
$$\mathcal{H}_{\pi\pi}(\pi^{j,i}_k)\,p_i = -\mathcal{H}_\pi(\pi^{j,i}_k), \qquad \pi^{j,i+1}_k = \pi^{j,i}_k + p_i, \qquad (32)$$
until $\|p_i\| < c\,\varepsilon^h$; then set $\pi^j_k = \pi^{j,i+1}_k$ and calculate:
$$V^h_k(\xi_j) = f(t_k, \xi_j, \pi^j_k)\,\Delta t^h_k + \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,V^h_{k+1}(\xi_l), \qquad (33)$$
$$Y^h_k(\xi_j) = \partial_x H\big(t_k, \xi_j, \pi^j_k, Y^h_{k+1}(\xi_j), Z^h_{k+1}(\xi_j)\big)\,\Delta t^h_k + \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,Y^h_{k+1}(\xi_l), \qquad (34)$$
$$Z^h_k(\xi_j) = \frac{1}{\Delta t^h_k} \sum_{\xi_l \in C_{k+1}} p^{jl}_k(\pi^j_k)\,Y^h_{k+1}(\xi_l)\,\big(\Delta W^{jl}_k\big)^\top, \qquad (35)$$
where $\Delta W^{jl}_k$ is calculated via the Euler relationship:
$$\Delta W^{jl}_k = \sigma^{-1}(\xi_j, \pi^j_k)\big(\xi_l - \xi_j - b(\xi_j, \pi^j_k)\,\Delta t^h_k\big). \qquad (36)$$
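Putting the pieces together, the full backward sweep can be organized as a short driver loop. This is a structural sketch only: terminal, newton and step stand for implementations of (30)-(31), the Newton loop (32), and the grid updates (33)-(36), respectively.

```python
def solve_fbsde(grids, dts, terminal, newton, step):
    """Backward sweep of section 4.4. grids[k] is the spatial grid C_k and
    dts[k] the time step; terminal initializes V, Y, Z and the control at
    t_N = T via (30)-(31); step performs (33)-(36) on one time layer,
    calling `newton` (eq. (32)) warm-started with the later-time control."""
    N = len(grids) - 1
    V, Y, Z, PI = terminal(grids[N])
    controls = {N: PI}
    for k in range(N - 1, -1, -1):        # k = N-1, ..., 0
        V, Y, Z, PI = step(grids[k], grids[k + 1], V, Y, Z,
                           dts[k], newton, warm_start=PI)
        controls[k] = PI
    return V, Y, Z, controls
```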
4.5 Known Convergence Results
For a specific subclass of stochastic control problems, the convergence of the proposed FBSDE scheme can be proved rigorously. Suppose that only the drift is controlled, i.e. $\sigma(x, \cdot) = \sigma(x)$, and drop the time-dependence in the coefficients. Furthermore, suppose that:
$\sigma\sigma^\top$ is positive definite,
$\partial_x H(x, \pi^*(x, y, z), y, z)$ and $\partial_x g(x)$ are Lipschitz continuous in $x$,
$\partial_x g \in C^{2+\alpha}(\mathbb{R}^n)$ for some $\alpha > 0$, with bounded norm.
Note that $H = b^\top y + \mathrm{tr}[\sigma^\top z] + f$ is linear, and hence Lipschitz continuous, in $y, z$ by definition. Also remember that we have written the optimal control variable $\pi^*$ as a function of the state variables $x, y, z$. Then, according to (Delarue and Menozzi, 2006), the numerical FBSDE scheme converges. We remark that a general proof of convergence for controlled diffusion processes, i.e. $\sigma(x, \pi)$, would be analogous to proving existence and uniqueness for a fully nonlinear PDE; such a proof is beyond the scope of this paper.
5 APPLICATIONS
5.1 An Investment-consumption Model
As an example, let us consider an investment-consumption model with convex transaction costs. Convex transaction costs protect the problem from the usual bang-bang control, see (Davis and Norman, 1990), and are a reasonable assumption in certain markets, e.g. for scarce commodities.
Let $A_t \in \mathbb{R}$ denote the portfolio owner's monetary amount of assets and let $\alpha_t, \beta_t > 0$ denote his investment and disinvestment rate at time $t$, respectively. Let the convex functions $f_\alpha, f_\beta \in C^2(\mathbb{R}, \mathbb{R})$ determine the respective transaction costs. Furthermore, let $B_t \in \mathbb{R}$ denote the portfolio owner's bank account, which pays a deterministic interest rate $r \ge 0$. Let the dynamics of the states be given by:
$$dA_t = (\mu A_t + \alpha_t - \beta_t)\,dt + \sigma A_t\,dW_t, \qquad A_0 = a_0,$$
$$dB_t = (r B_t - c_t - \alpha_t + \beta_t)\,dt - \big(f_\alpha(\alpha_t) + f_\beta(\beta_t)\big)\,dt, \qquad B_0 = b_0, \qquad (37)$$
where $\mu, \sigma > 0$, $W_t$ denotes a standard Brownian motion, and $c_t \ge 0$ denotes the portfolio owner's consumption at time $t$.
The goal of the risk-averse portfolio owner is to maximize his expected, concave utility $u \in C^2(\mathbb{R}, \mathbb{R})$ from consumption over a given time horizon $T$:
$$J(t, a_0, b_0, \alpha, \beta, c) = \mathbb{E}_0\Big[ \int_0^T e^{-\delta t}\,u(c_t)\,dt + e^{-\delta T}\,u(A_T + B_T) \,\Big|\, a_0, b_0 \Big], \qquad (38)$$
$$v(t, a, b) = \max_{(\alpha, \beta, c) \in \mathcal{A}} J(t, a, b, \alpha, \beta, c), \qquad (39)$$
where $\mathcal{A}$ is the set of all $\mathcal{F}_t$-measurable control strategies, $\delta \ge 0$ denotes the owner's impatience to consume, and $\mathbb{E}_0$ denotes the expectation operator with the information set at time zero.
Using this problem setup, the Hamiltonian $H(t, x, y, z, \alpha_t, \beta_t, c_t)$ in (11) takes the following form:
$$H = [\alpha_t - \beta_t + \mu A_t]\,y_A + \big[r B_t - c_t - \alpha_t + \beta_t - (f_\alpha(\alpha_t) + f_\beta(\beta_t))\big]\,y_B + (\sigma A_t)\,z_{AA} + e^{-\delta t}\,u(c_t). \qquad (40)$$
For the example calculations below, we choose log-utility and quadratic transaction costs:
$$u(x) = \ln(x), \qquad f_\alpha(a) = c\,a^2, \quad f_\beta(b) = c\,b^2, \quad \text{for } c > 0.$$
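With these choices the Hamiltonian (40) is strictly concave in $(\alpha, \beta, c)$ and, as a sanity check for the Newton iteration (a short computation of ours, not taken from the references), the unconstrained first-order conditions can even be solved in closed form:
$$\partial_\alpha H = y_A - y_B - 2c\,\alpha\,y_B = 0 \;\Rightarrow\; \alpha^* = \frac{y_A - y_B}{2c\,y_B},$$
$$\partial_\beta H = y_B - y_A - 2c\,\beta\,y_B = 0 \;\Rightarrow\; \beta^* = \frac{y_B - y_A}{2c\,y_B},$$
$$\partial_c H = -y_B + \frac{e^{-\delta t}}{c_t} = 0 \;\Rightarrow\; c^*_t = \frac{e^{-\delta t}}{y_B},$$
followed by a projection onto the admissible set. Since $\alpha^*$ and $\beta^*$ have opposite signs, at most one of them is nonzero after projection, which is consistent with plotting the combined control $\alpha^* - \beta^*$ in section 5.2.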
The adjoint equations become:
$$-dY^A_t = \big(\mu Y^A_t + \sigma Z^{AA}_t\big)\,dt - Z^{AA}_t\,dW_t, \qquad Y^A_T = \frac{e^{-\delta T}}{A_T + B_T},$$
$$-dY^B_t = r\,Y^B_t\,dt, \qquad Y^B_T = \frac{e^{-\delta T}}{A_T + B_T}. \qquad (41)$$
There are similar problems that have a strictly concave Hamiltonian with respect to the controls. One example is when the trading activities $\alpha, \beta$ influence the asset drift $\mu(\alpha, \beta)$, which is called feedback control.
5.2 Numerical Results
To produce the results below, we implemented the numerical algorithm in Matlab using parallel processing on an eight-core machine.
We implemented the forward-backward (FB) algorithm of section 4.4 in two ways: using a Markov chain approximation and using a quantization method. Since the latter attained better results, we present its results here. Let $W_i$, $i = 1, \dots, L$, denote the pre-calculated Brownian increments and let $p_i$ denote their probabilities. For simplicity, we used a log-scaled, time-constant grid $C$. Then, for a fixed time point $t_k$, we calculated for each grid point $(A^j_k, B^j_k) \in C$ its propagation $A^{j,i}_{k+1} = A^j_k + \Delta A^{j,i}_t$, for each $i = 1, \dots, L$, where:
$$\Delta A^i_t = (\mu A_t + \alpha_t - \beta_t)\,\Delta t + \sigma A_t \sqrt{\Delta t}\,W_i. \qquad (42)$$
In order to evaluate the functions $V, Y^A, Y^B, Z^{AA}$ at the points $(A^{j,i}_{k+1}, B^j_{k+1})$ we used a linear interpolation/extrapolation.
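A small Python sketch of this propagation-plus-interpolation step (our own illustration; the actual implementation is in Matlab, and the grid and placeholder values below are assumptions):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def quantized_expectation(F, A, B_next, alpha, beta, mu, sigma, dt, W, pW):
    """Approximate E_k[F(A_{k+1}, B_{k+1})] with the L quantization points W
    and weights pW, propagating A by eq. (42); B propagates deterministically,
    so B_next is a single value."""
    A_next = A + (mu * A + alpha - beta) * dt + sigma * A * np.sqrt(dt) * W
    pts = np.column_stack([A_next, np.full_like(A_next, B_next)])
    return pW @ F(pts)

# usage: wrap the time-(k+1) grid values in a linear interpolant that also
# extrapolates outside the grid, as described above
grid_A = grid_B = np.geomspace(0.2, 2.2, 100)    # log-scaled space grid
V_next = np.zeros((100, 100))                    # placeholder grid values
F = RegularGridInterpolator((grid_A, grid_B), V_next,
                            bounds_error=False, fill_value=None)
```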
In the figures below, we used $M^2 = |I| = 100^2$ space points, $N = 100$ time steps, $L = 20$ quantization points, $C \subset [0.2, 2.2] \times [0.2, 2.2]$ as the space grid and $[t_0, T] = [0, 2]$ as the time interval. Moreover, we set $\mu = 5\%$, $r = 3\%$, $\delta = 2\%$, $\sigma = 0.2$ and $c = 0.1\%$.
Figure 1 shows the consumption $c_t$ for five different space points $(A, B)$ over time. Figure 2 shows the surface of the combined control $\alpha^* - \beta^*$ at time step $k = 98$. The average calculation time for one time step was 2.6 seconds.
[Figure 1: Optimal consumption $c_t$ over time for five different space points $(A, B) \in \{(0.35, 0.35), (0.66, 0.66), (1.2, 1.2), (1.2, 0.35), (0.35, 1.2)\}$, using the FB algorithm. Plots of the optimal consumption of the original Merton problem are added in red.]
[Figure 2: Optimal control $\alpha^* - \beta^*$ over the $(A, B)$ grid at time $t_k = 1.96$, using the FB algorithm.]
The consumption in this problem is close to the consumption of the original Merton problem without transaction costs. A slight irregularity can be seen for $(A, B) = (0.35, 0.35)$ for small $t$. This shows the truncation error that propagates inside the space grid while going backward in time; the selected point is affected first since it is closest to the lower space grid boundary $(0.2, 0.2)$. Since $\frac{\mu - r}{\sigma^2} = 0.5$, the optimal controls $\alpha^*, \beta^*$ should be zero whenever $A = B$, as in the original Merton problem.
To compare the results with the dynamic programming (DP) approach, we used the Matlab optimization toolbox. We obtained the best results with the provided Sequential Quadratic Programming (SQP) method, which is based on a line search quasi-Newton method; details can be found at http://www.mathworks.com/help/toolbox/optim/. We needed to set $L = 40$ and to use a spline interpolation method to get reasonable results at all.
Figure 3 shows the surface of the combined optimal control $\alpha^* - \beta^*$ at time step $k = 98$. The calculation time for one time step was 450 seconds.
In this example, our FB algorithm is 170 times faster than the DP method. This is due to 1) the smaller number of function evaluations and 2) the different interpolation method needed.
Moreover, the $\alpha^* - \beta^*$ surface of the FB method in Figure 2 is smooth, while the surface of the DP method in Figure 3 has already become rough at time step $k = 98$, indicating instability. One reason may be that in the DP method the optimization is performed over the highly nonlinear value function $V$, whereas in our FB algorithm the optimization step depends only on the functions $Y_t$ and $Z_t$.
[Figure 3: Optimal control $\alpha^* - \beta^*$ over the $(A, B)$ grid at time $t_k = 1.96$, using the DP method.]
5.3 Comparison of Computational Cost
We briefly compare our FB algorithm with the standard dynamic programming approach. Let $M^n = |I_l|$; that is, $M$ is the number of grid points in each of the $n$ space dimensions. Also, let $L$ be the number of calculated transition probabilities (or, alternatively, the number of quantization points/simulations if we use a quantization/Monte Carlo method), let $N$ be the number of time steps, and let $m$ be the number of iterations the Newton-Raphson method needs to reach a given level of accuracy $\varepsilon^h$. Assuming that for a given level of accuracy the same time grid and space grid may be used for both algorithms, the computational cost of the dynamic programming approach is $N M^n L m (1 + 2r^2)$, while the computational cost of the FB algorithm is $N M^n [L(1 + n + nd) + mr]$. Assuming a large number of simulations $L$ is needed for each evaluation, the FB algorithm is superior if:
$$2 m r^2 > n(d + 1) + 1. \qquad (43)$$
It is clear that the FB algorithm has a significantly lower computational cost if a nontrivial number of optimization iterations is required. For example, in one dimension the FB algorithm has $\frac{3}{2m}$ times the computational cost of the dynamic programming approach. The advantage of the FB algorithm is that we do not need to optimize over the entire value function, which requires recalculating the expectation in the value function at least $m$ times and is very computationally expensive. Instead, one only has to optimize the Hamiltonian, which is a much simpler procedure.
6 CONCLUSIONS
We have proposed a complete numerical algorithm for solving optimal control problems through the associated FBSDE system; by complete we mean that the algorithm explicitly includes the optimization step. Our numerical approach is an alternative to standard dynamic programming methods. A comparison of the computational cost of the dynamic programming method and the FBSDE method illustrates the advantages of the FBSDE approach.
We included results of a numerical example that commonly appears in finance and economics. These results confirm the advantage in accuracy and computational efficiency of the FB algorithm compared to the dynamic programming method for certain problem classes.
A next step would be to analyze the convergence speed and the convergence error in theory and practice in more detail.
REFERENCES
Davis, M. and Norman, A. (1990). Portfolio selection with transaction costs. Mathematics of Operations Research, Vol. 15, No. 4. The Institute of Management Sciences/Operations Research Society of America.
Delarue, F. and Menozzi, S. (2006). A forward-backward stochastic algorithm for quasi-linear PDEs. The Annals of Applied Probability, Vol. 16, No. 1, pp. 140-184.
Kushner, H. and Dupuis, P. (1992). Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, London, 1st edition.
Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edition.
Peng, S. (1990). A general stochastic maximum principle for optimal control problems. SIAM Journal on Control and Optimization, Vol. 28, No. 4, pp. 966-979.
Pham, H. (2009). Continuous-time Stochastic Control and Optimization with Financial Applications. Stochastic Modelling and Applied Probability 61. Springer-Verlag, Berlin Heidelberg.