termined transition law, the system gets to a new state.
The sequence of controls is called a policy, and its
quality is assessed through a performance criterion.
The Optimal Control Problem (OCP) consists in determining
a policy that optimizes the performance criterion. One way
to solve the OCP is the dynamic programming technique
introduced by Bellman in the 1950s.
From this perspective, the problem of dividends is
modeled here using discrete-time MDPs. MDPs are used
because similar control problems, for example storage
problems for dams or inventories, have been solved
successfully; see (Finch, 1960) and (Ghosal, 1970).
Discrete time is used as suggested in (Li et al., 2009).
This type of analysis is important in itself, since it
provides an approximation of the continuous-time problem
and is also more realistic from the point of view of
applications. One approach followed in this work is to
study the problem of dividends by fixing a target capital
(barrier) Z > 0: if the reserve exceeds Z, then dividends
are distributed. A model of the reserve of an insurance
company with a fixed barrier is proposed. The reserve
process is modeled as an MDP whose admissible controls
belong to a compact subset. The bounds of this subset
depend on two principles for premium calculation: the
expectation principle and the standard deviation principle
(see (Dickson, 2005)). The total amount of claims per time
interval follows a compound process that is assumed to be
general, in the sense that its density is only required to
be continuous almost everywhere. The proposed performance
criterion is the expected total discounted cost, where the
cost penalizes both the failure to pay dividends and the
difference between the admissible premiums and a constant
that depends on the standard deviation principle for
premium calculation. The dynamic programming technique is
used to determine the optimal solutions explicitly. In
addition, a rate for the ruin probability is established,
with the aim of identifying long periods of sustainability
for the company.
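The barrier rule and the discounted criterion described above can be illustrated with a minimal sketch. The one-step dynamics used here (collect the premium, pay the claim, pay out any surplus above the barrier) are an assumption made only for illustration and are not the model specified later in the paper; the names `barrier_step` and `discounted_total` are hypothetical.

```python
def barrier_step(reserve, premium, claim, barrier):
    """One period of a hypothetical fixed-barrier reserve process.

    Assumed dynamics (illustrative, not taken from the paper): the premium
    is collected, the claim is paid, and any surplus above the barrier is
    distributed as a dividend.
    Returns (new_reserve, dividend).
    """
    surplus = reserve + premium - claim
    dividend = max(surplus - barrier, 0.0)
    return surplus - dividend, dividend


def discounted_total(costs, alpha):
    """Discounted sum of a finite cost path: sum over t of alpha**t * costs[t]."""
    return sum(alpha ** t * c for t, c in enumerate(costs))


if __name__ == "__main__":
    Z = 10.0   # barrier (target capital)
    x = 8.0    # initial reserve
    # Hypothetical (premium, claim) pairs for three periods.
    for premium, claim in [(3.0, 0.5), (3.0, 6.0), (3.0, 1.0)]:
        x, d = barrier_step(x, premium, claim, Z)
        print(f"reserve={x:.1f} dividend={d:.1f}")
```

Under this assumed rule the reserve never exceeds Z after dividends are paid, and a negative reserve would correspond to ruin.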
The paper is organized as follows. The second section
presents the mathematical tools used throughout this work
(mainly MDPs and stochastic orders). The third section
presents the reserve process with a fixed barrier, together
with an analysis of dividend policies. The fourth and fifth
sections give the main results, namely the optimal premium
and a rate for the ruin probability, with a couple of
examples in which the theory obtained in this work is
applied. Finally, research conclusions are presented.
2 PRELIMINARIES
This section presents some results on the theory that
will be used to solve the problem stated in the paper.
2.1 Stochastic Orders
Let X be a Borel space (i.e., a Borel subset of a
separable metric space) and suppose that X is complete and
partially ordered. The partial order in X is denoted by ≺.
A function g : X → R is said to be increasing if x, y ∈ X
with x ≺ y imply that g(x) ≤ g(y), where ≤ is the usual
order in R. The Borel σ-algebra of X is denoted by B(X).
Definition 2.1. Let $X$ be a complete Borel space and
suppose that $X$ is partially ordered. Let $P$ and $P'$ be
probability measures on $(X, B(X))$. It is said that $P'$
dominates $P$ stochastically if
$\int g \, dP \le \int g \, dP'$
for all $g : X \to \mathbb{R}$ measurable, bounded and
increasing; in this case we write $P \stackrel{st}{\le} P'$.
Remark 2.2. Let $P$ and $P'$ be probability measures on
$(\mathbb{R}, B(\mathbb{R}))$. In this case,
$P \stackrel{st}{\le} P'$ if $F'(x) \le F(x)$ for all
$x \in \mathbb{R}$, where $F$ and $F'$ are the distribution
functions of $P$ and $P'$, respectively (see (Lindvall,
1992), p. 127).
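The criterion in Remark 2.2 can be checked numerically for distributions on a common finite support. The sketch below is illustrative only: the helper names `is_dominated` and `expectation` and the two probability vectors are hypothetical. Because the distribution functions are step functions that jump only at the support points, comparing them at those points is enough.

```python
from itertools import accumulate


def is_dominated(p, p_prime, tol=1e-12):
    """True if P <=_st P', i.e. F'(x) <= F(x) at every support point.

    p and p_prime are probability vectors on the same support, listed in
    increasing order of the support points.
    """
    F = list(accumulate(p))
    F_prime = list(accumulate(p_prime))
    return all(fp <= f + tol for f, fp in zip(F, F_prime))


def expectation(support, probs, g):
    """Integral of g with respect to the discrete measure (support, probs)."""
    return sum(g(x) * w for x, w in zip(support, probs))


if __name__ == "__main__":
    support = [0, 1, 2]
    p = [0.5, 0.3, 0.2]        # hypothetical measure P
    p_prime = [0.2, 0.3, 0.5]  # hypothetical measure P', more mass on large values
    print(is_dominated(p, p_prime))

    def g(x):
        # A bounded, increasing test function.
        return min(x, 1)

    print(expectation(support, p, g) <= expectation(support, p_prime, g))
```

In this example the stochastic order holds in one direction only, and the integral of the bounded increasing function g is ordered accordingly, as in Definition 2.1.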
Lemma 2.3. ((Cruz-Suárez et al., 2004), Lemma 2.6)
Let $X$ be a complete Borel space, and suppose also
that $X$ is partially ordered. Let $P$ and $P'$ be
probability measures on $(X, B(X))$ such that
$P \stackrel{st}{\le} P'$. Then
$\int H^* \, dP \le \int H^* \, dP'$
for $H^* : X \to \mathbb{R}$ which is measurable,
nonnegative, nondecreasing, and (possibly) unbounded.
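As an illustration of Lemma 2.3 with an unbounded function, consider the following worked example. It is not taken from the paper; the two exponential laws and the choice of $H^*$ are assumptions made only for illustration. The stochastic order is verified through the criterion of Remark 2.2.

```latex
% Illustrative example: P and P' are exponential laws on the real line
% with rates 2 and 1, respectively (an assumption for this example).
\[
F(x) = 1 - e^{-2x}, \qquad F'(x) = 1 - e^{-x}, \qquad x \ge 0,
\]
and $F(x) = F'(x) = 0$ for $x < 0$, so that $F'(x) \le F(x)$ for all
$x \in \mathbb{R}$ and hence $P \stackrel{st}{\le} P'$.
Take $H^*(x) = \max\{x, 0\}$, which is measurable, nonnegative,
nondecreasing and unbounded. Then
\[
\int H^* \, dP = \int_0^\infty 2x e^{-2x} \, dx = \frac{1}{2}
\;\le\;
1 = \int_0^\infty x e^{-x} \, dx = \int H^* \, dP'.
\]
```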
2.2 Discounted Markov Decision Processes
Let X and Y be complete Borel spaces. A stochas-
tic kernel on X given Y is a function P(·|·) such that
P(·|y) is a probability measure on X for each fixed
y ∈ Y, and P(B|·) is a measurable function on Y for
each fixed B ∈ B(X).
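When X and Y are finite, a stochastic kernel reduces to a family of probability vectors indexed by y. The sketch below illustrates the definition under that finiteness assumption; the kernel values and the helper names `is_stochastic_kernel` and `kernel_measure` are hypothetical. Measurability in y holds trivially on a finite set.

```python
def is_stochastic_kernel(kernel, tol=1e-9):
    """Check that kernel[y] is a probability distribution on X for every y."""
    return all(
        all(w >= 0 for w in row.values())
        and abs(sum(row.values()) - 1.0) <= tol
        for row in kernel.values()
    )


def kernel_measure(kernel, B, y):
    """P(B | y): mass that the kernel assigns to the set B given y."""
    return sum(w for x, w in kernel[y].items() if x in B)


# Hypothetical kernel on X = {0, 1, 2} given Y = {"low", "high"}.
example_kernel = {
    "low": {0: 0.7, 1: 0.2, 2: 0.1},
    "high": {0: 0.1, 1: 0.3, 2: 0.6},
}

if __name__ == "__main__":
    print(is_stochastic_kernel(example_kernel))
    print(kernel_measure(example_kernel, {1, 2}, "high"))
```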
Let (X, A, {A(x) | x ∈ X}, Q, c) be a discrete-time
Markov Control Model (see (Bäuerle and Rieder, 2011) or
(Hernández-Lerma and Lasserre, 1996) for notation and
terminology). This model consists of the state space X,
the control set A, the transition law Q, and the
cost-per-stage c. For each x ∈ X, there is a nonempty
measurable set A(x) ⊂ A whose elements are the feasible
actions when the state of the system is x. Define
K := {(x, a) : x ∈ X, a ∈ A(x)}. The cost c is assumed to
be a nonnegative and measurable function on K.
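For finite state and action sets, the components of the control model can be collected in a small data structure. The following is a minimal sketch under that finiteness assumption, with Q taken to be a stochastic kernel on X given K; the class `ControlModel` and the toy transition law and cost are hypothetical and serve only to illustrate the definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class ControlModel:
    states: List[Hashable]                      # state space X
    actions: List[Hashable]                     # control set A
    feasible: Dict[Hashable, List[Hashable]]    # x -> A(x)
    transition: Callable[[Hashable, Hashable], Dict[Hashable, float]]  # Q(.|x,a)
    cost: Callable[[Hashable, Hashable], float]  # c(x, a)

    def graph(self):
        """K = {(x, a) : x in X, a in A(x)}."""
        return [(x, a) for x in self.states for a in self.feasible[x]]

    def is_valid(self, tol=1e-9):
        """Check A(x) nonempty, c >= 0 on K, and Q(.|x,a) a probability."""
        if any(len(self.feasible[x]) == 0 for x in self.states):
            return False
        for x, a in self.graph():
            q = self.transition(x, a)
            if a not in self.actions or self.cost(x, a) < 0:
                return False
            if min(q.values()) < 0 or abs(sum(q.values()) - 1.0) > tol:
                return False
        return True


def toy_transition(x, a):
    """Hypothetical law: 'pay' resets to 0; 'keep' stays or moves up by one."""
    if a == "pay":
        return {0: 1.0}
    up = min(x + 1, 2)
    q = {x: 0.5}
    q[up] = q.get(up, 0.0) + 0.5
    return q


def toy_cost(x, a):
    """Hypothetical nonnegative cost-per-stage."""
    return float(2 - x) if a == "keep" else 1.0


model = ControlModel(
    states=[0, 1, 2],
    actions=["keep", "pay"],
    feasible={0: ["keep"], 1: ["keep", "pay"], 2: ["keep", "pay"]},
    transition=toy_transition,
    cost=toy_cost,
)

if __name__ == "__main__":
    print(model.graph())
    print(model.is_valid())
```

Storing A(x) as a dictionary keeps the set K explicit, which is convenient because the cost and the transition law only need to be defined on K.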