Innovation Cycles Control through Markov Decision Processes

Vassil S. Sgurev, Stanislav T. Drangajov, Lyubka A. Doukovska and Vassil G. Nikov

Institute of Information and Communication Technologies, Bulgarian Academy of Sciences,

Acad. G. Bonchev Str., Bl. 2, 1113 Sofia, Bulgaria

sgurev@bas.bg, sdrangajov@gmail.com, doukovska@iit.bas.bg, vasilnikov@abv.bg

Keywords: Innovation Introduction, Markov Decision Processes.

Abstract: Innovations are introduced in several cycles, or steps, which are of stochastic character. Successful completion of each cycle results in the beginning of the next one. The initial stages involve expenses of risk (venture) capital, and the investments are returned in the final stages, usually with considerable profit. A helpful approach to controlling the innovation process is the use of Markov decision processes, which have proved to be an efficient tool for controlling multi-state stochastic processes. The stages may be summarized as: 1 – prestart stage; 2 – start stage; 3 – initial expansion stage; 4 – quick expansion stage; 5 – stage of reaching liquidity of the venture investments; 6 – stage of project failure and its cancellation. The transition from state to state may be controlled through the control techniques of Markov decision processes so that maximum profit is achieved in the shortest time. The stages are conditional and some of them may be merged, e.g. 1 and 2, or 3 and 4.

1 INTRODUCTION

It is known that the introduction of innovations through the respective innovation cycles is, as a rule, accompanied by considerable uncertainty and has a pronounced stochastic character. Since the successful completion of an innovation project very often results in considerable profit, this stimulates the investment of considerable venture (risk) capital. An important task therefore arises: to consider carefully, in advance, the stochastic character of the ongoing processes and to take it into account in the calculations.

In the present work, a multi-step discrete Markov decision process with mixed policies is proposed for interpreting innovation risks. As a rule, the innovation process is carried out, and possibly completed, in a cycle of the following six stages: 1 – prestart and start stage; 2 – initial expansion stage; 3 – quick expansion stage; 4 – preparatory stage; 5 – stage of reaching liquidity of the venture investment; 6 – stage of project failure and its liquidation (Grossi, 1990; Cormican, 2004; Bernstein, 2006). Moreover, at each stage the process may be in different states, in which the decision maker may undertake different actions that lead to a transition to a new state with the respective profits and losses. The first three stages involve initial investments and the respective losses, which should be minimized. The last three stages may generate profit, ensuring full return of the investments and considerable gains, but they may also result in considerable loss if the innovation product is a failure. It should be clearly noted that introducing an innovation is a risky enterprise and not every attempt is successful and profitable.

It should be explicitly noted that the innovation process may only move forward from a given stage and can never return to a previous stage. No stages other than the last ones (success or failure) are absorbing, i.e. the innovation process cannot remain forever in any of the earlier stages: it eventually either succeeds or fails. The process may, however, stay in a given stage for some time. It is the decision maker's responsibility to undertake control actions such that the process leaves the first three stages, which generate expenses, as soon as possible and with minimum losses, and reaches the final stage, which generates profit.

It should also be noted that, depending on the decision maker's actions, a stage may be omitted, e.g. the process may pass directly from stage r to stage r+2. The stages so described are thus to some degree conditional, but the process can nonetheless develop only in the forward direction.


S. Sgurev V., T. Drangajov S., Doukovska L. and G. Nikov V.

Innovation Cycles Control through Markov Decision Processes.

DOI: 10.5220/0004776602860291

In Proceedings of the Third International Symposium on Business Modeling and Software Design (BMSD 2013), pages 286-291

ISBN: 978-989-8565-56-3

2 MARKOV DECISION MODEL FOR THE INNOVATION PROCESS

We consider an innovation process that may be at any of the six stages of implementation of a new product. This is, of course, for methodological purposes; in practice the process starts from the first stage and moves towards the final ones.

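We introduce the following notation. Let $N=\{1,2,\dots,6\}$ be the set of states (stages) of the innovation process, and

$$\Gamma_j^{-1}=\{\,i \mid i\in N,\ (i,j)\in U\,\},\qquad j\in N,$$

where the right-hand side of the equation above is the reverse mapping of node $j$ of the graph from Figure 1 (the set of its predecessors) and $U$ is the set of arcs of that graph.

$P_{ij}^{k}$ denotes the transition probability of the innovation process passing from state $i\in N$ to state $j\in N$ when control $k\in K_i$ is used, where $K_i$ is the set of possible policies in state $i$. As jumping across stages outside the arcs of the graph and going back to earlier stages of the innovation process are impossible, then:

$$P_{ij}^{k}\;\begin{cases}\ge 0, & \text{if } i\in\Gamma_j^{-1},\ j\in N,\ k\in K_i;\\ =0, & \text{otherwise};\end{cases}\qquad 0\le P_{ij}^{k}\le 1,\quad \sum_{j\in N}P_{ij}^{k}=1,\quad i\in N,\ k\in K_i.$$

Let $x_i^{k}$ denote the probability that the innovation process falls into state $i\in N$ and control $k\in K_i$ is used in this state.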
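The conditions on $P_{ij}^{k}$ above can be checked mechanically. As a minimal Python sketch, the transition probabilities of the numerical example (Table 1 in Section 3) are stored as nested dictionaries, and the function checks that every row is a probability distribution and that no transition leads back to an earlier stage. The data layout and the name check_transitions are assumptions of this sketch, not part of the model.

```python
from math import isclose

# P[i][k][j] = P_ij^k: probability of moving from stage i to stage j under control k.
# Values taken from Table 1 of the numerical example (Section 3).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}


def check_transitions(P):
    """Return a list of human-readable violations (empty if P is admissible)."""
    problems = []
    for i, controls in P.items():
        for k, row in controls.items():
            if any(not 0.0 <= p <= 1.0 for p in row.values()):
                problems.append(f"state {i}, control {k}: probability outside [0, 1]")
            if not isclose(sum(row.values()), 1.0):
                problems.append(f"state {i}, control {k}: row does not sum to 1")
            # The innovation process may never return to an earlier stage.
            for j, p in row.items():
                if p > 0 and j < i:
                    problems.append(f"state {i}, control {k}: backward transition to {j}")
    return problems


print(check_transitions(P))  # []
```

Storing only the nonzero $P_{ij}^{k}$ keeps the sparse, forward-only structure of the graph visible; staying in the same stage ($j=i$) is deliberately not rejected, since the text allows the process to remain in a stage for some time.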

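An important feature of the innovation process is that resources are spent on the transitions from one stage to the next in the first three stages, whereas increasing profit is gained on the transitions in the last three stages, i.e.:

$$r_i^{k}\le 0,\quad i\in\{1,2,3\},\ k\in K_i;\qquad r_i^{k}\ge 0,\quad i\in\{4,5,6\},\ k\in K_i, \tag{1}$$

where $r_i^{k}$ is the profit (or, if negative, the expense) obtained when control $k\in K_i$ is used in state $i$.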

The maximum recovery of the venture funds initially invested will then be obtained through an optimal choice of the control action in each possible state of the process, i.e. of the set

$$\{k_i^{*}\in K_i \mid i\in N\}.$$

This optimal choice of controls in the separate states corresponds to maximization of the objective function:

$$\sum_{i\in N}\sum_{k\in K_i} r_i^{k}x_i^{k}\rightarrow\max \tag{2}$$

Various linear and dynamic programming methods (Mine, 1975) may be used to find the optimum of the objective function above subject to the linear probability constraints.
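For the forward-only structure assumed here, the dynamic-programming route can be sketched as backward induction: the stages are processed in decreasing order, so that when stage $i$ is reached all of its successors have already been valued. The Python sketch below is illustrative only. It uses the data of the numerical example in Section 3 (Table 1) and an assumed reading of the profit values $r_i^{k}$ in which $r_3^{1}=7$ and $r_3^{2}=8$ are positive, because that reading reproduces the optimal strategy and the objective value reported there. The data layout and the function name backward_induction are hypothetical.

```python
# P[i][k][j] = P_ij^k (Table 1); R[i][k] = r_i^k (assumed reading, see lead-in).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
R = {1: {1: -10, 2: -11}, 2: {1: -5, 2: -6}, 3: {1: 7, 2: 8},
     4: {1: 10, 2: 12}, 5: {1: 0}, 6: {1: 0}}
ABSORBING = {5, 6}


def backward_induction(P, R, absorbing):
    """Maximise the expected total profit (2), valuing the last stages first."""
    value, policy = {}, {}
    for i in sorted(P, reverse=True):
        if i in absorbing:
            q = dict(R[i])  # profit counted once, on entering the absorbing stage
        else:
            # Stages 1-4 have no self-loops and only forward arcs, so every
            # successor j > i already has a value.
            q = {k: R[i][k] + sum(p * value[j] for j, p in row.items())
                 for k, row in P[i].items()}
        policy[i] = max(q, key=q.get)
        value[i] = q[policy[i]]
    return value, policy


value, policy = backward_induction(P, R, ABSORBING)
print([policy[i] for i in sorted(policy)])  # [1, 1, 1, 2, 1, 1]
print(round(value[1], 3))                   # -0.328
```

For an acyclic process of this kind, backward induction and the linear program (2) to (5) should lead to the same optimum; the recursion is simply cheaper to write down by hand.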

The specific structure of the multi-step discrete Markov decision process proposed here corresponds sufficiently well to the processes of realizing innovations and makes efficient control of the venture financing of innovations possible during their realization.

3 NUMERICAL EXAMPLES

Figure 1 illustrates a Markov decision process for controlling the development of an innovation through the six stages. The set of arcs $U$ shows the possible transitions from one stage (state) of the innovation process to another. The notation on the arcs of the decision graph should be read as follows:

$P_{ij}^{k}$: the probability of transition from stage $i$ to stage $j$ using control action $k$ in state $i$.

In the final two stages, 5 and 6, which are absorbing, there is only one possible action. For the other stages we assume, for illustration, that there are two possible control actions with the respective transition probabilities.

Formally, this means that the sets of possible actions $K_i=\{k\}$ in each state $i\in N$ are defined as follows:

$$K_i=\begin{cases}\{1,2\}, & i\in N\setminus\{5,6\};\\ \{1\}, & i\in\{5,6\}.\end{cases}$$

The initial probability $\alpha_i$ of the process being in state $i\in N$ is equal to:

$$\alpha_i=\begin{cases}1, & \text{if } i=1;\\ 0, & \text{otherwise}.\end{cases}$$

The problem of finding optimal policies for the Markov decision process shown in Figure 1 may be reduced to the following linear programming problem with objective function (2) and constraints:







$$\sum_{k\in K_i}x_i^{k}=1,\quad \text{if } i=1 \tag{3}$$

Figure 1: Exemplary transition graph for an innovation process.

$$\sum_{k\in K_j}x_j^{k}-\sum_{i\in N,\ i<j}\ \sum_{k\in K_i}P_{ij}^{k}x_i^{k}=0,\quad \text{if } j\in N\setminus\{1\} \tag{4}$$

$$x_j^{k}\ge 0,\quad \text{if } j\in N,\ k\in K_j; \tag{5}$$
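To make constraints (3) to (5) concrete, the following Python sketch evaluates their left-hand sides for a candidate solution. Here the candidate is the optimum reported in Table 4, with $x_1^{1}=1$ following from (3) and every other $x_i^{k}$ assumed to be zero. The inflow sum runs over predecessors $i\neq j$, so the self-loops of the absorbing stages 5 and 6 are excluded, as in (11) and (12). The name residuals and the dictionary layout are hypothetical.

```python
# P[i][k][j] = P_ij^k (Table 1).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
# Candidate solution: nonzero values of Table 4; every other x_i^k is 0.
x = {(1, 1): 1.0, (2, 1): 0.8, (3, 1): 0.76, (4, 2): 0.696,
     (5, 1): 0.7152, (6, 1): 0.2848}


def residuals(P, x, start=1):
    """Left-hand side minus right-hand side of (3) (for j = start) and (4)."""
    res = {}
    for j in P:
        occupancy = sum(x.get((j, k), 0.0) for k in P[j])
        inflow = sum(P[i][k].get(j, 0.0) * x.get((i, k), 0.0)
                     for i in P if i != j for k in P[i])
        res[j] = occupancy - inflow - (1.0 if j == start else 0.0)
    return res


res = residuals(P, x)
print(all(abs(v) < 1e-9 for v in res.values()))  # True: (3) and (4) hold
print(all(v >= 0 for v in x.values()))           # True: (5) holds
```

The same function can be used to build the equality rows for a generic LP solver: each residual is linear in the $x_i^{k}$, with coefficient 1 for the variables of stage $j$ and $-P_{ij}^{k}$ for those of its predecessors.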

If the graph in Figure 1 is taken as a basis, equations (2) and (3) to (5) take the following form:



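$$r_1^{1}x_1^{1}+r_1^{2}x_1^{2}+r_2^{1}x_2^{1}+r_2^{2}x_2^{2}+r_3^{1}x_3^{1}+r_3^{2}x_3^{2}+r_4^{1}x_4^{1}+r_4^{2}x_4^{2}+r_5^{1}x_5^{1}+r_6^{1}x_6^{1}\rightarrow\max \tag{6}$$

under the constraints:

$$x_1^{1}+x_1^{2}=1 \tag{7}$$

$$x_2^{1}+x_2^{2}-P_{12}^{1}x_1^{1}-P_{12}^{2}x_1^{2}=0 \tag{8}$$

$$x_3^{1}+x_3^{2}-P_{13}^{1}x_1^{1}-P_{13}^{2}x_1^{2}-P_{23}^{1}x_2^{1}-P_{23}^{2}x_2^{2}=0 \tag{9}$$

$$x_4^{1}+x_4^{2}-P_{24}^{1}x_2^{1}-P_{24}^{2}x_2^{2}-P_{34}^{1}x_3^{1}-P_{34}^{2}x_3^{2}=0 \tag{10}$$

$$x_5^{1}-P_{35}^{1}x_3^{1}-P_{35}^{2}x_3^{2}-P_{45}^{1}x_4^{1}-P_{45}^{2}x_4^{2}=0 \tag{11}$$

$$x_6^{1}-P_{36}^{1}x_3^{1}-P_{36}^{2}x_3^{2}-P_{46}^{1}x_4^{1}-P_{46}^{2}x_4^{2}=0 \tag{12}$$

$$x_5^{1}+x_6^{1}=1 \tag{13}$$

$$x_j^{k}\ge 0;\quad j\in N;\ k\in K_j \tag{14}$$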

Let the profits (expenses) $\{r_i^{k}\}$ have the following values:

$$r_1^{1}=-10;\ r_1^{2}=-11;\ r_2^{1}=-5;\ r_2^{2}=-6;\ r_3^{1}=7;\ r_3^{2}=8;\ r_4^{1}=10;\ r_4^{2}=12;\ r_5^{1}=0;\ r_6^{1}=0.$$

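The transition probability values are given in the following table:

Table 1: Transition probabilities.

State 1. Policy 1: $P_{12}^{1}$ = 0,8, $P_{13}^{1}$ = 0,2; policy 2: $P_{12}^{2}$ = 0,9, $P_{13}^{2}$ = 0,1.
State 2. Policy 1: $P_{23}^{1}$ = 0,7, $P_{24}^{1}$ = 0,3; policy 2: $P_{23}^{2}$ = 0,8, $P_{24}^{2}$ = 0,2.
State 3. Policy 1: $P_{34}^{1}$ = 0,6, $P_{35}^{1}$ = 0,3, $P_{36}^{1}$ = 0,1; policy 2: $P_{34}^{2}$ = 0,5, $P_{35}^{2}$ = 0,3, $P_{36}^{2}$ = 0,2.
State 4. Policy 1: $P_{45}^{1}$ = 0,8, $P_{46}^{1}$ = 0,2; policy 2: $P_{45}^{2}$ = 0,7, $P_{46}^{2}$ = 0,3.
State 5. Policy 1: $P_{55}^{1}$ = 1; policy 2: not defined.
State 6. Policy 1: $P_{66}^{1}$ = 1; policy 2: not defined.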

The respective transition probabilities for the different possible policies are shown above the arcs of the graph in Figure 1. If only policy 1, or only policy 2, is used, the transition probability tables take the following form:


Table 2: Transition probabilities for policy 1.

i \ j   1    2    3    4    5    6
1       0    0,8  0,2  0    0    0
2       0    0    0,7  0,3  0    0
3       0    0    0    0,6  0,3  0,1
4       0    0    0    0    0,8  0,2
5       0    0    0    0    1    0
6       0    0    0    0    0    1

Table 3: Transition probabilities for policy 2.

i \ j   1    2    3    4    5    6
1       0    0,9  0,1  0    0    0
2       0    0    0,8  0,2  0    0
3       0    0    0    0,5  0,3  0,2
4       0    0    0    0    0,7  0,3
5       0    0    0    0    1    0
6       0    0    0    0    0    1

Table 2 gives the transition probabilities for policy 1, and Table 3 those for policy 2.

Two groups of states may be distinguished in each of these matrices: the transient states 1 to 4, whose block is upper triangular, and the absorbing states 5 and 6. Once the controlled process falls into one of the latter states, it remains there forever.

When the optimal control is determined through relations (6) to (14), rows of both matrices (Tables 2 and 3) may in general be used with different probabilities, i.e. the solution may contain both pure and mixed policies, depending on the particular problem being solved.

The linear programming problem (6) to (14) includes 10 variables $\{x_i^{k}\}$ and 8 constraints. Its solution yields the following optimal values of the variables:

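Table 4: Linear programming problem solution.

Variable     Optimal value
$x_1^{1}$    1
$x_2^{1}$    0,8
$x_3^{1}$    0,76
$x_4^{2}$    0,696
$x_5^{1}$    0,7152
$x_6^{1}$    0,2848

All remaining variables $x_i^{k}$ are equal to 0.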
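Because every transition in Figure 1 moves forward, the nonzero values of Table 4 can also be reproduced without an LP solver. For the pure strategy of Table 5, a single pass over the stages in increasing order accumulates the probability of reaching each stage, which here equals $x_i^{k}$ for the chosen control. The following Python sketch assumes a hypothetical dictionary layout and function name (occupation); the probabilities are those of Table 1.

```python
# P[i][k][j] = P_ij^k from Table 1 (stage i, control k, next stage j).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
STRATEGY = {1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: 1}  # optimal strategy of Table 5


def occupation(P, strategy, start=1):
    """Probability of ever reaching each stage under a pure strategy.

    For stage i this equals x_i^k with k = strategy[i]; all other x_i^k are 0.
    """
    reach = {i: 0.0 for i in P}
    reach[start] = 1.0
    for i in sorted(P):                 # predecessors always have smaller indices
        for j, p in P[i][strategy[i]].items():
            if j != i:                  # ignore the self-loops of absorbing stages
                reach[j] += p * reach[i]
    return reach


reach = occupation(P, STRATEGY)
print({i: round(v, 4) for i, v in reach.items()})
# {1: 1.0, 2: 0.8, 3: 0.76, 4: 0.696, 5: 0.7152, 6: 0.2848}
```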

The table above shows that, in the example considered, the optimal solution leads to pure optimal policies of both types, 1 or 2. The next table shows the optimal pure policy in each state and the respective optimal strategy.

Table 5: Optimal pure policy and respective strategy.

State $i\in N$                                1  2  3  4  5  6
Optimal policy $k_i^{*}\in K_i$               1  1  1  2  1  1
Optimal strategy $\{k_i^{*} \mid i\in N\}$    {1, 1, 1, 2, 1, 1}
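The optimality of this strategy can be cross-checked by brute force: stages 1 to 4 offer two controls each, so there are only 2^4 = 16 pure stationary strategies. The Python sketch below evaluates objective (2) for each of them by a forward pass. It is an illustrative enumeration, not the LP method of the paper, and it assumes the profit values hard-coded below, with $r_3^{1}=7$ and $r_3^{2}=8$ taken as positive (the sign choice that reproduces the reported optimum of -0,328). Names such as expected_profit are hypothetical.

```python
from itertools import product

# P[i][k][j] = P_ij^k (Table 1); R[i][k] = r_i^k (negative values are expenses).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
# Assumed reading of the profit values (signs of r_3^k inferred, see lead-in).
R = {1: {1: -10, 2: -11}, 2: {1: -5, 2: -6}, 3: {1: 7, 2: 8},
     4: {1: 10, 2: 12}, 5: {1: 0}, 6: {1: 0}}


def expected_profit(strategy):
    """Objective (2) for a pure strategy, accumulated stage by stage."""
    reach = {i: 0.0 for i in P}
    reach[1] = 1.0
    total = 0.0
    for i in sorted(P):
        k = strategy[i]
        total += R[i][k] * reach[i]        # contribution r_i^k * x_i^k
        for j, p in P[i][k].items():
            if j != i:                     # skip absorbing self-loops
                reach[j] += p * reach[i]
    return total


results = {}
for choice in product((1, 2), repeat=4):   # controls in stages 1-4
    strategy = {**dict(zip((1, 2, 3, 4), choice)), 5: 1, 6: 1}
    results[choice] = expected_profit(strategy)

best = max(results, key=results.get)
print(best, round(results[best], 3))  # (1, 1, 1, 2) -0.328
```

Enumeration grows exponentially with the number of stages and controls, which is why the LP or dynamic programming formulation is preferable beyond small illustrative cases like this one.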

On the basis of the optimal policies, the following matrix of optimal transition probabilities of the Markov process may be drawn up. The new (optimal) transition probability matrix thus constructed also consists of the transient states 1 to 4 and two absorbing states. One of its rows (the fourth) is taken from Table 3, corresponding to policy 2; the remaining rows are taken from Table 2, corresponding to policy 1. In this sense the strategy combines both policies, 1 and 2.

Table 6: Optimal transition probabilities matrix.

i \ j   1    2    3    4    5    6
1       0    0,8  0,2  0    0    0
2       0    0    0,7  0,3  0    0
3       0    0    0    0,6  0,3  0,1
4       0    0    0    0    0,7  0,3
5       0    0    0    0    1    0
6       0    0    0    0    0    1
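The construction of Table 6 can be sketched in Python: row i is copied from Table 2 or Table 3 according to the optimal control in state i (Table 5), and iterating the resulting matrix from the initial distribution shows how the process flows step by step into the absorbing states. The list-based layout is an assumption of this sketch; four steps suffice here because the longest path, 1-2-3-4-5, has four arcs.

```python
# Rows are current states 1..6, columns next states 1..6.
POLICY_1 = [  # Table 2
    [0, 0.8, 0.2, 0, 0, 0],
    [0, 0, 0.7, 0.3, 0, 0],
    [0, 0, 0, 0.6, 0.3, 0.1],
    [0, 0, 0, 0, 0.8, 0.2],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
POLICY_2 = [  # Table 3 (stages 5 and 6 have a single control)
    [0, 0.9, 0.1, 0, 0, 0],
    [0, 0, 0.8, 0.2, 0, 0],
    [0, 0, 0, 0.5, 0.3, 0.2],
    [0, 0, 0, 0, 0.7, 0.3],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
STRATEGY = [1, 1, 1, 2, 1, 1]  # optimal controls of Table 5

# Row i of the optimal matrix comes from the table of the control chosen in state i.
optimal = [(POLICY_1 if k == 1 else POLICY_2)[i] for i, k in enumerate(STRATEGY)]
assert all(abs(sum(row) - 1.0) < 1e-9 for row in optimal)

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # the process starts in stage 1
for _ in range(4):  # the longest path 1-2-3-4-5 has four arcs
    dist = [sum(dist[i] * optimal[i][j] for i in range(6)) for j in range(6)]
print([round(p, 4) for p in dist])  # [0.0, 0.0, 0.0, 0.0, 0.7152, 0.2848]
```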

The Markov process thus constructed evolves step by step according to the transition probabilities of Table 6. For clarity, its stochastic parameters are shown in Figure 2: on the arcs, the probabilities $\{p_{1j}^{(i)}\}$ of falling from the initial state 1 into state $j\in N$ when passing through the previous state $i\in N$, and in the squares next to the vertices, the final probabilities $\{p_{1j} \mid j\in N\}$ of the process falling from the initial state 1 into the corresponding state $j\in N$.

The final probabilities are also given in the following table:

Table 7: Final probabilities.

Final prob.   $p_{11}$   $p_{12}$   $p_{13}$   $p_{14}$   $p_{15}$   $p_{16}$
Values        1          0,8        0,76       0,696      0,7152     0,2848


Figure 2: Markov process stochastic parameters from state to state and in the states.
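The arc values of Figure 2 are products of the probability of reaching stage i and the optimal transition probability from i to j. A small Python sketch, with hypothetical variable names and the rows of Table 6 stored sparsely per state, computes these arc probabilities together with the final probabilities of Table 7.

```python
# Optimal transition probabilities (rows of Table 6), stored sparsely as OPT[i][j].
OPT = {
    1: {2: 0.8, 3: 0.2},
    2: {3: 0.7, 4: 0.3},
    3: {4: 0.6, 5: 0.3, 6: 0.1},
    4: {5: 0.7, 6: 0.3},
    5: {5: 1.0},
    6: {6: 1.0},
}

reach = {i: 0.0 for i in OPT}   # final probabilities p_1j of Table 7
reach[1] = 1.0
arc_flow = {}                   # probability of passing along arc (i, j)
for i in sorted(OPT):
    for j, p in OPT[i].items():
        if j != i:              # absorbing self-loops carry no new flow
            arc_flow[(i, j)] = p * reach[i]
            reach[j] += arc_flow[(i, j)]

for (i, j), f in arc_flow.items():
    print(f"{i} -> {j}: {f:.4f}")   # e.g. "2 -> 3: 0.5600", "4 -> 5: 0.4872"
print({j: round(v, 4) for j, v in reach.items()})
```

The probability entering each absorbing stage is the sum of its incoming arc values, e.g. 0,228 + 0,4872 = 0,7152 for stage 5.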

On the basis of the optimal values of the variables $\{x_i^{k}\}$ in Table 4, the maximum value of the objective function computed through (6) is -0,328.
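As a worked check of this figure, the nonzero values of Table 4 can be substituted into (6). The computation assumes the profits $r_1^{1}=-10$, $r_2^{1}=-5$, $r_3^{1}=7$, $r_4^{2}=12$ and $r_5^{1}=r_6^{1}=0$; the positive sign of $r_3^{1}$ is inferred, being the reading that reproduces the reported value:

```latex
\sum_{i\in N}\sum_{k\in K_i} r_i^{k}x_i^{k}
  = (-10)(1) + (-5)(0{,}8) + 7\,(0{,}76) + 12\,(0{,}696) + 0\,(0{,}7152) + 0\,(0{,}2848)
  = -14 + 13{,}672 = -0{,}328
```

The expenses of stages 1 and 2 (14 units) are thus only partly offset by the expected gains in stages 3 and 4 (13,672 units).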

The results obtained allow some conclusions to be drawn:

I. When one of the two final states, 5 or 6, is reached, the investment made has not been paid off in full, as 0,328 units (in expectation) remain to be paid off. If the process has fallen into state 5, the project is successful and may go on to pay off the investments made and to produce profit. If the process has fallen into state 6, the project is a failure and will almost certainly be cancelled; the amount of 0,328 units should then be registered as a loss.

II. Even with optimal decisions in guiding the stochastic innovation process, the outcome of the project cannot be predicted with certainty: there is a considerable probability (in the case considered, almost 0,3) that it ends in failure. This reflects the real conditions in this class of processes, which always have a pronounced stochastic character.

III. The proposed method for controlling innovation processes on the basis of Markov decision processes has another important advantage: optimal policies and strategies may be recomputed on the basis of new and more refined data after each completed step of the process, taking into account the state it has fallen into. This may lead to a better final result by improving the initially computed strategy.
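Point III can be illustrated with a Python sketch: after each completed step, backward induction is rerun only for the stages still ahead, using whatever refined estimates are available. The refined probabilities for stage 3 below are purely hypothetical and serve only to show that the recommended control can change; the profits are the assumed reading used in the example (with $r_3^{k}$ positive), and the function name replan is illustrative.

```python
# P[i][k][j] = P_ij^k (Table 1); R[i][k] = r_i^k (assumed reading of the profit values).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
R = {1: {1: -10, 2: -11}, 2: {1: -5, 2: -6}, 3: {1: 7, 2: 8},
     4: {1: 10, 2: 12}, 5: {1: 0}, 6: {1: 0}}
ABSORBING = {5, 6}


def replan(P, R, current):
    """Backward induction over the stages >= current; returns the optimal controls."""
    value, policy = {}, {}
    for i in sorted((s for s in P if s >= current), reverse=True):
        if i in ABSORBING:
            q = dict(R[i])  # profit counted once, on entry
        else:
            # forward-only arcs, no self-loops in stages 1-4: successors are valued
            q = {k: R[i][k] + sum(p * value[j] for j, p in row.items())
                 for k, row in P[i].items()}
        policy[i] = max(q, key=q.get)
        value[i] = q[policy[i]]
    return policy


# Hypothetical refinement observed on reaching stage 3: control 1 now fails more often.
refined = {**P, 3: {1: {4: 0.5, 5: 0.3, 6: 0.2}, 2: P[3][2]}}
print(replan(P, R, current=3)[3], replan(refined, R, current=3)[3])  # 1 2
```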

IV. More refined classes of Markov decision processes may also be used, e.g. with discounting of the profit at each step, with constrained capacity, or through Markov flows or Markov games (Sgurev, 1993).
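For the discounting mentioned in point IV, a minimal sketch is to multiply the expected value of the successor stages by a discount factor beta once per transition, i.e. $V_i=\max_{k\in K_i}\{r_i^{k}+\beta\sum_{j}P_{ij}^{k}V_j\}$; with beta = 1 this reduces to the undiscounted problem (6) to (14). The value beta = 0.9 below is purely hypothetical, the profits are again the assumed reading used in the example, and discounted_values is an illustrative name.

```python
# P[i][k][j] = P_ij^k (Table 1); R[i][k] = r_i^k (assumed reading of the profit values).
P = {
    1: {1: {2: 0.8, 3: 0.2}, 2: {2: 0.9, 3: 0.1}},
    2: {1: {3: 0.7, 4: 0.3}, 2: {3: 0.8, 4: 0.2}},
    3: {1: {4: 0.6, 5: 0.3, 6: 0.1}, 2: {4: 0.5, 5: 0.3, 6: 0.2}},
    4: {1: {5: 0.8, 6: 0.2}, 2: {5: 0.7, 6: 0.3}},
    5: {1: {5: 1.0}},
    6: {1: {6: 1.0}},
}
R = {1: {1: -10, 2: -11}, 2: {1: -5, 2: -6}, 3: {1: 7, 2: 8},
     4: {1: 10, 2: 12}, 5: {1: 0}, 6: {1: 0}}
ABSORBING = {5, 6}


def discounted_values(beta):
    """Backward induction with a per-step discount factor beta (0 < beta <= 1)."""
    value, policy = {}, {}
    for i in sorted(P, reverse=True):
        if i in ABSORBING:
            q = dict(R[i])  # profit counted once, on entry
        else:
            q = {k: R[i][k] + beta * sum(p * value[j] for j, p in row.items())
                 for k, row in P[i].items()}
        policy[i] = max(q, key=q.get)
        value[i] = q[policy[i]]
    return value, policy


for beta in (1.0, 0.9):  # 0.9 is an illustrative, hypothetical discount factor
    value, policy = discounted_values(beta)
    print(beta, round(value[1], 4), [policy[i] for i in sorted(policy)])
# 1.0 -0.328 [1, 1, 1, 2, 1, 1]
# 0.9 -2.7263 [1, 1, 1, 2, 1, 1]
```

With this hypothetical discount the optimal strategy is unchanged, but the expected result becomes noticeably worse, because the gains of the later stages are discounted while the early expenses are not.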

4 CONCLUSIONS

In conclusion the following general inferences may

be drawn:

1. Innovation processes are highly stochastic and uncertain, which makes forecasts of their completion highly imprecise. This entails a high risk in the venture financing of such processes.

2. The method proposed in the present work, which uses multi-step Markov decision processes to describe innovation processes, makes it possible to take their stochastic character into account to a considerable degree and provides an effective procedure for controlling their behavior.

ACKNOWLEDGEMENTS

The research work reported in the paper is partly supported by the project AComIn “Advanced Computing for Innovation”, grant 316087, funded by the FP7 Capacity Programme (Research Potential of Convergence Regions), and partially supported by the European Social Fund and the Republic of Bulgaria, Operational Programme “Development of Human Resources” 2007-2013, Grant № BG051PO001-3.3.06-0048.

REFERENCES

Grossi G., Promoting Innovations in a Big Business, Long Range Planning, vol. 23, no. 1, 1990.

Mine H., S. Osaki, Markovian Decision Processes, American Elsevier Publishing Company, Inc., New York, 1975.


Cormican K., D. O’Sullivan, Auditing best practice for effective product innovation management, Technovation, vol. 24, no. 10, pp. 761-851, Elsevier, 2004.

Bernstein B., P. J. Singh, An integrated innovation process model based on practices of Australian biotechnology firms, vol. 26, nos. 5–6, pp. 561–572, Elsevier, 2006.

Sgurev V., Markov Flows, Publishing House of the Bulgarian Academy of Sciences, Sofia, 1993 (in Russian).
