shown by QCD, and the project principle sets which targets of QCD are to be improved. A rule in the project principle is therefore a pair of a status to be improved and an action needed to improve the corresponding QCD target, called the "learning target", for projects. A learning target defines which viewpoints the learner should acquire.
Decision making by project managers consists of two stages of operation: project managers check the progress, and they take actions based on that progress. From this viewpoint, there are two kinds of project principles: "check progress" and "take action".
This simulator can give the learner feedback on whether the learner's operation follows the project principle. However, the learner's operation cannot be compared directly with a project principle, because the situations and the actions appropriate to them vary from project to project. To compare the learner's operation, it is necessary to generate a reference operation, that is, an operation that follows the project principle.
The project principle is thus needed to generate the reference operation. However, generating a project principle that is useful for various projects takes a lot of time, because trainers must simulate and analyze many operations for each project in order to obtain a good result. This paper addresses how to generate such a project principle and the reference operation.
3 GENERATION METHOD OF
REFERENCE OPERATION
USING REINFORCEMENT
LEARNING
3.1 Outline of Automatic Generation
Method
We propose a method for generating the reference operation using reinforcement learning and decision tree learning. Figure 3 shows the outline of the generation method.
In order to generate the reference operation based on the project principle, this method uses optimal operations, which lead to the best results for a learning target set in advance. The optimal operations can be considered to include correct judgments and actions that improve the status of the project. Because project management should keep the QCD values small, an optimal operation is defined as the operation that minimizes the objective function f(operation) defined as follows:

f(operation) = Σ_{i ∈ LearningTarget} f_i(operation)    (1)
Figure 3: Outline of proposed method of the reference operation.
where LearningTarget includes Q, C, or D, operation is a learner's operation, and f_i(operation) is the value of i output by the simulator after executing operation. For example, when the learning target is "Q and D", the objective function is f(operation) = f_Q(operation) + f_D(operation).
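As a minimal sketch (the simulator and its outputs are not specified in this excerpt, so `simulate` and its dummy return values are hypothetical stand-ins), the objective function of formula (1) can be expressed as:

```python
# Sketch of objective function (1): sum the simulator outputs f_i
# over the QCD indicators chosen as the learning target.

def simulate(operation):
    """Hypothetical simulator: returns QCD results after executing an operation."""
    # Dummy values; the real simulator would run the project model.
    return {"Q": 3.0, "C": 120.0, "D": 5.0}

def objective(operation, learning_target):
    """f(operation) = sum of f_i(operation) for i in LearningTarget."""
    results = simulate(operation)
    return sum(results[i] for i in learning_target)

# Example: learning target "Q and D"
print(objective(operation=None, learning_target=["Q", "D"]))  # 3.0 + 5.0 = 8.0
```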
First, this method generates results for operations by using the simulation program of operations, which simulates various types of operations automatically. Second, it generates the optimal operations from those results by reinforcement learning, which can compute them faster than brute-force search. Third, it generates the project principle as a group of rules from the optimal operations by decision tree learning. Finally, the reference operation is generated by the project principle execution program, which takes actions following the project principle.
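The four stages above can be sketched as the following pipeline. All function names and data structures are hypothetical placeholders; the real systems are the simulator, the reinforcement learner, the decision tree learner, and the project principle execution program of Figure 3.

```python
# Hypothetical sketch of the four-stage generation pipeline.

def simulate_operations(project_model):
    """Stage 1: simulate various operations; return (operation, result) pairs."""
    return [({"action": "overtime"}, {"Q": 2, "C": 110, "D": 4}),
            ({"action": "none"},     {"Q": 3, "C": 100, "D": 6})]

def find_optimal_operation(histories, learning_target):
    """Stage 2: pick the operation minimizing the objective (RL stand-in)."""
    return min(histories, key=lambda h: sum(h[1][i] for i in learning_target))

def learn_project_principle(optimal_operations):
    """Stage 3: derive IF-THEN rules (decision tree learning stand-in)."""
    return [("delay_on_critical_path", "overtime directive")]

def execute_principle(rules, status):
    """Stage 4: produce the reference operation by matching the rules."""
    return [action for condition, action in rules if status.get(condition)]

histories = simulate_operations(project_model=None)
best = find_optimal_operation(histories, ["Q", "D"])
rules = learn_project_principle([best])
reference = execute_principle(rules, {"delay_on_critical_path": True})
print(reference)  # ["overtime directive"]
```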
3.2 Optimal Operation Generating
System
To generate a project principle, it is necessary to use optimal operations for various types of projects. Generating an optimal operation requires minimizing the objective function of formula (1), but doing so by brute-force search takes a lot of time. The proposed method therefore searches for an operation that improves the status of the project, using reinforcement learning, which determines future actions so as to maximize reward based on past actions. In the reinforcement learning, the project state s_t (t: time) is the set of QCD values given by the simulator, an action a is one of the selectable actions in the simulator, and the reward r(s_t, a) is defined as formula (2)
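As a minimal sketch of this setup (the reward of formula (2) is not shown in this excerpt, so a placeholder reward is used; the simulator dynamics, actions, and all names below are hypothetical), tabular Q-learning over simulator states could look like:

```python
import random

# Minimal tabular Q-learning sketch: state s_t = QCD values from the
# simulator, action a = a selectable simulator action, reward r(s_t, a)
# = placeholder (formula (2) is defined in the paper, not here).

random.seed(0)  # for reproducibility of this sketch
ACTIONS = ["progress check", "overtime directive", "do nothing"]

def step(state, action):
    """Hypothetical simulator step: returns (next_state, reward)."""
    q, c, d = state
    if action == "overtime directive":
        next_state = (q, c + 10, max(d - 1, 0))  # more cost, less delay
    else:
        next_state = (q, c, d)
    reward = -(next_state[0] + next_state[2])  # placeholder for r(s_t, a)
    return next_state, reward

def q_learning(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = {}
    for _ in range(episodes):
        state = (3, 100, 5)  # initial QCD values
        for _ in range(10):  # finite horizon per episode
            if random.random() < epsilon:
                action = random.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
            next_state, reward = step(state, action)
            best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return Q

Q = q_learning()
start = (3, 100, 5)
print(max(ACTIONS, key=lambda a: Q.get((start, a), 0.0)))
```

Under this placeholder reward, "overtime directive" reduces the delay term and should come to dominate the learned Q-values at the initial state.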
A Generation Method of Reference Operation using Reinforcement Learning on Project Manager Skill-up Simulator