which is a reward, to each worker. These approaches resemble our proposed method in that they improve their ability in a reinforcement learning manner. However, these models do not give each worker feedback about the integrated solution. In contrast, our proposed model provides feedback on the integrated output to each worker, to guide how he/she should revise the answer.
This study improves on the previous study (Ogiso et al., 2016) by replacing each learning machine with a supervised actor-critic method (Rosenstein and Barto, 2012). As a result, each learning machine can also explore new solutions by itself, without the worker's help.
With this architecture, each worker does not need to manage the system all the time, because the new model realizes semi-automatic learning. In other words, the new method also explores better solutions without human operations.
The learning machine suitable for this situation is a lightweight method. We found that a supervised actor-critic model combined with a kernel method is a very lightweight learning machine capable of one-pass learning. Using the kernel method keeps the computations simple and thus increases efficiency.
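As a rough, non-authoritative illustration of one-pass learning with a kernel method (the class name, the Gaussian kernel, and the novelty threshold below are our own illustrative assumptions, not the exact algorithm used in this system), each incoming sample is seen only once: it either updates the output weights of the existing kernels or allocates a new kernel.

```python
import numpy as np

class OnePassKernelRegressor:
    """Minimal sketch of one-pass kernel learning (illustrative only)."""

    def __init__(self, sigma=1.0, delta=0.5, lr=0.1):
        self.centres = []    # kernel centres
        self.weights = []    # output weight per kernel
        self.sigma = sigma   # Gaussian kernel width
        self.delta = delta   # novelty threshold for adding a kernel
        self.lr = lr         # learning rate for weight updates

    def _phi(self, x):
        # Gaussian kernel activations for input x
        return np.array([np.exp(-np.sum((np.asarray(x) - c) ** 2)
                                / (2 * self.sigma ** 2))
                         for c in self.centres])

    def predict(self, x):
        if not self.centres:
            return 0.0
        return float(np.dot(self.weights, self._phi(x)))

    def update(self, x, target):
        # One-pass update: add a kernel if the input is novel,
        # otherwise only correct the existing weights toward the target.
        if not self.centres or np.min(
                [np.linalg.norm(np.asarray(x) - c) for c in self.centres]) > self.delta:
            self.centres.append(np.asarray(x, dtype=float))
            self.weights.append(0.0)
        error = target - self.predict(x)
        self.weights = list(np.asarray(self.weights) + self.lr * error * self._phi(x))
```

Because every sample is processed once and then discarded, the memory and computation grow only with the number of stored kernels, which is what keeps the learner lightweight.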
The supervised actor-critic model is a state-of-the-art algorithm that is very lightweight for a reinforcement learning method. Reinforcement learning methods are often applied to problems involving sequential dynamics and optimization of a scalar performance objective, with online exploration of the effects of actions. Supervised learning methods, on the other hand, are frequently used for problems involving static input-output mappings and minimization of a vector error signal, with no explicit dependence on how training examples are gathered. The key feature distinguishing reinforcement learning from supervised learning is whether training information from the environment serves as an evaluation signal or as an error signal. In this model, both kinds of feedback are available.
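A minimal sketch of how both kinds of feedback can be combined in one update step is given below. The linear function approximators, the interpolation gain k, the Gaussian exploration, and all variable names are illustrative assumptions rather than the exact formulation of (Rosenstein and Barto, 2012).

```python
import numpy as np

def supervised_actor_critic_step(actor_w, critic_w, phi, phi_next,
                                 reward, a_sup=None, k=0.5,
                                 gamma=0.99, alpha=0.1, beta=0.1, sigma=0.1):
    """One simplified supervised actor-critic update (illustrative sketch).

    actor_w, critic_w : linear weights over the feature vector phi
    a_sup             : supervisor's action, or None if no teaching signal
    k                 : interpolation gain between supervisor and actor
    """
    a_actor = float(np.dot(actor_w, phi))            # actor's greedy action
    a_explore = a_actor + np.random.normal(0.0, sigma)

    # Composite action: partly taught by the supervisor when available.
    a = k * a_sup + (1.0 - k) * a_explore if a_sup is not None else a_explore

    # Evaluation signal: TD error computed by the critic.
    td_error = reward + gamma * np.dot(critic_w, phi_next) - np.dot(critic_w, phi)
    critic_w = critic_w + beta * td_error * phi

    # Actor update: evaluative term (exploration weighted by the TD error)
    # plus a supervised term (error signal toward the demonstrated action).
    actor_w = actor_w + alpha * td_error * (a - a_actor) * phi
    if a_sup is not None:
        actor_w = actor_w + alpha * k * (a_sup - a_actor) * phi

    return actor_w, critic_w, a
```

In this sketch the critic's TD error plays the role of the evaluation signal, while the difference between the supervisor's action and the actor's own action plays the role of the error signal.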
Since applying this framework to real-world problems would take a huge amount of time, it was tested on a T-Rex game similar to http://www.trex-game.skipser.com/, which was developed specifically for this project.
In the developed game, the height and width of some of the cacti were modified so that the player cannot clear them easily without help. Moreover, another jumping option was added. These specifications were not announced to the players beforehand. The simulations are explained in section 4.
2 COLLABORATIVE BAGGING
SYSTEM USING SUPERVISED
ACTOR CRITIC MODEL
Bagging is a long-standing concept of creating a "super" neural network by combining the intelligence of multiple neural networks that work on the same problem but with different learning and testing scenarios. The idea is that this can save a lot of computational power and time and can also run on simpler hardware such as smartphones or Raspberry Pis.
A rough sketch of the ColBagging system is illustrated in Fig 1. The system repeats two phases alternately. The first phase is the training phase, where each worker tries to solve the given problem. Their solutions are emitted by the corresponding online incremental learning machine (MGRNN (Tomandl and Schober, 2001)). At the same time, the performance estimator monitors the solutions from all workers and estimates their quality. This estimation is usually done by the masters or by a pre-determined evaluation function. The performance estimator outputs the results as weights for all workers.
This idea was proposed in (Ogiso et al., 2016). It is not merely an improved version of earlier bagging techniques but a more sophisticated method that calculates a weighted average of the weights of the input neural networks, which results in a more accurate super neural network.
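As a sketch of how the performance estimator's scores might be used to merge the workers' networks, assuming all workers share the same architecture and parameter alignment (the function name and the normalization below are our own illustration, not the exact procedure of (Ogiso et al., 2016)):

```python
import numpy as np

def merge_worker_networks(worker_params, quality_weights):
    """Weighted average of worker network parameters (illustrative sketch).

    worker_params   : list of parameter vectors, one per worker,
                      all with the same shape/architecture
    quality_weights : per-worker scores from the performance estimator
    Returns the parameters of the merged "super" network.
    """
    w = np.asarray(quality_weights, dtype=float)
    w = w / w.sum()                                   # normalise estimator scores
    params = np.stack([np.asarray(p, dtype=float) for p in worker_params])
    return np.tensordot(w, params, axes=1)            # weighted average over workers
```

For example, merge_worker_networks([p1, p2, p3], [0.2, 0.5, 0.3]) would return a parameter vector dominated by the second worker, whose solution the estimator rated highest.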
In this study, the previous system was improved by introducing a variation of the reinforcement learning method, the supervised actor-critic, as the learning machine (see Fig 2). With the supervised actor-critic method, each worker's solution candidate is refined automatically through the exploration performed by reinforcement learning. This means that each worker only needs to help the learning machine by partly teaching actions. This not only reduces each worker's workload but also improves the learning speed of each learning machine.
To explain the scheme, the next section describes the supervised actor-critic method used in this system.
3 SUPERVISED ACTOR CRITIC
MODEL
The supervised actor-critic model (Rosenstein and Barto, 2012) is a variation of the reinforcement learning algorithm that introduces human input as the supervised signal. It is well known that reinforcement learning algorithms can be executed effectively by introducing kernel machines, which add new kernels by