Regarding the room for improvement in human-likeness
under small amounts of data, Maia's method requires a
large amount of training data and therefore cannot be
used in many games. In general, supervised learning
does not work well when training data are scarce.
In such cases, the move-matching accuracy and the
likelihood may be improved by search (Jacob et al.,
2022), or by combining human-like models with dif-
ferent move selection mechanisms such as the policy
of strong game AI like AlphaZero (Silver et al., 2018).
In this paper, we first confirm whether supervised
learning like Maia is also effective in Shogi. We then
propose methods that combine multiple policies to
further improve human-likeness. To our knowledge,
we are the first to combine multiple policies to
imitate human play. The combined policies include
those from Maia-like models and AlphaZero-like AI,
where the latter has been shown to be less human-
like (McIlroy-Young et al., 2020). Interestingly, our
results show that the combination improves move-
matching accuracy and likelihood.
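To illustrate the kind of combination we study, the following minimal sketch mixes the move distributions of two policies by linear interpolation. The interpolation form and the weight alpha are illustrative assumptions only, and do not presuppose the specific methods proposed in Section 3.

    import numpy as np

    def mix_policies(p_human_like, p_alphazero, alpha=0.5):
        # Interpolate two move probability distributions over the same
        # set of legal moves; alpha weights the human-like policy.
        mixed = alpha * np.asarray(p_human_like) \
            + (1.0 - alpha) * np.asarray(p_alphazero)
        return mixed / mixed.sum()  # renormalize against rounding error

For example, mix_policies([0.7, 0.2, 0.1], [0.1, 0.6, 0.3], alpha=0.6) returns a distribution that still favors the human-like policy's top move while shifting probability toward the strong AI's preference.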
2 RELATED RESEARCH
To create strong Go programs, Coulom (2007) proposed
a new Bayesian technique for supervised learning,
training a model to predict the probability distribution
of human players' moves. He used strong
human players' games to train the prediction model
and then incorporated the model into a Go program
based on Monte-Carlo tree search (MCTS). The Go
program’s strength was greatly improved. Similarly,
some other researchers strengthened their game AI
by incorporating move prediction models (Tsuruoka
et al., 2002) or evaluation functions trained using hu-
man players’ games (Hoki and Kaneko, 2014).
Obata et al. (2010) proposed a consultation algorithm
that selects a move from the moves proposed by multiple
Shogi AIs, and succeeded in making the combined Shogi AI
significantly stronger than each AI alone. A possible
reason their method worked well is that majority voting
allows the individual AIs to compensate for each other's
shortcomings.
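For illustration, a minimal sketch of such majority voting among engines follows; the engine.best_move interface is a hypothetical assumption, and Obata et al.'s actual algorithm may differ in details such as tie-breaking.

    from collections import Counter

    def consult(engines, position):
        # Majority vote over the moves proposed by multiple engines;
        # Counter.most_common breaks ties by first-encountered order.
        votes = Counter(engine.best_move(position) for engine in engines)
        return votes.most_common(1)[0][0]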
AlphaZero (Silver et al., 2018) is a reinforcement
learning model trained on self-play games instead of
human games. Silver et al. (2018) used a policy network
to predict the probabilities of moves in given positions
and a value network to predict the win rates of posi-
tions. The training data of the networks came from
self-play games played by a variant of MCTS that
incorporates the networks. AlphaZero defeated world-champion
game AI in chess, Go, and Shogi.
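For reference, AlphaZero's MCTS variant selects moves with the PUCT rule, which biases the search toward moves the policy network favors. A minimal sketch, where the child-node fields (prior, visits, value_sum) are assumed data structures rather than Silver et al.'s implementation:

    import math

    def puct_select(children, c_puct=1.5):
        # AlphaZero-style PUCT: pick the child maximizing Q + U, where
        # U is the policy-weighted exploration bonus.
        total_visits = sum(ch.visits for ch in children)
        def score(ch):
            q = ch.value_sum / ch.visits if ch.visits > 0 else 0.0
            u = c_puct * ch.prior * math.sqrt(total_visits) / (1 + ch.visits)
            return q + u
        return max(children, key=score)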
Maia (McIlroy-Young et al., 2020) is known as one
of the most effective chess AIs at predicting human
moves. This chess AI used deep neural networks
for supervised learning.
for supervised learning. Human players were divided
into 9 groups according to their ratings. Each neu-
ral network corresponded to a rating range and was
trained using 12 million games from the players in
the rating range. Their results showed that moves in a
rating range were best predicted by the neural network
of the corresponding rating range, where the move-
matching accuracy was about 50%. McIlroy-Young
et al. (2020) claimed that using neural networks alone
obtained higher move-matching accuracy than combining
the neural networks with tree search as AlphaZero did.
However, Jacob et al. (2022) showed that, even with the
same trained model as Maia, the model with search was
stronger and achieved higher move-matching accuracy
when its parameters were adjusted properly.
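One way such search can retain human-likeness is to regularize move selection toward the imitation policy. The following simplified one-step sketch is in the spirit of Jacob et al. (2022), where pi_imit is the human-like anchor policy, q_values are value estimates from search, and lam trades off strength against closeness to the anchor; it is a sketch under these assumptions, not their exact algorithm.

    import numpy as np

    def kl_regularized_policy(pi_imit, q_values, lam=1.0):
        # Maximize expected value minus lam * KL(pi || pi_imit);
        # the closed form is pi(a) proportional to
        # pi_imit(a) * exp(q(a) / lam).
        logits = np.log(np.asarray(pi_imit)) + np.asarray(q_values) / lam
        weights = np.exp(logits - logits.max())  # subtract max for stability
        return weights / weights.sum()

Larger lam keeps the result closer to the human-like anchor; smaller lam trusts the search values more.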
With respect to human-likeness, Togelius et al.
(2013) introduced the concept of "believability".
Believability refers to the ability to make a character
or bot seem as if it were controlled by a human being.
Various approaches were then proposed to achieve
human-like characteristics (Fujii et al., 2013;
Hingston, 2010).
As another approach to create human-like AI,
Kinebuchi and Ito (2015) proposed improving the
move-matching accuracy of Shogi AI by considering
the flow of preceding moves. They also targeted
players across a wide range of skill levels. They
represented the flow by combining a search-based
value function (Hoki and Kaneko, 2014) with a
transition probability function (Tsuruoka et al., 2002).
A linear combination was used, and its weight was
trained on human moves. Their proposed method predicted human
moves significantly better than each function alone.
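As a hedged illustration of such a weighted linear combination, the sketch below scores candidate moves and grid-searches the mixing weight on human moves; the function names and the grid-search fit are our simplifications, not Kinebuchi and Ito's exact procedure.

    import numpy as np

    def combined_score(value_scores, transition_probs, w):
        # Linear combination of a search-based value score and a
        # transition probability for each candidate move.
        return w * np.asarray(value_scores) \
            + (1.0 - w) * np.asarray(transition_probs)

    def fit_weight(examples, grid=np.linspace(0.0, 1.0, 101)):
        # examples: (value_scores, transition_probs, human_move_index)
        # triples. Pick the weight that most often ranks the human
        # move first, a simple stand-in for training on human moves.
        def accuracy(w):
            hits = sum(int(np.argmax(combined_score(v, t, w)) == m)
                       for v, t, m in examples)
            return hits / len(examples)
        return max(grid, key=accuracy)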
3 PROPOSED METHOD
The overview of this study is as follows. First, in
the case of Shogi, we confirm whether supervised
learning like Maia can predict human moves well,
as measured by two metrics: move-matching accuracy
and likelihood. We also compare the AlphaZero-like
policy with the supervised learning policy to identify
the strengths and weaknesses of each. We then propose two
approaches to improve move-matching accuracy and
likelihood by combining the supervised learning pol-
icy and the AlphaZero-like policy.
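For concreteness, the two metrics can be computed from a policy's predicted move distributions as in the following minimal sketch; the array layout is our assumption, and whether the likelihood is averaged per move or aggregated per game is left open here.

    import numpy as np

    def evaluate_policy(pred_dists, human_moves):
        # pred_dists: (n_positions, n_moves) predicted move
        # distributions; human_moves: index of the move played.
        pred_dists = np.asarray(pred_dists)
        human_moves = np.asarray(human_moves)
        accuracy = float(np.mean(
            np.argmax(pred_dists, axis=1) == human_moves))
        likelihood = float(np.mean(
            pred_dists[np.arange(len(human_moves)), human_moves]))
        return accuracy, likelihood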
We follow Maia’s method and use neural networks
for supervised learning. Consider a neural network
used for multiclass classification with K classes, and
let x be the input and u_k (−∞ < u_k < ∞) be one of the
outputs of the neural network. The probability p(C_k | x)