Figure 4: Two examples of updating PPN by the AND rule in PPN-search (the circle represents the AND node).
visit count of child i, and n_p is the visit count of the current node p. C is a coefficient parameter, which has
to be tuned experimentally. Winands et al. also consider other strategies to optimize the selection based on UCT, such as progressive bias (PB), but in this paper, to keep the presentation easy to follow, we apply only the UCT strategy. To transform UCT and pure MCTS into a solver, a node is assumed to take the game-theoretical value ∞ or −∞, corresponding to a proved win or a proved not-win, respectively. In this paper, we treat all drawn games as proved not-win games, which makes the experimental results easier to interpret. When a child is a proven win, the node itself is a proven win, and no selection has to take place. But when one or more children are proven to be not a win, it is tempting to discard them in the selection phase. In this paper, to make the comparison easy, we do not consider proved-win or proved-not-win nodes in the play-out step, because such a technique can similarly be applied to PPN-search and MCPN-search. Moreover, for the final selection of the winning move at the root, one often takes the child with the highest visit count, the child with the highest value, or a combination of the two. In the UCT solver or the pure MCTS solver, the strategy is to select the child of the root with the maximum quantity
v + A/√n, where A is a parameter (here, set to 1), v is the node's simulation value, and n is the node's visit count.
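For concreteness, a minimal Python sketch of both selection rules follows; the node attributes (value, visits, children) and the try-unvisited-children-first convention are illustrative assumptions on our part, not prescribed here:

```python
import math

def uct_select(parent, C=1.0):
    """Select a child by the UCT formula v_i + C * sqrt(ln(n_p) / n_i)."""
    # Try unvisited children first (a common convention, assumed here).
    for child in parent.children:
        if child.visits == 0:
            return child
    return max(parent.children,
               key=lambda c: c.value
               + C * math.sqrt(math.log(parent.visits) / c.visits))

def final_move(root, A=1.0):
    """Final selection at the root: maximize v + A / sqrt(n)."""
    # Assumes every root child has been visited at least once.
    return max(root.children,
               key=lambda c: c.value + A / math.sqrt(c.visits))
```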
(2) Play-out: The play-out step begins when we
enter a position that is not a part of the tree yet.
Moves are selected in self-play until the end of the
game. This task might consist of playing plain ran-
dom moves.
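A plain random play-out might look as follows; the position interface (is_terminal, legal_moves, play, result) is a hypothetical placeholder for the actual game implementation:

```python
import random

def play_out(position):
    """Self-play with plain random moves until the end of the game."""
    while not position.is_terminal():
        position = position.play(random.choice(position.legal_moves()))
    return position.result()  # e.g. 1 = win, 0 = draw, -1 = loss
```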
(3) Expansion: Expansion is the strategic task that
decides whether nodes will be added to the tree. In
this paper, we expand one node for each iteration.
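Under this one-node-per-iteration policy, expansion reduces to attaching a single child; the minimal Node class below is an illustrative assumption shared by the other sketches:

```python
class Node:
    """Minimal tree node (illustrative naming, not from the paper)."""
    def __init__(self, position, parent=None):
        self.position = position
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0    # sum of simulation results
        self.value = 0.0    # mean simulation value
        self.proven = None  # game-theoretical value: INF, -INF, or None

def expand(node, position):
    """Add exactly one new node to the tree per iteration."""
    child = Node(position, parent=node)
    node.children.append(child)
    return child
```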
(4) Backpropagation: Backpropagation is the procedure that propagates the result of a simulated game back from the leaf node, through the previously traversed nodes, all the way up to the root. The usual strategy of UCT or pure MCTS is to take the average of the results of all simulated games made through this node. For the UCT solver and the pure MCTS solver, in addition to backpropagating the values 1, 0, and −1, the search also propagates the game-theoretical values ∞ and −∞. The search assigns ∞ or −∞ to a terminal position in the tree that is won or lost, respectively, for the player to move. Propagating the values back in the tree is performed similarly to negamax in the context of MIN/MAX searching, in such a way that we do not need to distinguish between MIN and MAX nodes.
More precisely, for negamax, the value of a MIN node
is the negation of the value of a MAX node. Thus, the
player on move looks for a move that maximizes the
negation of the value resulting from the move: this
successor position must by definition have been val-
ued by the opponent.
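As an illustration, the solver-style backpropagation just described might be sketched as follows; the node fields (visits, total, value, proven, parent, children) are our own naming assumptions:

```python
INF = float('inf')

def backpropagate(leaf, result):
    """Propagate a simulation result (1, 0, or -1) from the leaf to the root.

    Negamax form: the result is negated at each level, so MIN and MAX
    nodes need not be distinguished.
    """
    node = leaf
    while node is not None:
        node.visits += 1
        node.total += result
        node.value = node.total / node.visits  # average over simulations
        # Game-theoretical values: a child proved -INF (a loss for the side
        # to move there) proves this node INF; if every child is proved INF,
        # this node is proved -INF.
        if node.children:
            if any(c.proven == -INF for c in node.children):
                node.proven = INF
            elif all(c.proven == INF for c in node.children):
                node.proven = -INF
        result = -result  # flip the sign for the parent's point of view
        node = node.parent
```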
The main difference between the pure MCTS solver and PPN-search is the backpropagation strategy. For the pure MCTS solver, the backpropagation strategy of a node is to take the average of the simulation results of its children. In contrast, PPN-search follows the AND/OR probability rules presented in Eqs. (1) and (2).
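To make the contrast concrete, here is a minimal Python sketch of the PPN-style backup, assuming Eqs. (1) and (2) are the standard independence-based product rules (an assumption on our part; the exact equations are stated earlier in the paper):

```python
from math import prod

def ppn_and(child_ppns):
    """AND node: a win requires every child to be a win, so the
    children's win probabilities multiply (independence assumed)."""
    return prod(child_ppns)

def ppn_or(child_ppns):
    """OR node: a win needs at least one winning child, i.e. one minus
    the probability that every child fails."""
    return 1.0 - prod(1.0 - p for p in child_ppns)
```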
Actually, both backpropagation strategies were already discussed in an early paper on MCTS (Coulom, 2006), which points out the weakness of AND/OR probability backpropagation rules for MCTS: compared with taking the average, the probability rules have to assume some degree of independence between the probability distributions. This assumption of independence is wrong in the case of Monte-Carlo evaluation, because the move with the highest value is more likely to be overestimated than other moves; moreover, a refutation of one move is likely to simultaneously refute other moves of the node. This statement (Coulom, 2006) is true for MCTS when it is used to find an approximate best move in a game AI, but it is not appropriate when MCTS is used to solve a game or a game position, for two reasons. (1) To solve a game or a game position, the search algorithm has to go deep, down to the terminal nodes, to completely prove the game-theoretic value, so it is not necessary for the search algorithm to avoid overestimating the move with the highest value; what really matters is how quickly the search approaches the terminal nodes. (2) To solve a game or a game position, we need to search an AND/OR tree to find the solution. Therefore, the AND/OR prob-