have to exist even in zero-sum games (see (Wichardt,
2008) for a simple example) and standard algorithms (e.g., Counterfactual Regret Minimization (CFR) (Zinkevich et al., 2008)) can converge to incorrect strategies (see Example 1). Therefore, we
focus on finding a strategy that guarantees the best
possible expected outcome for a player – a maxmin
strategy. However, computing a maxmin strategy is
NP-hard and such strategies may require irrational
numbers even when the input uses only rational num-
bers (Koller and Megiddo, 1992).
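As a concrete illustration of the maxmin concept in the simplest possible setting, consider the following sketch (ours, not the paper's; the matrix game and names are illustrative): player 1 chooses the mixed strategy that maximizes the worst-case expected payoff against any response.

```python
# Hypothetical 2x2 zero-sum matrix game (matching pennies); entries are
# payoffs to player 1. Illustrative only -- not an example from the paper.
A = [[1, -1], [-1, 1]]

def worst_case(p, A):
    """Expected payoff to player 1 mixing rows with probabilities (p, 1-p)
    against an opponent who responds by minimizing over columns."""
    return min(p * A[0][j] + (1 - p) * A[1][j] for j in range(len(A[0])))

# Maxmin strategy: the mixture with the best guaranteed (worst-case) payoff.
best_p = max((i / 1000 for i in range(1001)), key=lambda p: worst_case(p, A))
# Here best_p == 0.5: mixing 50/50 guarantees expected payoff 0.
```

In the imperfect recall extensive-form setting studied here, the analogous computation over behavioral strategies is exactly the NP-hard problem the paper targets.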
Existing works avoid these negative results by cre-
ating very specific abstracted games so that perfect-recall algorithms are still applicable. One example
is a subset of imperfect recall games called (skewed)
well-formed games, motivated by the poker domain,
in which the standard perfect-recall algorithms (e.g.,
CFR) are still guaranteed to find an approximate
Nash behavioral strategy (Lanctot et al., 2012; Kroer
and Sandholm, 2016). The restrictions on games
to form (skewed) well-formed games are, however,
rather strict and can prevent us from creating suffi-
ciently small abstracted games. To fully explore the
possibilities of exploiting the concept of abstractions
and/or other compactly represented dynamic games
(e.g., Multi-Agent Influence Diagrams (Koller and
Milch, 2003)), a new algorithm for solving imperfect
recall games is required.
1.1 Our Contribution
We advance the state of the art and provide the
first approximate algorithm for computing maxmin
strategies in imperfect recall games (since maxmin
strategies might require irrational numbers (Koller
and Megiddo, 1992), finding exact maxmin has fun-
damental difficulties). We assume imperfect recall
games with no absentmindedness, which means that
each decision point in the game can be visited at
most once during the course of the game; this is arguably a natural assumption in finite games (see,
e.g., (Piccione and Rubinstein, 1997) for a detailed
discussion). The main goal of our approach is to
find behavioral strategies that maximize the expected
outcome of player 1 against an opponent that min-
imizes the outcome. We base our formulation on
the sequence-form linear program for perfect recall
games (Koller et al., 1996; von Stengel, 1996) and
we extend it with bilinear constraints necessary for
the correct representation of strategies of player 1 in
imperfect recall games. We approximate the bilinear
terms using the recent Multiparametric Disaggregation
Technique (MDT) (Kolodziej et al., 2013) and pro-
vide a mixed-integer linear program (MILP) for ap-
proximating maxmin strategies. Finally, we consider
a linear relaxation of the MILP and propose a branch-
and-bound algorithm that (1) repeatedly solves this
linear relaxation and (2) tightens the constraints that
approximate bilinear terms as well as relaxed binary
variables from the MILP. We show that the branch-and-bound algorithm terminates after exponentially many steps while guaranteeing the desired precision.
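A minimal numeric sketch of the idea behind MDT (our own illustration, not the paper's formulation): a bilinear term x·y with x ∈ [0, 1] is linearized by restricting x to a fixed number of decimal digits, each digit being selected by binary variables in the MILP; increasing the precision tightens the approximation.

```python
def truncate(x, p):
    """Keep the first p decimal digits of x in [0, 1] -- the digit selection
    that MDT would encode with binary variables in the MILP."""
    scale = 10 ** p
    return int(x * scale) / scale

# Approximating the bilinear term w = x * y by discretizing x only:
x, y, p = 0.7391, 0.52, 3
w_exact = x * y
w_approx = truncate(x, p) * y
# The error shrinks like 10**-p; tightening the precision during the
# branch-and-bound search improves the bound on the approximated value.
assert abs(w_exact - w_approx) <= y * 10 ** (-p)
```

In the actual MILP each digit choice is a binary variable and the product with y becomes a set of linear constraints; the sketch above only shows the discretization error that those constraints control.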
Our algorithm approximates maxmin strategies
for player 1 with generic imperfect recall without
absentmindedness and we give two variants of the al-
gorithm depending on the type of imperfect recall of
the opponent. If the opponent, player 2, has either
perfect recall or so-called A-loss recall (Kaneko and
Kline, 1995; Kline, 2002), the linear program solved
by the branch-and-bound algorithm has size polynomial in the size of the game. If player 2 has generic
imperfect recall without absentmindedness, the linear
program solved by the branch-and-bound algorithm
can be exponentially large.
We provide a short experimental evaluation to
demonstrate that our algorithm can solve games far
beyond the size of toy problems. Randomly generated imperfect recall games with up to 5 · 10^3 states can typically be solved within a few minutes.
All the technical proofs can be found in the ap-
pendix or in the full version of this paper.
2 TECHNICAL PRELIMINARIES
Before describing our algorithm, we define extensive-form games and the different types of recall, and we describe the approximation technique for the bilinear terms.
A two-player extensive-form game (EFG) is a tuple G = (N, H, Z, A, u, C, I). N = {1, 2} is a set of players, by i we refer to one of the players, and by
−i to his opponent. H denotes a finite set of histo-
ries of actions taken by all players and chance from
the root of the game. Each history corresponds to a
node in the game tree; hence, we use the terms history and node interchangeably. We say that h is a prefix of h′ (h ⊑ h′) if h lies on a path from the root of the game tree to h′. Z ⊆ H is the set of terminal states of
the game. A denotes the set of all actions. An ordered
list of all actions of player i from the root to h is referred to as a sequence, σ_i = seq_i(h); Σ_i is the set of all sequences of player i. Each player i has a utility function u_i : Z → R (u_i(z) = −u_{−i}(z) in
zero-sum games). The chance player selects actions
based on a fixed probability distribution known to all
players. The function C : H → [0, 1] gives the probability of
reaching h due to chance.
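To make the notation concrete, here is a minimal Python encoding of histories, the prefix relation ⊑, and seq_i (the encoding and the player/action labels are ours, purely illustrative):

```python
# A history h is encoded as a tuple of (player, action) pairs taken from
# the root; "p1"/"p2" and the action labels are illustrative, not from
# any game in the paper.
h = (("p1", "L"), ("p2", "l"), ("p1", "A"))

def is_prefix(g, h):
    """g ⊑ h: g lies on the path from the root of the game tree to h."""
    return h[:len(g)] == g

def seq(h, player):
    """seq_i(h): the ordered list of player i's actions from the root to h."""
    return [a for (p, a) in h if p == player]

assert is_prefix((("p1", "L"),), h)
assert seq(h, "p1") == ["L", "A"]   # player 1's sequence at h
assert seq(h, "p2") == ["l"]        # player 2's sequence at h
```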
Imperfect observation of player i is modeled via
ICAART 2017 - 9th International Conference on Agents and Artificial Intelligence