ample, if an agent was seated in position 5 at a table for the first tournament, it is now seated in position 6; likewise for all of the other agents, with the agent formerly in position 9 moving to position 0. Again, the agents play to elimination. Since the deck was reset, the cards will be exactly the same as they were the last time; the only difference will be the betting strategies of the agents. This process continues until each agent has sat at each position at the table, and has thus had a chance to play each possible set of hole cards.
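As a concrete illustration, one revolution of this scheme can be sketched in Python (the paper does not specify an implementation; play_tournament is a hypothetical callback that plays a single tournament to elimination and returns each agent's finishing rank):

    import random

    def duplicate_revolution(agents, seed, play_tournament):
        # One revolution: every agent occupies every seat once, with the
        # deck reset to the same seeded order before each tournament.
        n = len(agents)
        ranks = {agent: [] for agent in agents}
        for rotation in range(n):
            # After each tournament the agent at seat i moves to seat
            # (i + 1) % n, so the seating order is a cyclic shift.
            seating = [agents[(i - rotation) % n] for i in range(n)]
            deck = list(range(52))
            random.Random(seed).shuffle(deck)  # identical cards each time
            for agent, rank in play_tournament(seating, deck).items():
                ranks[agent].append(rank)
        # Average rank of each agent over the whole revolution.
        return {agent: sum(r) / len(r) for agent, r in ranks.items()}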
After the completion of one revolution, the average ranking of each agent is calculated. A good agent should be able to play well with good cards, and should not lose too much money with poor cards. It should be noted that the agents keep no memory of which cards were dealt to which seats, nor of which community cards are coming; each tournament is played without the agents being given any knowledge of previous tournaments. After a revolution is completed and the rankings are recorded, the agents begin a new revolution, for which the deck is shuffled so that new cards will be seen. A total of 100,000 such revolutions are played, and the agents with the lowest average rankings are determined to be the best. For an agent to receive a low ranking across all revolutions, it must not only take advantage of good cards and survive poor ones, but also take advantage of situations that its opponents miss.
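Under the same assumptions as the sketch above, the outer loop over revolutions might look like the following, with the deck re-seeded for every revolution so that new cards are seen:

    def evaluate(agents, play_tournament, n_revolutions=100_000):
        # Accumulate each agent's average rank over many revolutions,
        # using a fresh seed (and thus a fresh shuffle) per revolution.
        totals = {agent: 0.0 for agent in agents}
        for seed in range(n_revolutions):
            revolution = duplicate_revolution(agents, seed, play_tournament)
            for agent, avg_rank in revolution.items():
                totals[agent] += avg_rank
        # Lower average rank is better; rank 1 means winning every time.
        return sorted(agents, key=lambda agent: totals[agent] / n_revolutions)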
5 EXPERIMENTAL RESULTS
In order to evaluate agents, a number of benchmarks
were used. In (Beattie et al., 2007; Billings et al.,
2003; Booker, 2004; Schauenberg, 2006), several
static agents, which always play the same, regardless
of the situation, are used as benchmarks. These agents
are admittedly weak players, but are supplemented
by the best agents developed in (Beattie et al., 2007),
and can be used to evaluate the quality of our evolved
agents relative to each other. The benchmark agents
are as follows: Folder, Caller, and Raiser, which always fold, call, and raise, respectively, at every decision; Random, which makes a random decision at every opportunity; CallOrRaise, which calls and raises with equal likelihood; and OldBest, OldScratch, and OldStart, which were developed in (Beattie et al., 2007). OldScratch was evolved with no head start to the evolution, OldBest was evolved with a head start, and OldStart was given a head start but no evolution.
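The static benchmarks are simple enough to state directly. A Python sketch follows; the decision interface shown here is an assumption for illustration, not the paper's:

    import random
    from enum import Enum

    class Action(Enum):
        FOLD = 0
        CALL = 1
        RAISE = 2

    # Static benchmark strategies; each ignores the game state entirely.
    def folder(state):        return Action.FOLD
    def caller(state):        return Action.CALL
    def raiser(state):        return Action.RAISE
    def random_agent(state):  return random.choice(list(Action))
    def call_or_raise(state): return random.choice([Action.CALL, Action.RAISE])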
Baseline agents were evolved from a population
of 1000 agents, for 500 generations, playing 500
tournaments per generation. After each generation, the agents were ranked according to their average rank; the 100 best agents were selected for reproduction, and the rest of the population was filled with their offspring. LargeHOF agents were
also evolved from a population of 1000 agents, but
included a hall of fame of size 1000. SmallHOF
agents were evolved from a smaller population of 500
agents, with a hall of fame of size 500, and only
50 agents were selected for reproduction each gen-
eration. HOF2Pop agents were evolved using two
co-evolutionary populations of 500 agents each, each
with a hall of fame of 500 agents.
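These four setups differ only in their parameters; the generational loop itself can be sketched as follows (run_generation and mutate are hypothetical helpers, and the hall-of-fame insertion and eviction policy shown is an assumption):

    import random

    def evolve(population, run_generation, mutate,
               n_generations=500, n_parents=100, hof_size=1000):
        # Generational loop: rank agents by average tournament rank, keep
        # the best as parents, refill the population with their offspring,
        # and archive strong agents in a hall of fame.
        hall_of_fame = []
        for generation in range(n_generations):
            # run_generation plays 500 tournaments against peers and
            # hall-of-fame members, returning agents sorted by average rank.
            ranked = run_generation(population, hall_of_fame)
            parents = ranked[:n_parents]
            hall_of_fame = (parents + hall_of_fame)[:hof_size]
            offspring = [mutate(random.choice(parents))
                         for _ in range(len(population) - n_parents)]
            population = parents + offspring
        return population

With a population of 1000, 100 parents, and a hall of fame of 1000 this corresponds to the LargeHOF setup; SmallHOF corresponds to 500, 50, and 500, and HOF2Pop runs two such populations against each other.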
After 500 generations, the best agent from the
500th generation played 100,000 duplicate table tour-
naments, with each of the benchmark agents also sit-
ting at the tables. For each duplicate table tourna-
ment, each agent played in each possible seat at the
table, with the same seeds for the random numbers,
so that the skill of the agents, and not just the luck
of the cards, could be evaluated. After each duplicate
table tournament, the ranking of each agent at the ta-
ble was gathered, and the average was calculated after
the completion of all 100,000 duplicate table tourna-
ments. There were nine agents at the duplicate ta-
bles. The best possible rank was 1, corresponding
to an agent that wins every tournament, regardless of
cards or opponents. The worst possible rank was 9,
corresponding to an agent that was eliminated from
every tournament in last place. The results of our best agents are shown in Figure 7.
Figure 7 presents the results of the duplicate table tournaments. The Control agent represents the agent being evaluated, while the other vertical bars represent the rankings of the other agents in the evaluation of that control agent. For example, the first bar of Random represents how the Random agent performed against the Baseline agent, the second bar represents how the Random agent performed against the SmallHOF agent, and so on.
The best results were obtained by the agents
evolved with a large hall of fame, but no co-evolution.
These agents obtained an average rank of 2.85 out of
9. Co-evolution seemed to have little effect upon the
agents when a hall of fame was used, and the agents
in the two co-evolutionary populations received aver-
age ranks of 2.92 and 2.93. The difference between
the best agents and the second best seems to be quite
small. A two-tailed paired t-test was conducted on
the null hypothesis that the ranks of any two distinct
agents were equal. In all cases, and for all exper-
iments, the null hypothesis was rejected with 99%
confidence. Although the difference is small, enough
hands were played that even small differences equate
to a difference in skill.
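The significance check described above can be reproduced with a standard paired test; a minimal sketch using scipy.stats.ttest_rel (two-tailed by default), where the rank arrays hold the per-tournament ranks of two agents over the same 100,000 duplicate table tournaments:

    from scipy import stats

    def ranks_differ(ranks_a, ranks_b, alpha=0.01):
        # Paired two-tailed t-test: observations are paired because both
        # agents were ranked in the same duplicate table tournaments.
        t_stat, p_value = stats.ttest_rel(ranks_a, ranks_b)
        # Reject the null hypothesis of equal ranks at 99% confidence.
        return p_value < alpha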