“goodness” of a candidate split at a node. This measure is calculated from the candidate child nodes (left and right) and the conditional probabilities of the target classes among the records assigned to each of them. The optimal split is the one that maximizes this measure over all possible splits at the selected node. Applied recursively, CART partitions the records in the training data set into subsets of records with similar values of the target attribute (Larose, 2005).
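As an illustration of how such a measure drives the choice of split, the Python sketch below scores every threshold split of one numeric attribute and keeps the best one. It assumes the goodness measure Φ(s|t) = 2 P_L P_R Σ_j |P(j|t_L) − P(j|t_R)|, where P_L and P_R are the proportions of records sent to the left and right child nodes and P(j|t_L), P(j|t_R) are the proportions of class j within each child. This is one common formulation, and the helper names `goodness` and `best_split` are hypothetical.

```python
from collections import Counter


def goodness(left_labels, right_labels):
    """Goodness of a split, assuming Phi(s|t) = 2 * P_L * P_R * sum_j |P(j|t_L) - P(j|t_R)|."""
    n_left, n_right = len(left_labels), len(right_labels)
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one child empty separates nothing
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)
    # Sum of absolute differences between the class proportions in the two children.
    spread = sum(abs(left_counts[c] / n_left - right_counts[c] / n_right) for c in classes)
    return 2 * p_left * p_right * spread


def best_split(values, labels):
    """Try every threshold 'value <= t' on one numeric attribute; return (t, goodness)."""
    best_threshold, best_score = None, -1.0
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = goodness(left, right)
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical attribute that cleanly separates two classes at 2.
print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))  # (2, 1.0)
```

A full CART implementation would repeat this search over every attribute at each node and then recurse into the two children.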
Finally, the clustering algorithms used to group the behaviour actually observed in a set of gameplays (which we call the team behaviour) are:
• Expectation-Maximization (Robert Hogg and Craig, 2005), or simply EM, is a blind (unsupervised) clustering algorithm that both creates the clusters and assigns the data to them. It is useful when part of the data is hidden or missing. Starting from an initial estimate of the model parameters, it iteratively increases the likelihood by applying the following two steps until convergence:
– Expectation step: compute the expected value of the log-likelihood function with respect to the hidden variables, given the observed data and the current parameter estimates.
– Maximization step: find the parameters that maximize this expected log-likelihood and use them as the new estimates.
• K-means (MacKay, 2003) is another popular and well-known algorithm. It is a straightforward clustering method that partitions the data into a fixed number of clusters, k, which has to be chosen beforehand (usually by a heuristic or directly by a human). The number of clusters can be predefined, or it can be estimated using heuristics or other kinds of algorithms, such as genetic algorithms (Gonzalez-Pardo et al., 2010). This algorithm runs in five steps (Larose, 2005):
– Define (fix) the number of clusters (k).
– Assign k records to be the initial cluster center locations.
– For each record, find the nearest cluster center.
– For each of the k clusters, find the cluster cen-
troid, and update the location of each cluster
center to the new value of the centroid.
– Repeat the last two steps until a convergence criterion or a termination condition is reached.
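The EM procedure in the first item above can be illustrated on a one-dimensional mixture of Gaussians, a standard textbook setting in which the hidden variable is the component that generated each value. The sketch below is a minimal Python version under that assumption; the Gaussian-mixture model, the function `em_gmm_1d` and its parameters are illustrative choices, not necessarily the model used in this work.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, max_iter=200, tol=1e-9):
    """Fit a 1-D Gaussian mixture with EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    prev_log_lik = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point, under current parameters.
        resp, log_lik = [], 0.0
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            log_lik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: closed-form parameters that maximize the expected complete-data log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate zero-variance component
        if log_lik - prev_log_lik < tol:  # stop once the likelihood no longer improves
            break
        prev_log_lik = log_lik
    return means, variances, weights


data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
means, variances, weights = em_gmm_1d(data, [1.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print(means)  # the two means end up near 0 and 10
```

Each iteration computes the responsibilities (E-step) and re-estimates the weights, means and variances from them (M-step); the loop stops once the log-likelihood stops increasing.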
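The five K-means steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: it assumes records are tuples of numeric features, chooses the k initial centers at random among the records, uses Euclidean distance, and takes "no record changes cluster" as the convergence criterion. The function `kmeans` and its parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster numeric tuples into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    # Steps 1-2: k is fixed by the caller; k records are picked as the initial centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each record to its nearest center (Euclidean distance).
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_assignment == assignment:  # step 5: stop when no record changes cluster
            break
        assignment = new_assignment
        # Step 4: move each center to the centroid of the records assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The cluster indices themselves depend on the random initialization; only the grouping of the records is meaningful.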
3 EXPERIMENTAL SETUP
The approaches described in this paper have been tested on a dataset generated with Soccerbots. Soccerbots¹ simulates the dynamics and dimensions of a regulation RoboCup small size robot league game. Two teams of five robots compete on a soccer field by pushing and kicking a ball into the opponent’s goal. This simulator has been used as a sandbox in several works on the application of different machine learning techniques to multiagent systems (Aler et al., 2009; Leng et al., 2010).
¹ Soccerbots: http://www-2.cs.cmu.edu/∼trb/TeamBots/Domains/SoccerBots/
The data has been extracted using SBTournament², a tool for running Soccerbots tournaments and generating traces of robot behaviour. SBTournament periodically extracts the position, direction and velocity of every robot and of the ball during a match. Additionally, kick actions and goals are extracted asynchronously. SBTournament uses these traces to generate CSV files for every robot, team and match played. Finally, the dataset employed in our evaluation has been enriched with statistics computed from these CSV files, as described below.
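As a rough, hypothetical illustration of how per-robot and per-team statistics can be derived from periodic traces, the Python sketch below computes a few per-robot features from position samples and then aggregates the robots of one team into totals and averages. The `Sample` layout, the sampling period `dt`, the convention that a robot's own half lies at negative x with the field center at the origin, and all function and feature names are assumptions made for the example; they are not SBTournament's actual trace format or the exact feature definitions used in the dataset.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One periodic trace sample (hypothetical layout, not SBTournament's actual format)."""
    robot_x: float
    robot_y: float
    ball_x: float
    ball_y: float


def robot_statistics(samples, dt):
    """Per-robot features from samples taken every dt seconds.

    Assumes the robot's own half is x < 0 and the field center is at (0, 0).
    """
    return {
        # Time = number of samples satisfying the condition times the sampling period.
        "time_own_field": sum(dt for s in samples if s.robot_x < 0),
        "time_opponent_field": sum(dt for s in samples if s.robot_x >= 0),
        # Distances are averaged over all samples of the match.
        "avg_dist_ball": mean(math.dist((s.robot_x, s.robot_y), (s.ball_x, s.ball_y)) for s in samples),
        "avg_dist_center": mean(math.hypot(s.robot_x, s.robot_y) for s in samples),
    }


def team_statistics(robot_stats):
    """Aggregate the per-robot feature dicts of one match into team totals and averages."""
    team = {}
    for feature in robot_stats[0]:
        values = [robot[feature] for robot in robot_stats]
        team[f"{feature}_total"] = sum(values)
        team[f"{feature}_avg"] = mean(values)
    return team


# Hypothetical two-robot example with a sampling period of 0.1 s.
robot_a = [Sample(-1.0, 0.0, 0.0, 0.0), Sample(-0.5, 0.0, 0.0, 0.0), Sample(1.0, 0.0, 0.0, 0.0)]
robot_b = [Sample(0.5, 0.0, 0.0, 0.0), Sample(2.0, 0.0, 0.0, 0.0), Sample(-1.0, 0.0, 0.0, 0.0)]
team = team_statistics([robot_statistics(robot_a, 0.1), robot_statistics(robot_b, 0.1)])
print(team)
```

Global statistics for a team over all its matches could then be obtained by applying the same kind of aggregation to the per-match team dictionaries.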
The dataset contains information about ∼15000 matches played by 74 different teams implemented by Computer Science students at the Complutense University of Madrid over several academic years. It contains three types of information:
Information about each Robot. The dataset stores statistics about every robot that has participated in a match. For every match and every robot, we have the number of goals scored and kicks performed; the time the robot spent in its own field and in the opponent field, and in its own and in the opponent goalkeeper area; the time the robot spent in “ball possession”; and the average distance between the robot and the ball, the center of the field, its own goal and the opponent goal.
Information about each Team during a Match. The information about each robot is aggregated to generate global statistics for each team during the match. These team statistics contain the total and the average of every feature extracted from the robots that make up the team, such as the number of goals scored and received by the team, the total number of kicks by the team's robots, the average time that the team's robots spent in their own field and in the opponent field, or the total time that the team's robots spent in “ball possession”, among others.
Global Information about every Team. Using the information about a team across all the matches it played, we generate a set of descriptive statistics that summarize the global team behaviour. These
² SBTournament: http://gaia.fdi.ucm.es/projects/soccerBots/SBTournament 1.2.zip
PREDICTING PERFORMANCE IN TEAM GAMES - The Automatic Coach