The region delimiters are those shown in Figure 2 for the left
team, and the opposite for the right team.
Figure 2: Considered Field Regions.
Each region is built by giving the points of the rectangle
that defines it in the field. The results depend on the
team analyzed, the side on which that team starts to play
and whether the game is in its first or second half.
All these statistics are saved in a final game file
that is then analyzed.
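The region lookup described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the pitch size, the three example regions, the names `Region`, `LEFT_TEAM_REGIONS` and `region_of`, and the assumption that teams swap sides at half time are all hypothetical.

```python
from dataclasses import dataclass

# Assumed pitch: 105 x 68, origin at the centre, x axis along the field length.


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle given by two opposite corner points."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Hypothetical regions, expressed from the point of view of a team that
# defends the negative-x goal (the "left" team in the first half).
LEFT_TEAM_REGIONS = [
    Region("defensive_third", -52.5, -34.0, -17.5, 34.0),
    Region("middle_third", -17.5, -34.0, 17.5, 34.0),
    Region("attacking_third", 17.5, -34.0, 52.5, 34.0),
]


def region_of(x: float, y: float, start_side: str, half: int):
    """Return the region name for a position, or None if off the field.

    start_side is the side the analyzed team started on ("left" or "right");
    half is 1 or 2. Assumption: teams swap sides at half time, so the
    delimiters are mirrored when exactly one of the two conditions holds.
    """
    mirrored = (start_side == "right") != (half == 2)
    if mirrored:
        x, y = -x, -y
    for region in LEFT_TEAM_REGIONS:
        if region.contains(x, y):
            return region.name
    return None
```

Mirroring the position instead of storing a second set of rectangles keeps a single definition of the regions for both teams and both halves.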
3.2 Pass Information
A pass is considered to occur when a player kicks the
ball with enough force in a specific direction, aiming
at a teammate who is supposed to receive the ball.
Regarding passes, the statistics calculated are the
total number of passes, the percentage of passes in each
part of the game, the percentage of passes of each team
and of each player (passes executed and received), and
the percentage of successful passes of each team in each
part of the game. Pass statistics are also calculated
for each field region.
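As an illustration of how the pass statistics listed above could be derived, the sketch below aggregates a list of pass records. The `Pass` record layout and the function names are assumptions made for this example; the paper does not specify the format of the detected passes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pass:
    """Hypothetical pass record; the field names are illustrative."""
    team: str
    passer: int
    receiver: int   # intended receiver (uniform number)
    half: int       # 1 or 2
    success: bool


def percentage(part: int, total: int) -> float:
    return 100.0 * part / total if total else 0.0


def pass_statistics(passes):
    """Compute totals and percentages over a list of Pass records."""
    total = len(passes)
    by_half = Counter(p.half for p in passes)
    by_team = Counter(p.team for p in passes)
    executed = Counter((p.team, p.passer) for p in passes)
    # Only successful passes count as received.
    received = Counter((p.team, p.receiver) for p in passes if p.success)
    attempts = Counter((p.team, p.half) for p in passes)
    successes = Counter((p.team, p.half) for p in passes if p.success)
    return {
        "total": total,
        "pct_by_half": {k: percentage(n, total) for k, n in by_half.items()},
        "pct_by_team": {k: percentage(n, total) for k, n in by_team.items()},
        "pct_executed": {k: percentage(n, total) for k, n in executed.items()},
        "pct_received": {k: percentage(n, total) for k, n in received.items()},
        # Success rate of each team within each half.
        "success_pct_by_team_half": {
            k: percentage(successes[k], n) for k, n in attempts.items()
        },
    }
```

The same counters could be keyed by field region to obtain the per-region figures.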
3.3 Ball Recoveries and Losses
A ball loss occurs when a player from the team
has ball possession in a given cycle (e.g. may kick
the ball) and he loses ball possession to a member
of the opponent team (a player from the opponent
team may kick the ball). This statistic is the opposite
of a ball recovery: when a team achieves a ball
recovery, the other team loses ball possession.
Regarding this statistic, the system calculates the
total number of ball recoveries and losses, the percentage
of recoveries and losses in each part of the game, the
percentage of recoveries and losses of each team, and
the same per team in each part of the game. Statistics
are also calculated for each field region.
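A possible way to detect these events is to scan the per-cycle possession and record every change of owner. The sketch below assumes possession is already available as one team label per cycle (or `None` when no player can kick the ball); the function name and input format are hypothetical.

```python
from collections import Counter


def possession_changes(possession_by_cycle):
    """Return a list of (cycle, losing_team, recovering_team) tuples.

    possession_by_cycle holds, for each cycle, the team whose player can
    kick the ball, or None when nobody can. Cycles without an owner are
    skipped, so a change is recorded when the other team next gains the ball.
    """
    events = []
    last_owner = None
    for cycle, owner in enumerate(possession_by_cycle):
        if owner is None:
            continue
        if last_owner is not None and owner != last_owner:
            events.append((cycle, last_owner, owner))
        last_owner = owner
    return events


def count_recoveries_and_losses(events):
    """Each event is one loss for the first team and one recovery for the second."""
    losses = Counter(loser for _, loser, _ in events)
    recoveries = Counter(winner for _, _, winner in events)
    return recoveries, losses
```

Because every event is counted once as a loss and once as a recovery, the two totals over both teams are always equal.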
3.4 Game Occurrences
This game parameter summarizes the occurrences
that happened in the game. These occurrences can
be free kicks (right or left), goals (right or left),
offsides, corners, etc. For these we calculate the
total number of occurrences, the percentage of each
occurrence type and the percentage of occurrences in
each region.
3.5 Goal Scoring and Opportunities
A goal scoring opportunity occurs when an attacking
player has ball possession in a dangerous area.
Dangerous area is a subjective concept that we estimate
as a region sufficiently near the opponent's goal
that a shot may be successful. For this statistic
we estimated the total number of goal opportunities,
the percentage of opportunities in each part of the
game, and the percentage of opportunities of each team
and player, also broken down by the halves of the game.
In addition, the percentage of times each region is
visited when there is an opportunity is also calculated.
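One simple way to make the "dangerous area" concrete is a distance threshold around the opponent's goal. The sketch below is only an illustration under that assumption: the goal position, the 20-unit radius and the function name are hypothetical, and the paper does not state how the region is actually delimited.

```python
import math

GOAL_X = 52.5          # assumed x coordinate of the goal line centre
DANGER_RADIUS = 20.0   # hypothetical threshold, not taken from the paper


def is_goal_opportunity(x, y, attacks_positive_x=True,
                        has_possession=True, radius=DANGER_RADIUS):
    """True when the attacker has the ball within `radius` of the opponent's goal."""
    goal_x = GOAL_X if attacks_positive_x else -GOAL_X
    distance = math.hypot(goal_x - x, y)
    return has_possession and distance <= radius
```

A rectangular region defined in the same way as the field regions would work equally well; the circular threshold is used here only because it needs a single parameter.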
4 DATA WAREHOUSE
In order to get information about the statistics, the final
game file was split into several different files, one for
each type of statistic. In each file the data was separated
by tabs, with future storage in a data warehouse in mind.
Although it could be a good choice, XML is not being
considered as a storage format because it implies defining
a schema and then validating each file against this schema
and, during storage, its use would decrease the performance
of the whole process compared with the tab separation method.
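The split into one tab-separated file per statistic type can be sketched with Python's standard `csv` module. The record format, the `.tsv` file names and the function name are assumptions for illustration; the output is returned as text so that the caller decides where to write it.

```python
import csv
import io
from collections import defaultdict


def split_by_statistic(records):
    """Group (statistic_type, row) pairs and render one TSV text per type.

    Returns a dict mapping a hypothetical file name such as "passes.tsv"
    to its tab-separated content.
    """
    grouped = defaultdict(list)
    for stat_type, row in records:
        grouped[stat_type].append(row)

    files = {}
    for stat_type, rows in grouped.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
        files[f"{stat_type}.tsv"] = buffer.getvalue()
    return files
```

Tab-separated rows of this kind need no schema validation step before they are bulk loaded, which is the trade-off against XML discussed above.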
The reason for using a data warehouse instead of
a conventional database is mainly that, in this
project, the goal was to load a large number of
files at once and afterwards run queries against
the data warehouse. Query response time is also
critical, so the data warehouse seems the best
option in this case.
In order to represent most of the interesting soccer
related statistics, the following data warehouse was
defined.
In this structure three distinct dimensions were defined:
Participation, Game Summary and Classification.
Participation is the dimension that holds all
information about the participants of the game, such as
referee, trainer and player, filtered by date and league.
The Game Summary dimension covers all game
statistics calculated per team, such as the number of
attacks, number of assists and ball recoveries, among others.
Finally, the Classification dimension has information
about the final results that teams achieve in a specific
competition. In this project only the Game Summary
dimension has been used, mainly to simplify