Whenever a goal is scored or a save happens, the
targeted goal area is recorded from the attacker’s
point of view, i.e., from a goalkeeper’s perspective the
zones are mirrored. Furthermore, we do not actually
record the goal area where the ball passes the goal line
but rather the area where the ball passes the
goalkeeper or where the goalkeeper saves the ball.
Hence, the collected information is intended to
answer questions like “Where are the strong/weak
areas of a goalkeeper?”, “Is there an area which
should be better covered by the blocking players to
help the goalkeeper?”, or “Does an attacker have a
certain ‘sweet area’ when attempting?”.
The recording of the goal areas for saves by
goalkeepers was added rather recently. Thus, we
only have data from 92 HBL games and 7105 attempts
at this time. Most attempts are targeted at the bottom
section of the goal (approximately 52%). Less than a
quarter of the attempts (22%) are targeted at the top
section, even though the summarized attempt
effectiveness numbers are very similar (77% at the
bottom and 77% at the top respectively). Only about
one fifth of the attempts are targeted at the middle
section, which shows a significantly lower attempt
effectiveness of only 44%. The goalkeeper’s
effectiveness can simply be calculated by subtracting
the attempt effectiveness from 100%.
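The calculation described above can be illustrated with a minimal sketch. It assumes that every recorded attempt is a pair of a goal section and a flag indicating whether it resulted in a goal (otherwise it was saved); the function name, the section labels and the data layout are hypothetical and are not taken from the Scouting App.

```python
from collections import Counter


def effectiveness_by_section(attempts):
    """Summarize attempts per goal section.

    attempts: iterable of (section, is_goal) pairs, where is_goal is True
    for a goal and False for a save (assumed data layout).
    Returns percentages per section.
    """
    totals = Counter()
    goals = Counter()
    for section, is_goal in attempts:
        totals[section] += 1
        if is_goal:
            goals[section] += 1

    all_attempts = sum(totals.values())
    summary = {}
    for section, count in totals.items():
        attempt_eff = 100.0 * goals[section] / count
        summary[section] = {
            # share of all attempts targeted at this section
            "share": 100.0 * count / all_attempts,
            # percentage of attempts in this section that were goals
            "attempt_effectiveness": attempt_eff,
            # goalkeeper effectiveness is the complement
            "goalkeeper_effectiveness": 100.0 - attempt_eff,
        }
    return summary
```

Applied to the HBL figures above, an attempt effectiveness of 44% in the middle section corresponds to a goalkeeper effectiveness of 56%.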
Again, the numbers have been compared to the
data collected in the 4th league. We were able to use
23 matches including 2,015 attempts. The
distribution of the attempts across goal areas is almost
the same as in the HBL (with a maximum
difference of 2% in each section). However, in
the lower league we see a lower attempt
effectiveness of 74% in the top section, a significantly
higher effectiveness of 59% in the middle section, and
74% in the bottom section.
4.2.4 Significance of the Sequence of Goals
While considering the question whether the outcome
of a match can be predicted significantly before the
end of a match based on the team’s performance, we
looked at the most prominent indicator: the number
of goals. Several hypotheses have been investigated
and one showed a surprising result: “The team that
scores the nth goal first will not lose the match”.
With the collected data from the Scouting App,
the complete sequence of match events is available
and can be analysed. Thus, we compared the
“predictability” of different numbers of goals (the n)
ranging from 10 up to 28 based on 98 matches of the
first league.
Below the investigated range, the accuracy
decreases. Above goal 24, the accuracy also
decreases because the number of matches in which fewer
than the required number of goals are scored
increases (we have an average of approximately 25
goals per match and team in the set of observed
matches).
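The evaluation of the hypothesis for a given n can be sketched as follows. This is a minimal illustration, not the implementation used for the reported numbers. It assumes that each match is available as the ordered list of scoring teams (labelled "home" and "away" here, a hypothetical encoding), that "will not lose" covers both a win and a draw, and that matches in which neither team reaches n goals count as misses, which is our reading of why the accuracy drops for large n.

```python
def first_to_n(goal_sequence, n):
    """Return the team that first reaches n goals, or None if no team does."""
    score = {"home": 0, "away": 0}
    for team in goal_sequence:
        score[team] += 1
        if score[team] == n:
            return team
    return None


def final_result(goal_sequence):
    """Return 'home', 'away' or 'draw' based on the full goal sequence."""
    home = goal_sequence.count("home")
    away = goal_sequence.count("away")
    if home > away:
        return "home"
    if away > home:
        return "away"
    return "draw"


def nth_goal_accuracy(matches, n):
    """Percentage of matches in which the first team to n goals does not lose.

    Assumption: matches where no team reaches n goals stay in the
    denominator and therefore count as misses.
    """
    if not matches:
        return 0.0
    hits = 0
    for goal_sequence in matches:
        leader = first_to_n(goal_sequence, n)
        if leader is None:
            continue  # nobody reached n goals: counted as a miss
        if final_result(goal_sequence) in (leader, "draw"):
            hits += 1
    return 100.0 * hits / len(matches)
```

Calling `nth_goal_accuracy(matches, n)` for each n in the investigated range (10 to 28) yields an accuracy curve in which peaks such as those discussed in this section could be located.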
Two results are particularly interesting. There is a
peak (local maximum) around goal 16 (92.9%) after
which the accuracy decreases. Furthermore, there is a
second peak (the global maximum) around goal 20/21
(95.9%).
We have verified these patterns with the data from
52 matches of the lower leagues. We have found the
same two peaks, the first one at goal 16 (85.4%) and
the second one at goal 26 (100%). However, the
average number of goals per match is significantly
higher (approx. 29) compared to the matches of the
first league.
Since the publicly available HBL data is of
sufficient quality regarding the sequence of goals, we
also verified the “two peak finding” using long-term
data (almost 4 seasons) of 1190 first league matches
and of 1559 second league matches. The two peaks
do not exist in long-term data. However, at goal 16
we find an accuracy of 86.6% in the first league and
83.2% in the second league. The maximum accuracy
is at goal 21 in the first league (91.3%) and 22 in the
second league (88.9%). If we just look at the 306
matches of the last season of the first league, we find
the two peaks at goal 17 (89.5%) and at goal 20
(91.5%). Based on 379 second league matches of the
last season we found the first peak at goal 16 (85%)
and the maximum at goal 21 (91%).
4.2.5 Advanced Insights using Data Mining Techniques
The prediction of the winner of matches is a typical
classification task (Provost & Fawcett, 2013). The
business question “behind” the classification is:
“Which (minimal) combination of indicators that we
measure can be used to predict the outcome of a
match?” Since we are measuring the indicators while
the match is played, and we want to have an
indication during the match whether we need to
intervene, it is useless to train models using the
absolute numbers of finished matches. We rather
need to use relative indicators as introduced in section
2 that can be measured throughout the match.
Data Mining and its methods are completely new
to team handball coaches. Thus, it is very important
that the results can be explained in terms which can
be understood by the coaches. That is why we started