Time Series and Deep Learning Approaches for Predicting English
Premier League Match Outcomes
Weronika Wiechno^1, Bartosz Bartosik^2,a and Piotr Duch^2,b
^1 Institute of Information Technology, Lodz University of Technology, al. Politechniki 8, Lodz, Poland
^2 Institute of Applied Computer Science, Lodz University of Technology, Bohdana Stefanowskiego 18, Lodz, Poland
^a https://orcid.org/0009-0002-9699-3440
^b https://orcid.org/0000-0003-0656-1215
Keywords:
Football Match Prediction, Long Short-Term Memory, Deep Neural Network, Machine Learning, Recurrent
Neural Network, Siamese Neural Network, Multilayer Perceptron.
Abstract: The continuous development of tools used in football match analysis has resulted in a greater availability of game statistics, providing analysts, coaches, and researchers with more detailed data regarding the matches played. This creates the need for more advanced algorithms capable of effectively processing and interpreting the available information. In this paper, a modified architecture of the Siamese Neural Network is presented. A time series approach is incorporated to capture temporal dynamics in teams' performance throughout the analysed matches. The algorithm was compared with classifiers and deep neural network approaches commonly used for match outcome prediction in the literature. All methods were trained and tested on two prepared datasets with the same division into train and test sets. The proposed architecture outperforms the others, reaching higher overall accuracy in match outcome prediction.
1 INTRODUCTION
Football has been a popular sport in many a civilization since time immemorial. Throughout the centuries, it has occupied people's minds to a great extent, evolving into a global phenomenon. Nowadays, with over 5 billion fans worldwide, it is hard to discredit its influence. Throughout the football season, changes in team performance are closely associated with fluctuations in public opinion and sentiment, which in turn influence the growth of the sports prediction industry and discussions surrounding match outcomes.
However, this influx of money and popularity is not distributed equally across all football leagues. Among them, the English Premier League stands as an exceptional case. According to Ampere's Sports Consumer survey (Q4 2023) (Daniel Harraghy, 2024), it is the third most popular competition after the UEFA Champions League and the FIFA World Cup. Nevertheless, it ranks first in the proportion of media rights revenue generated by international broadcast deals and sponsorships.
The continuous development of tools used in foot-
ball match analysis has resulted in a greater avail-
ability of game statistics, providing analysts, coaches,
and researchers with detailed insights into historical
games. A considerable percentage of the data collected during these matches is readily available to researchers and analysts aiming to predict future match outcomes or to perform in-depth evaluations. This, in turn, creates a whole host of new opportunities. After all, the beneficiaries are not only the large bookmaking companies or clubs. A great many football fans avail themselves of those analyses to better understand their favourite team's strategies or to play in fantasy leagues where reliable predictions are at a premium.
In this paper, we present a novel approach for pre-
dicting football match outcomes by leveraging time
series analysis and Siamese networks. Siamese net-
works are primarily used in tasks where measuring
the similarity or distance between two inputs is es-
sential. In the proposed approach, a modified archi-
tecture based on Siamese networks is used for match
outcome prediction by comparing pairs of features
related to the teams and specific match characteris-
tics. The features used for match outcome predic-
tion are taken from the previous matches played by
both teams and treated as time series data to capture
temporal dynamics. To compare the results obtained
using Siamese networks, the most popular classifiers
and neural networks found in the literature, including
recurrent neural networks, were trained on the same
set of features.
2 BACKGROUND AND RELATED
WORKS
2.1 Predictive Models Evolution
The history of attempts to predict football match re-
sults dates back to the mid-20th century. One of the
earliest studies focused on analysing relationship be-
tween factors like possession, shots on and off target,
and the likelihood of scoring a goal was conducted
over 50 years ago (Reep and Benjamin, 1968). Since
then, many different approaches were used in order
to predict the matches outcomes. The most popular
includes Logistic Regression, Artificial Neural Net-
works (ANNs), Bayesian Networks, Decision trees,
k-NN, Na
¨
ıve Bayes, Random Forest, and Support
Vector Machine (SVM). Recently, approaches based
on deep learning have become more popular. One of
the example of such approaches is the Artificial Neu-
ral Network trained to predict results of all matches
during 2006 Soccer World Cup tournament (Huang
and Chang, 2010).
2.2 Literature Review
In the case of the Premier League, which is the subject of this research, the aforementioned methods are applied along with different approaches that vary in terms of the selected match data source, the span of seasons, the features and feature engineering, and, finally, parameter tuning.
Commonly utilized data sources are websites containing historical match data, in particular http://football-data.co.uk. An example of combining 11 seasons of such data with rating statistics, followed by careful feature engineering, is the work of (Baboota and Kaur, 2019). As a result, their best classifier, gradient boosting, achieves an accuracy of 56.7% in predicting match outcomes aggregated over two seasons (2014/2015 and 2015/2016) using 33 engineered features. This algorithm slightly outperformed Random Forest and showed a larger advantage over SVM and Naïve Bayes.
While (Muszaidi et al., 2022) uses the same data source, only the 2018/2019 season, consisting of 380 matches (each described by 62 features), was selected for training and validation. Unlike the former, this paper leverages an ANN approach, namely the Multilayer Perceptron, and obtains an accuracy of 78.4%, which is higher than the one obtained by its deeper variant. However, it is not stated which subset of this data was used for validation.
A more sophisticated deep learning classifier was presented in (Jain et al., 2021). The authors prepared their dataset from football matches played in the seasons 2010/2011 to 2017/2018 and performed manual feature selection, so that each sample consists of 22 attributes. Next, the Long Short-Term Memory (LSTM) neural network, which is typically used in tasks requiring an understanding of sequences for further predictions, was employed. After a simple grid search over the hyperparameters, the architecture achieves an accuracy of 81.2%. This approach, however, returns the output within only two classes: win or loss. This methodology is flawed, as it disregards the possibility of a draw, thereby limiting its predictive accuracy.
An extensive review of machine learning approaches to match result prediction has been prepared by (Bunker and Susnjak, 2022). The paper compares team sport studies from 1996 to 2019, in which football, as the majority, accounts for around 37% of the sports taken into account. The authors reveal that ANNs have been used most frequently, i.e. in over 21% of the cases, with Decision Trees in second place (13%). The English Premier League appears in four articles within the review.
2.3 Discussion
The majority of the presented papers rely on data that has been dimensionally reduced through a simple statistical method, the arithmetic mean. Data preparation plays a crucial role in the predictive accuracy of football match outcomes. The way the data is averaged may limit its potential and mask existing dependencies, preventing the model from capturing the decisive patterns needed for reliable predictions.
In order to investigate possible accuracy improvements, more sophisticated and less traditional approaches could be explored. The repeated use of the same statistical models in this area leaves considerable room for testing models whose classical use case serves a different kind of task.
The increasing amount of available data over time creates a new field for validating previously developed solutions. More interestingly, investigating more complex models that can efficiently work with the increased amount of information may enable capturing dependencies that were previously undiscoverable.
3 DATA
The data that has been used in this research spans
from the 2017/2018 to 2023/2024 Premier League
seasons. The dataset covers 2660 matches, of which
1192 ended in a home team victory, 602 in a draw, and
866 in an away team victory. The detailed statistics
of each match have been collected from two sources:
“https://fbref.com/en/” and “http://clubelo.com/”.
3.1 Features
For each match, the performance of the home and away teams in their previous matches is compared. Each team is described by a set of 18 individual features that have been meticulously chosen as those expected to have the strongest influence on the match outcome. Next, the two teams' information is concatenated along with the result of their match and the venue, yielding a total of 38 attributes describing their meeting.
These features have been categorized into three
different types. The first type is related to the overall
team form and includes the team’s results in previous
matches, specifically information about wins, losses,
or draws, as well as the location of the match: home
or away. The second type of features concerns de-
tailed information about the team’s performance dur-
ing those matches. This includes the following fea-
tures: the number of aerials won, clearances, corners, crosses, fouls, goal kicks, interceptions, long balls, offsides, passes and pass accuracy, possession, saves, shooting accuracy, shots on target, tackles, and throw-ins.
The final set of features relates to the strength of the
team in a given match, for which the Elo rating was
used (Elo, 1961). The Elo rating, well known from chess, assesses the relative strength of teams based on their previous performance. The adjustments are
made after each match, depending on the match out-
come and the strength of the opponent.
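To make the rating mechanism concrete, the sketch below shows a minimal Elo-style update after a single match in Python; the K-factor and the home-advantage offset are illustrative assumptions, since the ratings used in this work are taken pre-computed from clubelo.com, which applies its own parameterisation.

def expected_score(rating_home: float, rating_away: float, home_adv: float = 60.0) -> float:
    """Expected score of the home team, including an assumed home-advantage offset."""
    return 1.0 / (1.0 + 10 ** (-(rating_home + home_adv - rating_away) / 400.0))

def update_elo(home: float, away: float, result: float, k: float = 20.0):
    """result: 1.0 home win, 0.5 draw, 0.0 away win. Returns the updated ratings."""
    exp_home = expected_score(home, away)
    delta = k * (result - exp_home)      # adjustment scaled by how surprising the result is
    return home + delta, away - delta    # zero-sum update

# Example: a mid-table home side beats a stronger visitor.
print(update_elo(1700.0, 1780.0, result=1.0))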
3.2 Dataset
Two datasets were prepared to thoroughly investigate
the proposed approach. In the first dataset, the form
of both teams (home and away) in the five matches
preceding the considered match is taken into account.
The second dataset, however, focuses on the teams’
form at home and away specifically, considering their
last three respective matches.
The first dataset (MatchForm-5) takes into consid-
eration detailed statistics of both teams in the previous
five matches. For each of those matches, the statistics
of the considered team and its opponent are recorded,
including metrics such as goals scored, shots, passes,
possession, the location of the match, the match out-
come, and the strength of both teams at the time of
the game. This gives comprehensive information about the team's form leading up to the match in question. In total, 38 features are available for each game. To determine the form of a given team, only matches played in the Premier League were taken
into account. Therefore, matches from the Champi-
ons League, FA Cup, or other competitions played by
the teams during that time were ignored. When it was not possible to generate the form of either team based on the last five matches, the match was ignored. Finally, the dataset consists of 2307 matches from the seven seasons considered, with 1046 home team victories, 518 draws, and 743 away team victories.
The second dataset (HomeAwayForm-3) considers
only the team’s form based on the venue. It is very
common in football that a team’s playing style differs
between home and away matches. This is clearly vis-
ible in the points earned by teams in home matches
and those gathered in away games. Additionally, this
distinction is also evident in the detailed statistics of
the matches, including metrics such as the number
of passes completed, shots taken, and other relevant
performance indicators. In this dataset the form of
the host team in the last three home matches is taken
into account, reflecting the potential advantages of playing at home. Similarly, for the guest team, their form in
the last three away matches is considered. In contrast
to the first dataset, this case does not include statistics
from opponents in historical matches. Matches for
which it was not possible to collect data on the last
three games for either team were ignored. Finally,
the dataset includes 2228 matches with 20 features,
where 1011 matches ended in a home team victory,
499 in a draw, and 718 in an away team victory.
The literature commonly presents two popular ap-
proaches for dividing data into train and test datasets.
In the first approach, there is a simple division into
two disjoint sets, with 80% of the matches allocated
to the train set and 20% to the test set (80/20, Fig. 1).
The second method of splitting the data takes into ac-
count the seasons in which the matches are played. In
this case, the most recent season is typically treated
as the test data, while the remaining seasons consti-
tute the train set (test is last). In our study, both ap-
proaches were applied.
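A minimal sketch of the two splitting strategies, assuming the matches are stored in a pandas DataFrame with a hypothetical season column (e.g. "2023/2024"), is shown below.

import pandas as pd
from sklearn.model_selection import train_test_split

def split_80_20(matches: pd.DataFrame, seed: int = 42):
    """Simple 80/20 division into two disjoint train and test sets."""
    return train_test_split(matches, test_size=0.20, random_state=seed)

def split_test_is_last(matches: pd.DataFrame):
    """Most recent season as the test set, all earlier seasons as the train set."""
    last_season = sorted(matches["season"].unique())[-1]
    train = matches[matches["season"] != last_season]
    test = matches[matches["season"] == last_season]
    return train, test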
4 PROPOSED APPROACH
First, the selected classifiers were trained on the previously prepared data to establish a baseline for the other algorithms. Next, the Multilayer Perceptron, Recurrent Neural Network, and Siamese networks were trained on the same data.

Figure 1: The class distribution among the respective categories. The Total bars include the whole set of data taken into consideration in the research, while the Test bars stand for the 20% validation split. The percentages on the bars relate to the share of each label in each dataset.
4.1 Predictive Models
4.1.1 Baseline Classifiers
To determine the baseline for the proposed solution, seven different classifiers were considered: Random Forest, Gaussian Naïve Bayes, Support Vector Machine, Decision Tree, K-Nearest Neighbours, Bagging, and XGBoost. These are the classifiers most commonly used for predicting match outcomes in the literature.
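As an illustration, the sketch below trains this set of baseline classifiers with scikit-learn and XGBoost; the hyperparameter values are illustrative defaults, not the grid-searched settings reported later.

from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

baselines = {
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM (RBF)": SVC(kernel="rbf", C=1.0),
    "Decision Tree": DecisionTreeClassifier(max_depth=8),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=15),
    "Bagging": BaggingClassifier(),        # decision tree base estimator by default
    "XGBoost": XGBClassifier(n_estimators=200),
}

def evaluate(X_train, y_train, X_test, y_test):
    """Fit every baseline and report its overall test accuracy (labels: 0 away win, 1 draw, 2 home win)."""
    for name, clf in baselines.items():
        clf.fit(X_train, y_train)
        print(f"{name}: {clf.score(X_test, y_test):.4f}")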
4.1.2 Multilayer Perceptron
The Multilayer Perceptron (MLP), a type of artificial
neural network, consists of multiple interconnected
layers of neurons. This structure renders it an asset for
match outcome forecasting algorithms, as it adeptly
tackles complex, non-linear problems. Furthermore,
its inherent ability to discern subtle patterns within
data significantly enhances accuracy. In particular,
its capacity to juxtapose various variables—such as
those related to teams’ performance—plays a pivotal
role in refining prediction precision.
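A minimal Keras sketch of such an MLP classifier is given below; the layer widths and dropout rate are illustrative assumptions, with the actual values selected by the grid search described in Section 4.2.2.

from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(n_features: int, n_classes: int = 3) -> keras.Model:
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),                             # a relatively high dropout rate worked best
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),   # home win / draw / away win
    ])
    # Expects one-hot encoded targets.
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model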
4.1.3 Recurrent Neural Networks
Recurrent Neural Networks (RNNs) play a pivotal
role in numerous machine learning tasks – especially
when processing data sets such as time series. They
are instrumental in predictions that heavily rely on
context from earlier steps. However, they tend to quickly forget information once learned, which is why another variant was used: Long Short-Term Memory.
Recurrent Neural Networks are particularly suited
for predicting match outcomes due to their ability to
process sequential data and capture temporal depen-
dencies. Historical match data for both teams are fed
into the network, allowing it to analyse trends and
patterns over time. By examining sequences of past
performances, the RNN can identify how each team’s
form evolves, accounting for variables such as re-
cent victories, losses, previous detailed match statis-
tics, and changes in the teams' strength.
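A minimal Keras sketch of an LSTM classifier over the sequence of a team pair's previous matches is given below; the sequence length corresponds to the five past matches of the MatchForm-5 dataset, while the layer sizes are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(seq_len: int = 5, n_features: int = 38, n_classes: int = 3) -> keras.Model:
    model = keras.Sequential([
        layers.Input(shape=(seq_len, n_features)),   # one timestep per past match
        layers.LSTM(64),                             # summarises the temporal form
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model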
4.1.4 Siamese Neural Networks
Siamese Neural Networks are a special type of neu-
ral network architecture designed to identify semantic
similarities between two inputs by processing them
through two or more identical subnetworks that share
the same weights (Bromley et al., 1993). These net-
works take two (or more) input samples and, for each
of them, output embedding vectors, which are then
compared using a distance metric to determine the de-
gree of similarity. This architecture is particularly ef-
fective in tasks such as face recognition and matching,
where understanding the relationship between pairs of
inputs is crucial.
Because the aim of the classifier is to determine the outcome of the match as one of three classes, this type of task differs significantly from the classical use case; the architecture and the loss function have to be adjusted accordingly, so that the output can be clearly interpreted.
Figure 2: Proposed Siamese Neural Network architecture.
In order to fulfil the desired requirements, the architecture shown in Figure 2 has been proposed. Here, x_i and e_i denote the match input data and their embeddings, respectively, where i = 1 corresponds to the home team and i = 2 to the away team. The concatenated embeddings are passed to a head network, which finally predicts the match result vector y that is compared against the target t within the loss function.
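A minimal Keras sketch of this setup is given below: a single twin encoder with shared weights embeds the home and away inputs, the embeddings are concatenated, and a head network outputs the three-class prediction. The layer sizes and learning rate correspond to the grid-searched values reported later in Table 1, while the input dimensionality is an assumption.

from tensorflow import keras
from tensorflow.keras import layers

def build_siamese(n_features: int, n_classes: int = 3) -> keras.Model:
    # Twin subnetwork: applied to both inputs, so its weights are shared.
    twin = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
    ], name="twin_encoder")

    x_home = keras.Input(shape=(n_features,), name="home_features")   # x_1
    x_away = keras.Input(shape=(n_features,), name="away_features")   # x_2
    e_home, e_away = twin(x_home), twin(x_away)                       # e_1, e_2

    # Head network operating on the concatenated embeddings.
    h = layers.Concatenate()([e_home, e_away])
    h = layers.Dense(64, activation="relu")(h)
    y = layers.Dense(n_classes, activation="softmax", name="match_outcome")(h)

    model = keras.Model(inputs=[x_home, x_away], outputs=y)
    model.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss="categorical_crossentropy",   # CCE against one-hot targets t
                  metrics=["accuracy"])
    return model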
4.2 Grid Search
Determining the optimal hyperparameters of the aforementioned classifiers can be done using various techniques, starting from manual adjustments based on observed learning curves, through random search, to evolutionary and gradient-based methods. Taking into account the computational effort required for this research, as well as knowledge of the subsets of hyperparameters for which the best performance could be expected, a grid search method has been employed. As each type of model is trained under a specific hyperparameter regime, different subsets have been considered for each classifier.
Within this stage, not only the hyperparameters were considered as an object of adjustment, but also the type of data structure used in training. This was done in order to identify both the structure that contributes most to predictive accuracy and the potential interdependencies between the data structure and the model architecture.
4.2.1 Baseline Classifiers
All of the classifiers were evaluated by fine-tuning their hyperparameters. The optimal configurations were identified using grid search, ensuring that the best parameters were selected for each of them. The optimisation covered a wide range of hyperparameters, including the number of estimators for the Random Forest, the value of C for the SVM, the maximum depth for the Decision Tree, and the gamma hyperparameter for XGBoost.
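A minimal sketch of this tuning step, assuming scikit-learn's GridSearchCV and small illustrative parameter grids, is shown below.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

param_grids = {
    "Random Forest": (RandomForestClassifier(), {"n_estimators": [100, 200, 500]}),
    "SVM (RBF)": (SVC(kernel="rbf"), {"C": [0.1, 1.0, 10.0]}),
}

def tune(X_train, y_train):
    """Return the best estimator and cross-validated accuracy for each classifier."""
    results = {}
    for name, (clf, grid) in param_grids.items():
        search = GridSearchCV(clf, grid, cv=5, scoring="accuracy")
        search.fit(X_train, y_train)
        results[name] = (search.best_estimator_, search.best_score_)
    return results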
4.2.2 Multilayer Perceptron
To guarantee that the optimal hyperparameters were used and the best possible performance was achieved in the MLP approach, a grid search was conducted to tune the following parameters: batch size, activation function, dropout rate, and optimizer. It was found that the smaller the batch size, the better the prediction accuracy, and that the dropout rate needed to be relatively high to achieve the best results.
4.2.3 Recurrent Neural Networks
In an effort to identify the optimal setup of the dataset,
features, and hyperparameters for the RNN, various con-
figurations were tested. The grid search of hyper-
parameters included variables such as the number of
neurons, batch size, model activation function, model
optimizer, and the rate of dropout layers.
4.2.4 Siamese Neural Networks
The search for optimal hyperparameters in the case of the Siamese Neural Network focused not only on the basic ones, like batch size or learning rate, but also on an extensive search for an architecture that would suit the match outcome prediction task.
As the typical use case of this model is determining a degree of similarity, it has first been tested in this classical setup, i.e. with one output neuron. For this task, the Mean Squared Error (MSE, Eq. 1) loss function has been applied, together with a tanh output activation function. The labels for an away win, a draw, and a home win have been set to -1, 0, and 1, respectively. Next, the model has been trained for a regression task, where the output was discretised to one of the three labels.
MSE(t, y) = \frac{1}{n} \sum_{i=1}^{n} (t_i - y_i)^2    (1)
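As an illustration of this regression variant, the following sketch discretises the tanh output to the three labels; the ±0.5 decision thresholds are an assumption, not values specified above.

import numpy as np

def discretise(y_pred: np.ndarray) -> np.ndarray:
    """Map continuous tanh outputs to {-1, 0, 1} (away win, draw, home win)."""
    labels = np.zeros_like(y_pred, dtype=int)
    labels[y_pred > 0.5] = 1      # home win
    labels[y_pred < -0.5] = -1    # away win
    return labels                 # everything in between counts as a draw

print(discretise(np.array([-0.9, -0.2, 0.1, 0.8])))  # [-1  0  0  1]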
However, because the task itself is about predicting one of three classes, the second setup considered three output neurons, in contrast to the standard Siamese network architecture. These three neurons, followed by a softmax activation function, can then be exploited by using the Categorical Cross-Entropy (CCE, Eq. 2) loss function.
CCE(y, t) = -\sum_{i=1}^{n} \sum_{c=1}^{C} t_{i,c} \log(y_{i,c})    (2)
As a result, according to Table 1, an architecture
with one hidden layer in the twin subnetworks and
one in the head network has been chosen. Moreover,
the highest accuracy has been achieved on the test
dataset with the use of CCE loss. This is a setting
that corresponds to three output neurons and one-hot
label encoding.
5 RESULTS
5.1 Baseline Classifiers
The highest accuracy achieved by the Random Forest classifier was 55.01%. Nonetheless, it lacked balance, particularly in predicting draws, which were often predicted with 0.00% accuracy. It was only when the HomeAwayForm-3 dataset with an 80/20 train-test split was utilized that Random Forest emerged as one of the more balanced classifiers while maintaining a substantial overall accuracy of 54.48%.
Table 1: Hyperparameter subsets chosen for the grid search and the selected optimal values.
parameter          | subset                                  | optimal
batch size         | {16, 64, 128}                           | 128
learning rate      | {10^-3, 3·10^-4, 10^-4, 3·10^-5}        | 10^-4
twin hidden layers | {64, 128, 64-64, 128-64, 128-128-64}    | 128
head hidden layers | {none, 64, 128, 64-32, 128-64}          | 64
loss function      | {MSE, CCE}                              | CCE

Similarly, the XGBoost classifier achieved its highest accuracy of 55.10% on the HomeAwayForm-3 dataset. As expected, general performance improved with more training data, as seen in the result on the test is last split. Nevertheless, draw prediction remained unsatisfactorily low at 0.00%.
Interestingly, this pattern was not observed with
the Gaussian Naïve Bayes classifier, a probabilis-
tic model. Its accuracy remained relatively sta-
ble across all datasets. The highest overall accu-
racy was achieved with the HomeAwayForm-3 dataset
and the test is last split. Notably, the draw pre-
diction accuracy for this model was consistent be-
tween both data splits, reaching approximately 15%
for the HomeAwayForm-3 dataset and 25% for the
MatchForm-5 dataset.
The Support Vector Machine (SVM) with an RBF
kernel and Decision Tree classifiers were much more
sensitive to the way the training/testing split was
conducted. Both classifiers performed well on the
MatchForm-5 dataset but saw a substantial drop in
accuracy when using the HomeAwayForm-3 dataset
with an 80/20 split. Nevertheless, on the latter
dataset, the achieved accuracy was among the most
balanced across all classifiers, but the draw prediction
plummeted to 0.00% on the former dataset. Visualiz-
ing the decision tree structure using Graphviz and the
plot_tree function provided insight into the models'
mechanics but did not resolve the issue of low draw
accuracy on the HomeAwayForm-3 with test is last
split.
After tuning hyperparameters for the K-Nearest
Neighbours (K-NN) and Bagging classifiers, with the
former optimizing the number of neighbours and the
latter adjusting the number of estimators, both classi-
fiers reached a high accuracy. Notably, the Bagging classifier, using the Decision Tree as its base estimator, outperformed, in terms of accuracy, all classifiers, including the Decision Tree itself.
5.2 Multilayer Perceptron
Although the baseline classifiers performed reasonably well, the introduction of the Multilayer Perceptron (MLP) classifier led to further improvements. The best performance was observed on the MatchForm-5 dataset using the test is last split, achieving an overall accuracy of 57.45%. This configuration was particularly notable for generating the most balanced predictions, with the draw prediction accuracy reaching 15.27%. Similarly, in the HomeAwayForm-3 dataset, it was the split with more training data that had the better overall accuracy, albeit with worse draw prediction performance.
5.3 Recurrent Neural Networks
The evaluation of the accuracies of Recurrent Neu-
ral Networks in both datasets and splits showed
lower sensitivity to the different dataset configura-
tions. The overall accuracies achieved on each of
the two datasets were approximately 56%. The most
profound difference found was in the accuracy of
draw prediction. Specifically, in the MatchForm-
5 dataset (Table 5), the aforementioned accuracy
reached a notable 22.22%, whereas in both splits of
the HomeAwayForm-3 dataset (Table 2 and Table 3),
it was significantly lower, at 0.00%.
5.4 Siamese Neural Networks
While the deep learning methods in general yield the highest accuracies, the Siamese Neural Network consistently outperforms the other models, demonstrating the best results across the prepared datasets and their validation splits. This is particularly evident in the splits of the MatchForm-5 dataset (Table 4 and Table 5).
However, the problem present in the other models, i.e. the low accuracy of draw prediction, affects this solution as well. Despite achieving the best overall results, shown in Table 5, it has the lowest score in this metric. Additionally, the overall accuracy (Table 2) indicates a minimal difference between the Siamese network and a simple Multilayer Perceptron. However, a closer examination of the class-wise accuracy reveals the Siamese network's tendency to alleviate the problem by ignoring the prediction of draw outcomes.
The model performs better than the others insofar as it developed a strategy of focusing on predicting the winner of the match. Consequently, this outcome shows that, given task-specific adjustments to the default architecture, the approach can be applied to problems whose ultimate goal is not necessarily to determine similarity.
Table 2: Accuracies achieved on HomeAwayForm-3 80/20.
Classifier Overall accuracy Home win accuracy Draw accuracy Away win accuracy
Random Forest 54.48% 73.33% 7.76% 59.38%
Gaussian Naïve Bayes 51.12% 67.14% 14.56% 54.13%
Support Vector Machines 48.20% 56.19% 23.31% 54.88%
Decision Tree 43.94% 50.47% 30.09% 44.36%
K-Nearest Neighbours 50.22% 75.71% 7.76% 42.85%
Bagging 50.67% 68.57% 9.71% 54.13%
XGBoost 52.92% 77.62% 5.82% 50.37%
Multilayer Perceptron 56.50% 80.95% 7.76% 48.12%
Recurrent Neural Network 56.05% 71.42% 0.00% 61.65%
Siamese Neural Network 56.50% 88.57% 0.00% 49.62%
Table 3: Accuracies achieved on dataset HomeAwayForm-3 test is last.
Classifier Overall accuracy Home win accuracy Draw accuracy Away win accuracy
Random Forest 54.14% 83.56% 0.00% 48.98%
Gaussian Naïve Bayes 53.82% 73.29% 15.71% 52.04%
Support Vector Machines 54.46% 74.66% 5.71% 59.18%
Decision Tree 49.04% 47.95% 0.00% 85.71%
K-Nearest Neighbours 51.59% 80.14% 1.43% 44.9%
Bagging 54.78% 82.19% 1.43% 52.04%
XGBoost 55.1% 80.82% 0.00% 56.12%
Multilayer Perceptron 56.69% 78.76% 0.00% 51.02%
Recurrent Neural Network 56.69% 64.38% 0.00% 71.42%
Siamese Neural Network 57.96% 81.51% 0.00% 64.28%
Table 4: Accuracies achieved on dataset MatchForm-5 80/20.
Classifier Overall accuracy Home win accuracy Draw accuracy Away win accuracy
Random Forest 53.67% 73.18% 0.00% 63.04%
Gaussian Naïve Bayes 49.56% 55.45% 25.96% 57.97%
Support Vector Machines 53.24% 68.63% 3.84% 65.94%
Decision Tree 53.03% 86.81% 0.00% 39.13%
K-Nearest Neighbours 50.86% 70.00% 2.88% 56.52%
Bagging 54.54% 78.63% 0.96% 56.52%
XGBoost 52.59% 68.63% 8.65% 60.14%
Multilayer Perceptron 56.71% 67.27% 2.28% 50.43%
Recurrent Neural Network 56.28% 75.45% 6.73% 49.27%
Siamese Neural Network 57.58% 85.38% 0.00% 61.59%
Table 5: Accuracies achieved on dataset MatchForm-5 test is last.
Classifier Overall accuracy Home win accuracy Draw accuracy Away win accuracy
Random Forest 55.01% 81.16% 0.00% 54.36%
Gaussian Naïve Bayes 50.45% 57.79% 25.0% 57.28%
Support Vector Machines 55.62% 71.42% 4.16% 67.96%
Decision Tree 52.88% 87.01% 0.00% 38.83%
K-Nearest Neighbours 54.40% 75.32% 4.16% 58.25%
Bagging 56.53% 81.81% 2.77% 56.31%
XGBoost 53.49% 73.37% 9.72% 54.36%
Multilayer Perceptron 57.45% 55.84% 15.27% 56.31%
Recurrent Neural Network 56.53% 56.49% 22.22% 48.02%
Siamese Neural Network 58.97% 84.42% 0.00% 62.14%
6 CONCLUSIONS
In this paper, a method for predicting football match outcomes based on the Siamese Neural Network architecture was presented. The proposed method lever-
ages the ability of Siamese networks to compare the
performance of two teams by processing their recent
performance data through two identical subnetworks.
This approach allows the model to capture similari-
ties and differences in the teams’ trends leading up
to the match. The inputs of the network are detailed
statistics from the previous matches of the analysed
Time Series and Deep Learning Approaches for Predicting English Premier League Match Outcomes
795
teams. That way the network can more effectively un-
derstand the trend in team performance, which leads
to better match outcome predictions. In contrast to
the classic Siamese network architecture, the output
of the proposed architecture consists of three neurons,
corresponding to the possible match outcomes.
The proposed algorithm was compared with clas-
sifiers commonly used for match outcome predic-
tion in the literature. For this purpose, seven classifiers (Random Forest, Gaussian Naïve Bayes, Support Vector Machine, Decision Tree, k-Nearest Neighbours, Bagging, and XGBoost) were selected and trained, including the optimization of their parameters. To fur-
ther investigate the results of the proposed algorithm,
two additional approaches based on deep neural net-
works were examined: Multilayer Perceptron and Re-
current Neural Network. All methods were trained
and tested on two prepared datasets: MatchForm-5
and HomeAwayForm-3 with the same division into
training and test sets.
The obtained results demonstrate that approaches
based on deep neural networks outperform traditional
classifiers for each of the datasets analysed. The per-
formance of the MLP and RNN was at least 2% better than that of the best classical classifiers. The proposed architecture of Siamese Neural Networks achieved results of up to 58.97% in overall match prediction accuracy, which are better than those obtained by the MLP
and RNN. This indicates that Siamese networks, with
their ability to effectively capture the comparative dy-
namics between teams, offer a promising approach for
improving match outcome prediction.
The greatest challenge in predicting football
match outcomes is forecasting a draw. This situa-
tion is reflected in the results obtained by the analysed
methods. Approaches based on deep neural networks
are characterized by greater overall accuracy, but ac-
curacy in predicting draws is very low.
This work contributes to the field of football
match outcome prediction by introducing a method
based on the modified Siamese Neural Network archi-
tecture and a time series approach. Future work may
include incorporating Recurrent Neural Networks or
Long Short-Term Memory units into the Siamese
Neural Network architecture to better capture tempo-
ral dependencies and enhance the model’s ability to
analyse the patterns of team performance over time.
This integration could potentially improve prediction
accuracy by allowing the model to consider both the
historical context and the evolving dynamics of the
teams involved.
ACKNOWLEDGEMENTS
This work was financed by the Lodz University of
Technology, Faculty of Electrical, Electronic, Com-
puter and Control Engineering as a part of statutory
activity (project no. 501/2-24-1-1).
REFERENCES
Baboota, R. and Kaur, H. (2019). Predictive analysis and
modelling football results using machine learning ap-
proach for English Premier League. International Jour-
nal of Forecasting, 35(2):741–755.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah, R. (1993). Signature verification using a "Siamese" time delay neural network. Advances in Neural Information Processing Systems, 6.
Bunker, R. and Susnjak, T. (2022). The application of ma-
chine learning techniques for predicting match results
in team sport: A review. Journal of Artificial Intelli-
gence Research, 73:1285–1322.
Daniel Harraghy (2024). How the Premier League’s
global popularity is driving its revenue growth.
https://www.sportspromedia.com/insights/opinions/
premier-league-tv-rights-revenue-global-popularity-
sponsorship-data-ampere/. Online; accessed 22
October 2024.
Elo, A. E. (1961). New USCF rating system. Chess Life, 16:160–161.
Huang, K.-Y. and Chang, W.-L. (2010). A neural net-
work method for prediction of 2006 World Cup foot-
ball game. In The 2010 International Joint Conference
on Neural Networks (IJCNN), pages 1–8.
Jain, S., Tiwari, E., and Sardar, P. (2021). Soccer result
prediction using deep learning and neural networks.
In Intelligent Data Communication Technologies and
Internet of Things: Proceedings of ICICI 2020, pages
697–707. Springer.
Muszaidi, M., Mustapha, A. B., Ismail, S., and Razali, N.
(2022). Deep learning approach for football match
classification of English Premier League (EPL) based on
full-time results. In Proceedings of the 7th Interna-
tional Conference on the Applications of Science and
Mathematics 2021: Sciemathic 2021, pages 339–350.
Springer.
Reep, C. and Benjamin, B. (1968). Skill and chance in as-
sociation football. Journal of the Royal Statistical So-
ciety. Series A (General), 131(4):581–585.