NBA Player Score Prediction Based on Machine Learning
Haoyu Chen
Faculty of Science, China Pharmaceutical University, Nanjing, China
Keywords: Machine Learning, Visualization Technology, Random Forest, Linear Regression.
Abstract: With the success of machine learning and data visualization in many fields, the NBA (National Basketball
Association), with its huge demand for data analysis, has also benefited from these techniques. The analysis results have been
extensively applied in the player draft, player training and tactical decisions, playing a crucial role in the decisions of management
and coaching staff. This article uses data visualization and machine learning to
analyze an NBA dataset. Random forest and multiple linear regression models are used to predict NBA player
scoring performance, and the models are evaluated using R-squared scores and MAE (Mean Absolute Error). There
are significant relationships between Points and several features such as Turnovers, FGM and Minutes
Played. A ten-fold validation experiment shows that both the multiple linear regression and
random forest models achieve R-squared scores greater than 0.98. According to the comparison, the
multiple linear regression model is more suitable as a score prediction model and is more stable on this
dataset.
1 INTRODUCTION
Basketball is one of the most popular sports in the
world, and many people are attracted by its entertainment
value and competitiveness. The NBA (National
Basketball Association), the top basketball league in
the world, is followed by millions of fans worldwide
who eagerly track the performances of their
favorite teams and players. Behind the scenes, teams
and coaching staffs have recognized the value of
leveraging data to gain a competitive edge. By
extracting insights from vast amounts of historical
game data, teams can optimize strategies, enhance
player performance, and make informed decisions
both on and off the court (Thabtah et al. 2019). Team
managers in the NBA are gradually
realizing the huge potential of data analysis in
basketball and are attempting to recruit data analysts
to carry out further quantitative analyses of their
players' physical condition and game performance.
However, most traditional data analysis only uses
tools such as line charts to present players'
statistics visually, or heat maps to display players'
sweet zones (Georgievski and Vrtagic 2021). Such
methods offer only a superficial understanding of
the dataset because they cannot reveal the interactions among
features. Moreover, the noise hidden in
the features can seriously distort the results of the
analysis. Traditional analysis methods cannot
solve these problems, and machine learning fills this
gap well. With the development of machine
learning, data analysts can use rich and diverse
datasets to offer more comprehensive and
multidimensional analysis, including regression,
classification and clustering. This paper will predict
player scores using machine learning and
statistical analysis, based on a dataset from Kaggle
that contains ample samples of NBA players. The
dataset is first presented visually through several charts and
tables. Once the charts and
figures obtained from data
preprocessing and visualization have been analyzed, p-values and
Student's t-test are used to determine the final
features, which should have a strong
influence on the model. Linear
regression and random forest are then used to predict
scores based on these selected features. Linear
regression is a mathematical and statistical method
that fits a straight line (or, with several features, a
hyperplane) to the samples by estimating the parameters
that best describe the relationship between the independent
and dependent variables. The fitted model is then used to
make predictions on new data. Random forest is an algorithm
based on decision trees. It builds many different
decision trees by repeatedly selecting random subsets of the features.
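
The linear regression idea described above, together with the two evaluation metrics used in this paper, can be illustrated with a minimal sketch. It is not the implementation used in the study: the data are synthetic, the three features only stand in for Minutes Played, FGM and Turnovers, the coefficients used to generate the points are arbitrary assumptions, and a single held-out split replaces the ten-fold validation. The function names are hypothetical, and the model is fitted by ordinary least squares through the normal equations.

```python
import random


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    X is a list of feature rows; a column of 1s is prepended for the
    intercept. Returns the weights [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, X):
    """Apply the fitted weights to new feature rows."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], r)) for r in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data (assumption): columns play the role of
# minutes played, field goals made and turnovers.
rng = random.Random(0)
X = [[rng.uniform(5, 40), rng.uniform(0, 12), rng.uniform(0, 5)]
     for _ in range(200)]
# Arbitrary generating coefficients (assumption) plus Gaussian noise.
y = [1.0 + 0.1 * m + 2.2 * f + 0.3 * t + rng.gauss(0, 0.5) for m, f, t in X]

# Simple held-out split: first 150 rows for fitting, last 50 for testing.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

weights = fit_linear_regression(X_train, y_train)
y_pred = predict(weights, X_test)
r2 = r_squared(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print("weights:", [round(w, 2) for w in weights])
print("R-squared:", round(r2, 3))
print("MAE:", round(mae, 3))
```

An R-squared score close to 1 and a small MAE indicate that the fitted line explains most of the variation in the synthetic points, which is the sense in which the two metrics are used to compare models in this paper.
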
Predictions on new data would gain a final prediction
Chen, H.
NBA Player Score Prediction Based on Machine Learning.
DOI: 10.5220/0012801700003885
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Data Analysis and Machine Learning (DAML 2023), pages 291-296
ISBN: 978-989-758-705-4
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.