Analyzing Factors that Lead to NBA Regular Season Success

Mohamad El-Hajj, Jackson Steed, Victor Gore, Craeg Jethro Infante, Raniel Flores, Danindu Wakista and Mohammed Elmorsy

Computer Science Department, MacEwan University, Edmonton, Canada

Keywords:

Clustering, K-Means, Decision Trees, Sports Analytics, National Basketball Association.

Abstract:

The National Basketball Association (NBA) values regular-season success and acknowledges the crucial role

of a team’s roster composition in determining overall performance. This study uses machine learning tech-

niques, speciﬁcally unsupervised learning clustering and decision tree models, to predict the composition of

a winning roster. Our research identiﬁed three distinct clusters based on win percentage and the distribution

of players across different skill levels. Successful teams typically have more top-tier players and a signiﬁcant

representation of players in the lowest skill level. In contrast, teams that spread their talent across the entire

roster are less successful. We have noticed that players with average to above-average skills are notably af-

fected by excessive playing time in the previous game, which leads to decreased performance and potential

losses for the team in the next game. Considering the time of year and the gap between games, we recom-

mend prioritizing the rest and recovery of top players, especially in the latter half of the season. It’s crucial to

ensure that players who are not as skilled as the top players but still make signiﬁcant contributions to the team

maintain consistent performance, especially during the ﬁrst half of the season. Analyzing height’s impact on

basketball player performance has revealed practical insights that can empower coaches and management. We

found that the shortest and tallest players often perform worse than those of average height. Most top performers

in the NBA tend to have heights closer to the average. However, for players who frequently operate near

the net and encounter numerous rebound opportunities, it is generally preferable to have an average or taller

player for slightly enhanced overall performance compared to below-average height players. Teams can use

these insights to improve their roster construction and maximize player utilization by coaches from one game

to the next. This research provides practical strategies that can be immediately implemented to enhance team

performance.

1 INTRODUCTION

The NBA is considered the top professional basket-

ball league globally, with 30 teams, each with 15 tal-

ented players. Teams range from smaller market ones

like the New Orleans Pelicans to globally celebrated

franchises like the Los Angeles Lakers (NBA, 2023),

(Burns, 2023).

NBA teams increasingly use advanced analytics

to improve their operations and overall performance.

This includes identifying players with long-term po-

tential, reducing injury risks, and maintaining consis-

tent performance. Analytics have led to increased rev-

enue and a stronger winning record (Bishop, 2023).

Teams analyze player longevity, injury susceptibility,

performance at different stages of the season, play

styles, and more to evaluate a player’s suitability and

alignment with the team’s vision. This work suggests

going beyond standard statistics to using data mining

to ensure the selection of effective players and the for-

mation of successful team combinations.

NBA teams must balance building competitive

rosters with ﬁnancial stability. The league enforces

a salary cap to create fairness and equal opportunities

for all teams. Teams aim to optimize player combi-

nations while managing expenses. Identifying under-

valued players through data analysis can help teams

secure talented players at lower costs.

Teams closely guard their proprietary advanced

analytics systems and methods to gain a competi-

tive edge. Despite this secrecy, certain statistics are

widely used by sports media, basketball aﬁcionados,

and team management. One such advanced statistic

is win-shares, which aims to estimate the number of

wins each player contributes to their team over a sea-

son. It serves as a valuable metric that essentially

functions as a scorecard, measuring a player’s overall

impact on their team’s success.

El-Hajj, M., Steed, J., Gore, V., Infante, C. J., Flores, R., Wakista, D. and Elmorsy, M.

Analyzing Factors that Lead to NBA Regular Season Success.

DOI: 10.5220/0013041500003828

In Proceedings of the 12th International Conference on Sport Sciences Research and Technology Support (icSPORTS 2024), pages 83-94

ISBN: 978-989-758-719-1; ISSN: 2184-3201

Another crucial statistical measure is a player’s offensive rating, which assesses scoring efficiency relative to the number of possessions the player uses. Additionally, the plus/minus statistic (Basketball Reference, 2023a) helps gauge whether a team outscores its opponent or is outscored when a specific player is on the floor. In this work, we use data mining methods to answer the following four research questions:

1. What is the optimal strategy of a roster: Should

a roster concentrate on the best players and sup-

plement with weaker ones, or distribute talent

evenly across the team?

2. Can the timing of the year or the duration be-

tween games impact the performance?

3. Could a player’s performance in the current

game be inﬂuenced by their duration of participa-

tion in the previous game? What are the different

ways in which a player’s height may affect their

performance?

4. Can we explore the elements that led to the team’s

exceptional performance in a particular season?

Our primary objective is to have a positive impact on

coaches and management by offering them pioneering

and effective strategies for the development and man-

agement of their rosters. We are dedicated to provid-

ing valuable insights and implementing practical solu-

tions that will signiﬁcantly improve the efﬁciency and

effectiveness of their roster management processes.

2 RELATED WORKS

Franks et al. (Franks et al., 2015) evaluated the defensive metrics that influence the outcome of an NBA

game. They used a matchup matrix and a spatial re-

gression model to create a new metric. Their analy-

sis involved closely evaluating the number of points a

defensive player prevents an opponent from scoring.

They then identiﬁed the speciﬁc location on the court

and developed a disruption score to determine where

a defender is most likely to stop a shot.

McIntyre et al. (McIntyre et al., 2016) conducted

a study on defensive strategies used in response to of-

fensive screens in basketball. They used data mainly

from SportVU and carefully analyzed how the defen-

sive team reacts when the offensive team sets a screen.

The researchers categorized the screens based on their

location on the court and developed four distinct clas-

siﬁcations for the defensive tactics used in response

to these screens. This detailed analysis better ex-

plains how offensive and defensive strategies interact

in NBA games.

Gonzalez et al. (Gonzalez et al., 2013) studied

how a player’s performance changes throughout a

season. They examined two main factors: the number

of minutes a player spent in a game and their Verti-

cal Jump Power (VJP). The researchers compared the

VJP of players in the starting lineup with those who

were nonstarters. Their analysis found that starters

who played an average of 27.8 ± 6.9 minutes per

game tended to increase their VJP compared to non-

starters, who played an average of 11.3 ± 7.0 minutes

per game. Specifically, starters’ VJP changed by 77.3 ± 78.1 W, while nonstarters’ VJP changed by −160.0 ± 151.0 W.

In their study, Drakos et al. (Drakos et al., 2010)

thoroughly analyzed NBA injuries over 17 years.

They examined the total number of injuries related to

the number of games played and calculated the in-

jury rate per thousand athletes. The researchers also

looked into the speciﬁc body areas affected by these

injuries. They attempted to identify potential correla-

tions between injuries and demographic factors such

as weight, height, player age, and NBA experience.

However, they did not ﬁnd any signiﬁcant correlations

between these variables.

Berri et al. (Berri et al., 2011) critically exam-

ined the reverse-order draft system for amateur play-

ers. This system is designed to give weaker teams an

advantage by allowing them to secure the ﬁrst draft

picks. The researchers evaluated various factors such

as the players’ college performance, draft age, years

of college basketball experience, player height, and

position played. The study focused on college bas-

ketball players’ performance metrics and their inﬂu-

ence on draft day. It found that the number of points

scored in college was a signiﬁcant factor in the draft,

but it had minimal correlation with a player’s scoring

potential in the NBA. This suggests that the current

draft system may overlook crucial performance met-

rics when selecting future star players.

Fearnhead and Taylor (Fearnhead and Taylor,

2011) critically examine the prevalent rating systems

used to evaluate an NBA player’s performance. They

start by looking at the conventional regression model

that correlates a player’s performance with the num-

ber of wins their team achieves. They argue that

this model, while helpful, falls short in capturing the

player’s complete individual performance as it tends

to diminish the player’s contribution to the team’s suc-

cess. Fearnhead and Taylor have developed a new

model that provides a more accurate assessment of a

player’s abilities by separating their performance into

offensive and defensive ratings. This approach allows

for a more comprehensive evaluation of a player’s

skill set, taking into account their contribution to the

icSPORTS 2024 - 12th International Conference on Sport Sciences Research and Technology Support

team beyond just the number of wins. The method

leverages data from multiple seasons to estimate a

player’s ability in a speciﬁc season and measures de-

fensive and offensive ratings separately, combining

them to give an overall rating.

Most of this literature relies on statistical models to obtain its results. In our work, we instead apply machine learning models, namely classification and clustering, to answer our research questions.

3 DATA

For our analysis, we utilized two main datasets gath-

ered from Kaggle (Kaggle, 2022) and the NBA open

data (ESPN, 2023). The ﬁrst dataset contains 26,652

rows and 21 columns, offering a comprehensive

overview of overall NBA game statistics. The second

dataset consists of 668,629 rows and 29 columns, pro-

viding detailed individual NBA player statistics per

game. These available datasets cover the period from

2003 to 2020 and were merged to create our primary

dataset. To enhance our analysis, we incorporated

win-share, offensive win-share, defensive win-share,

season team wins, and season team losses from Basketball Reference (Basketball Reference, 2023b), and NBA player

height data from ESPN (ESPN, 2023).

We analyzed individual player game perfor-

mances from 2003 to 2020, focusing on the more re-

cent style of play. The dataset contained over 600,000

rows. After data cleaning, around 550,000 rows were

left. We used feature selection and creation to retain

relevant columns such as season, plus-minus, height,

points, assists, rebounds, steals, turnovers, and more.

This helped prevent the curse of dimensionality. We

excluded irrelevant features like players’ nicknames

and team abbreviations from our analysis.

As per Basketball Reference, the win-share met-

ric is a player statistic designed to apportion credit

for team success among team members (NBA Stuffer,

2023a). Win-shares estimate the number of wins a

player contributes to their team through offensive and

defensive performances. Offensive win-shares cen-

ter on a player’s offensive contributions, such as scor-

ing points, creating team opportunities, and efﬁcient

shooting (Sporting Charts, 2023). Defensive win-

shares isolate a player’s defensive impact, including

blocking shots, stealing the ball, and overall defen-

sive prowess (Sports Lingo, 2023). These metrics as-

sess an individual’s collective offensive and defensive

performance in a season.

We obtained the team’s season wins and losses

data from Basketball Reference (Basketball Reference, 2023b),

which included the number of wins, losses, team

name, and season. Combining the wins and losses

provided the team’s total games for that season. We

then calculated the win percentage by dividing the

number of games won by the total games played that

season.

\[
\text{Win Percentage} = \frac{\text{Number of Games Won}}{\text{Total Games}}
\]
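
As a small worked illustration of the win percentage formula, the following sketch computes it from a win-loss record. The function name is hypothetical and is not taken from the original analysis.

```python
def win_percentage(wins: int, losses: int) -> float:
    """Return games won divided by total games (wins + losses)."""
    total_games = wins + losses
    if total_games == 0:
        raise ValueError("a team must have played at least one game")
    return wins / total_games


# Example: a 73-9 regular season record over 82 games.
print(round(win_percentage(73, 9), 3))  # prints 0.89
```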

We introduced game dates and minutes played as

key components to create multiple new dimensions.

The game date was crucial for identifying the date of

the previous game and calculating the gap between

each game. Additionally, we leveraged the game

date to categorize the season into early, mid, and late

stages, with each stage representing a three-month pe-

riod. By accessing the previous game date, we were

able to extract the minutes played in the preceding

game. These additional dimensions enable us to ex-

amine how the stage of the season, the duration be-

tween games, and the minutes played in the previous

game inﬂuence an individual’s current game perfor-

mance.
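
The derived dimensions described above can be sketched as follows. This is a minimal illustration, not the original pipeline: the record layout (`player`, `date`, `minutes`) and the month-to-stage mapping are assumptions, since the text only states that each stage covers three months.

```python
from datetime import date

# Assumed mapping; the paper does not state which months form each stage.
STAGE_BY_MONTH = {
    10: "early", 11: "early", 12: "early",
    1: "mid", 2: "mid", 3: "mid",
    4: "late", 5: "late", 6: "late",
}


def add_game_context(games):
    """Add gap_days, prev_minutes and season_stage to each game record.

    games: list of dicts with hypothetical keys 'player', 'date'
    (a datetime.date) and 'minutes'. Returns new dicts sorted by
    player and date; the inputs are not modified.
    """
    last_game = {}  # most recent game seen for each player
    enriched = []
    for game in sorted(games, key=lambda g: (g["player"], g["date"])):
        prev = last_game.get(game["player"])
        row = dict(game)
        row["gap_days"] = (game["date"] - prev["date"]).days if prev else None
        row["prev_minutes"] = prev["minutes"] if prev else None
        row["season_stage"] = STAGE_BY_MONTH.get(game["date"].month)
        enriched.append(row)
        last_game[game["player"]] = game
    return enriched
```

Note that this sketch does not reset at season boundaries, so a player's first game of a season would inherit a very long gap from the previous season; a real pipeline would likely group by season as well.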

In the world of the NBA, plus-minus (+/-) is a sta-

tistical tool used to gauge the point differential when

a player is on the court (NBA Stuffer, 2023b). It

provides valuable insights into a team’s performance

with a speciﬁc player on the ﬂoor. A positive plus-

minus value indicates that the player’s team outscored

the opponents while they were on the court. Con-

versely, a negative plus-minus value suggests that the

opposing team outscored the player’s team during

their time on the court. Whereas win-shares summarize a player’s performance over a season, plus-minus serves as a measure of a player’s performance in a single game. Additionally, points, field goals attempted,

free throws attempted, and turnovers are factored in

to create an offensive rating metric, offering a com-

prehensive analysis of a player’s offensive game per-

formance. The offensive rating is designed to quan-

tify a player’s offensive efﬁciency and contribution to

their team’s scoring, often expressed as the number of

points a player produces per 100 possessions (Fromal,

2023).

\[
\text{Player's Possessions} = \text{Field Goals Attempted} + 0.44 \times \text{Free Throws Attempted} + \text{Turnovers}
\]

\[
\text{Offensive Rating} = \frac{\text{Points}}{\text{Player's Possessions}} \times 100
\]
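
The two formulas can be turned into a short sketch. The function and parameter names are illustrative assumptions; only the arithmetic comes from the definitions above.

```python
def player_possessions(fga: float, fta: float, turnovers: float) -> float:
    """Possessions used: FGA + 0.44 * FTA + turnovers."""
    return fga + 0.44 * fta + turnovers


def offensive_rating(points: float, fga: float, fta: float, turnovers: float):
    """Points produced per 100 possessions used by the player.

    Returns None when the player used no possessions, because the
    ratio is undefined in that case (a choice made for this sketch).
    """
    possessions = player_possessions(fga, fta, turnovers)
    if possessions == 0:
        return None
    return points / possessions * 100
```

For example, 24 points on 15 field goal attempts, 5 free throw attempts, and 2 turnovers uses 19.2 possessions, which gives an offensive rating of about 125.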

In our data analysis, we observed a wide range

of values within each category. To address this, we

opted to use a straightforward discretization method

called equal frequency binning for the mentioned val-

ues. Equal frequency binning involves dividing a di-

mension into bins to ensure that each bin contains a

Figure 1: Optimal number of clusters.

Figure 2: K-means clustering depicting heights and win-shares.

similar frequency of values. In this case, we created

six bins for each dimension. This approach ensures

that each category or bin will have an equally dis-

tributed representation when we run algorithms on the

data, thereby enhancing the resilience and accuracy of

our analysis.
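
Equal-frequency binning can be sketched with a rank-based assignment, as below. This is an illustrative implementation under our own assumptions, not the code used in the study; in particular, tied values may fall into different bins here, which edge-based tools such as quantile cuts handle differently.

```python
def equal_frequency_bins(values, n_bins=6):
    """Label each value with a bin from 1 to n_bins of near-equal size.

    Values are ranked from smallest to largest, and consecutive ranks
    are split into n_bins groups. Bin 1 holds the smallest values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices by value
    labels = [0] * n
    for rank, index in enumerate(order):
        labels[index] = rank * n_bins // n + 1
    return labels
```

With 12 values and six bins, each bin receives exactly two values, with the two smallest in bin 1 and the two largest in bin 6.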

4 ANALYSIS AND METHODOLOGY

In this section, we describe the techniques used to address each of the four research questions we have identified. The results and the insights obtained from each analysis are then discussed in Section 5.

4.1 Question One: Optimal Strategy

To decide whether teams should focus on having a few standout players or distribute talent across various positions, we used the k-means clustering technique. This method groups data points that are more similar to each other than to points in other groups. It begins by randomly placing centroids, each of which represents the center of a cluster. The algorithm then calculates the distance between each data point and every centroid and assigns the point to the nearest centroid. New centroids are computed from the points belonging to each cluster, and this process is repeated until the centroids stop changing significantly.
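
The assignment and update loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions (initial centroids are drawn from the data points, squared Euclidean distance, a hypothetical tolerance `tol`); it is not the implementation used in the study, which likely relied on a library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, tol=1e-6, seed=0):
    """Return (centroids, labels) after iterating assignment and update."""
    rng = random.Random(seed)
    # Random initial centroids, chosen here from the data points themselves.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(dim) / len(members) for dim in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster
        shift = max(
            squared_distance(old, new)
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift < tol:  # centroids stopped changing significantly
            break
    return centroids, labels
```

Because the initial centroids are random, different seeds can lead to different local optima; library implementations usually rerun the algorithm several times and keep the best result.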

We also used the elbow method to determine that

three clusters, as in Figure 1, are optimal for analyz-

ing the distribution of player types from each team

and their respective win percentages. This strategic

insight allows us to understand the balance between

superstar players and talent spread across many po-

sitions. Before this analysis, all player features had

been categorized into bins, with bin 1 denoting the

lowest rating and bin 6 representing the highest. Sub-

sequently, we tabulated the number of players at each

level for every team and year, using these ﬁgures as

the basis for the clustering features.
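
The tabulation of players per skill bin for every team and year can be sketched as follows. The tuple layout `(team, season, skill_bin)` and the function name are assumptions made for illustration.

```python
from collections import Counter


def roster_bin_counts(players, n_bins=6):
    """Count players in each skill bin for every (team, season).

    players: iterable of hypothetical (team, season, skill_bin) tuples,
    with skill_bin running from 1 (lowest) to n_bins (highest).
    Returns {(team, season): [count in bin 1, ..., count in bin n_bins]}.
    """
    counts = Counter(players)
    rosters = {}
    for (team, season, skill_bin), n in counts.items():
        vector = rosters.setdefault((team, season), [0] * n_bins)
        vector[skill_bin - 1] = n
    return rosters
```

Each resulting vector has one entry per bin, so a roster with two bin-6 players and one bin-1 player becomes `[1, 0, 0, 0, 0, 2]`, which can then serve as a feature row for clustering.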

After running the k-means algorithm on the

dataset, we observed the formation of three well-

deﬁned clusters, as illustrated in Figure 2. These clus-

ters exhibit noticeable similarities in their features,

providing us with crucial insights into the character-

istics required for constructing a successful roster.

4.2 Question Two: Timing of the Year

Decision trees are used to evaluate how speciﬁc fac-

tors impact an NBA player’s performance. This in-

volves categorizing input data into different classes

using a classiﬁcation method. Classiﬁcation decision

trees make decisions based on the features of the data

and create rules to assign each instance to a speciﬁc

class (Raj, 2023). The key factors being examined in-

clude parts of the season, the number of days between

games, and the player’s previous and current game

minutes. Other features, such as the winning team,

were included to provide additional insight. The de-

cision trees use plus-minus and offensive ratings as

target variables to determine if the speciﬁed features

inﬂuence the player’s performance. Since this is a

classiﬁcation problem, each feature was divided into

six equal-frequency bins to ensure the readability of

the decision tree outcomes. Before running the algo-

rithm, the dataset is split into a training set for build-

ing the tree and a testing set for evaluating perfor-

mance, with a random seeding and a 0.7 ratio. The

classiﬁcation decision tree is applied to different as-

pects of the data. First, the data is examined as a

whole, and then each of the six bins in win-share,

offensive win-share, and defensive win-share are ana-

lyzed to compare differences between different player

levels. This results in 38 decision trees: 19 targeting

plus-minus and 19 targeting offensive ratings.
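
A classification tree of this kind could be fitted with scikit-learn roughly as sketched below. The data here is synthetic and made up purely for illustration, the feature order and the depth limit are our assumptions, and we assume that the 0.7 ratio refers to the training share.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_binned_tree(X, y, seed=42, max_depth=4):
    """Split 70/30, fit a classification tree, return (tree, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, random_state=seed
    )
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(X_train, y_train)
    return tree, tree.score(X_test, y_test)


# Synthetic example: binned features and a target bin driven only by the
# win flag, so the tree should recover the rule exactly.
rng = np.random.default_rng(0)
n_rows = 200
X = np.column_stack([
    rng.integers(1, 4, n_rows),  # season stage (1-3), hypothetical encoding
    rng.integers(1, 7, n_rows),  # gap-between-games bin (1-6)
    rng.integers(1, 7, n_rows),  # previous-game minutes bin (1-6)
    rng.integers(0, 2, n_rows),  # 1 if the player's team won
])
y = np.where(X[:, 3] == 1, 5, 2)  # made-up plus-minus bin
tree, accuracy = fit_binned_tree(X, y)
```

On real data the relationship is far noisier, so the test accuracy would be well below the perfect score this toy rule produces.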

The study aimed to analyze how the gap between games influences a

player’s performance across different points in a sea-

son and to determine if this inﬂuence varies based on

the player’s skill level. Players in consecutive win

share brackets were combined to form three distinct

categories to simplify the ﬁndings.

The analysis utilized the f-regression function (Pe-

dregosa et al., 2011), incorporating the target plus-

minus to reﬁne the assessment. This function eval-

uates the correlation between each regressor and the

target variable, converting these correlations into F-

scores. These scores measure the degree of linear de-

pendency between each regressor and the target, thus

aiding in identifying the most predictive features of

the outcome.

4.3 Question Three: Player’s Performance

We conducted an in-depth analysis of various perfor-

mance metrics to investigate the connection between

a player’s height and their performance on the basket-

ball court. One of the metrics we found to be particu-

larly useful was offensive win-shares, which take into

account a variety of offensive statistics. As we sought

to develop a comprehensive performance statistic, we

initially considered multiple factors but eventually

honed in on the relationship between player height

and win-shares as the key components.

Narrowing our focus to player heights and offensive win-shares, we applied the elbow method, which indicated that the optimal number of clusters was 3. We then ran a k-means clustering algorithm on the data with this number of clusters.

Upon applying the k-means clustering algorithm,

we uncovered three distinct clusters that encapsulated

player heights ranging from 165 cm to 231 cm. By dividing win-shares into six bins, we could visually depict player performance across the different height categories, as described later.

This led us to delve deeper into the relationship

between a player’s height and their ability to secure

rebounds. To analyze this, we sorted the players

into six performance tiers and then further catego-

rized them into three height groups. H1 represents the

tallest players, H2 consists of players of average NBA

height, and H3 encompasses the shortest players.

4.4 Question Four: Team’s Exceptional Performance

During our analysis, we delved into the factors con-

tributing to exceptional athletic performances, focus-

ing on the remarkable success of the 2015-2016 War-

riors team. Notably, the Warriors concluded the

regular season with a historic 73–9 record, surpass-

ing the previous record of 72–10 established by the

1995–1996 Chicago Bulls. This achievement solidi-

ﬁed their position with the best regular-season record

in the history of the NBA.

To conduct a comprehensive analysis of the

Golden State Warriors’ performance during the 2015-

2016 NBA season, we first filtered the dataset to focus exclusively on that season. We then gathered and organized statistical information, including points per game and three-point shooting percentages, for each individual player on the team. Because each entry in the dataset corresponds to a player’s statistical contribution to a specific game on a specific date, we aggregated the multiple entries for each player to obtain an accurate representation of their performance throughout the season.

We expanded our analysis

to include the Cleveland Cavaliers, also known as the

Cavs, a professional basketball team based in Cleve-

land. Between 2015 and 2018, the Cavaliers faced the

Golden State Warriors in four consecutive NBA Fi-

nals, igniting a ﬁerce rivalry and creating one of the

most memorable matchups in modern NBA history.

For the Cavaliers, we collected and examined their

corresponding performance metrics. By using a sim-

ilar methodology, we calculated the Cavaliers’ mean

points per game and three-point percentages, enabling

a comprehensive comparison between the two teams’

performance. This comprehensive approach provided

valuable insights into the factors contributing to each

team’s success.
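
The per-player aggregation of game-level entries can be sketched as follows. The field names are hypothetical, and computing the three-point percentage from season totals (made divided by attempted) is our assumption; the text does not state how the percentage was aggregated.

```python
from collections import defaultdict


def season_averages(game_rows):
    """Aggregate per-game rows into per-player season summaries.

    game_rows: iterable of dicts with hypothetical keys 'player',
    'points', 'fg3_made' and 'fg3_attempted' (one row per game).
    Returns {player: {'ppg': ..., 'fg3_pct': ...}}.
    """
    totals = defaultdict(lambda: {"games": 0, "points": 0, "made": 0, "att": 0})
    for row in game_rows:
        t = totals[row["player"]]
        t["games"] += 1
        t["points"] += row["points"]
        t["made"] += row["fg3_made"]
        t["att"] += row["fg3_attempted"]
    return {
        player: {
            "ppg": t["points"] / t["games"],
            # None when the player never attempted a three-pointer.
            "fg3_pct": t["made"] / t["att"] if t["att"] else None,
        }
        for player, t in totals.items()
    }
```

Team-level means, such as a team's mean points per game, could then be obtained by averaging these summaries over the roster.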

5 RESULTS AND DISCUSSION

In this section, we will thoroughly examine the results

for each research question.

5.1 Question 1: Optimal Strategy

In our upcoming discussion, we will thoroughly an-

alyze the three primary clusters identiﬁed during the

initial cluster analysis. Our focus will be primarily on

examining the mean values for each feature. We will

also provide some insights based on the analysis of

median values, albeit to a lesser extent.

In Cluster 1 in Figure 3, teams experienced the

least success, with an average win percentage of about 0.33, equivalent to roughly 27-55 in the regular season.

These teams had the lowest number of top-end play-

ers, averaging 1.02, and the highest number of low-

end players, averaging 5. Their struggle to win is

attributed to a lack of high-end talent and an

abundance of low-end players.

The most compelling analysis arises from the

comparison of clusters 2 and 3. Cluster 2 has a diverse

distribution of talent, with teams possessing between

2.06 and 3.50 players in bins 1-5 and 1.53 players in

bin 6. Their win percentage is approximately 0.379,

translating to a 31-51 record. This group displays

marginal improvement over Cluster 1. Conversely,

Cluster 3 showcases a top-heavy lineup, with teams

having 3.21 players in bin 6, 2.45 in bin 5, fewer than

two players in bins 2-4, and 2.71 in bin 1. Their reg-

ular season record is roughly 0.622, equivalent to 51-

31. Although teams in Cluster 3 have more players in

bins 5 and 6, they also have more in Bin 1 compared

to the Cluster 2 teams. An analysis of win percent-

age makes it clear that the more successful teams are

those in Cluster 3.

A particularly interesting statistic reveals that

when there are three players in bin 6 and 2 players

in bin 5, successful teams in cluster 3 can assemble a

lineup of 5 players who are well above average, ensur-

ing a cohesive team with no weak links on the ﬂoor.

On the other hand, teams in cluster 2 cannot achieve

this and are more likely to ﬁeld a lineup where at least

one of the ﬁve players on the ﬂoor is only slightly

above average or even below average. Additionally,

teams with lower win percentages have fewer superstar players to rely on during crucial game moments.

At times, all a team needs is a brief period where

their best players completely take over a game, and

the more superstar players a team has, the more likely

they are to accomplish this.

Based on our research, it is recommended that

teams prioritize acquiring top-tier talent rather than

focusing on the depth of their roster or evenly dis-

tributing talent. This is particularly relevant in the

NBA, which is characterized as a superstar-driven

league. Our ﬁndings indicate that teams with a few

dominant superstars tend to achieve greater success

and have a higher regular-season win percentage. In

contrast, those lacking a superstar player often struggle to keep up with the competition.

5.2 Question 2: Timing of the Year

In this study, we analyzed three statistics: Defensive

Rebounds (DREB), Rebounds (REB), and Offensive

Rebounds (OREB). We categorized their values into 6

bins, as shown in Figure 4. In our analysis, regardless

of the win-share category, each bin displayed a con-

sistent pattern in the decision trees we later discuss,

with offensive rating as the target classiﬁer.

We found that players who were on the court for

bin 1 or bin 2 minutes (equivalent to 20 minutes or

less, considered a low amount of time) tended to have

Figure 3: Number of players of each type and win percentage for teams in each cluster.

Figure 4: Defensive Rebounds, Rebounds, and Offensive rebounds used with Win share bins.

a bin 1 or bin 2 offensive rating, ranging from 0 to

70. This outcome was expected, as offensive rating

is heavily inﬂuenced by the points scored by an NBA

player. Limited time on the court leads to fewer op-

portunities to contribute offensively.

When we change the target classiﬁer to plus-

minus, we observe different outcomes. The results

are affected by whether the team wins or loses. Gen-

erally, when a player’s team loses, their plus-minus

is between bins 1 to 3, regardless of their win-share

variation. On the other hand, if the team wins, the

plus-minus is either bin 5 or bin 6, indicating a high

overall performance if the player played 20 minutes

or more. These patterns align with our expectations,

suggesting that a player’s overall performance is posi-

tively inﬂuenced by the team’s victory and signiﬁcant

playing time, showcasing their consistent contribution

to the team’s success.

These trends are generally expected. When we

look at plus-minus, we focus on players who fall

into the fourth category for each win-share category.

These players are average or above average and have

significantly contributed to their team’s success during the season. They can be described as “role players”, or good players, but are not considered “all-star” caliber players. These players either support high-level talent in a roster by assisting them when they

are on the court together or by holding down the team

when their all-star players are not on the court.

When analyzing their decision trees, the number

of minutes played in the previous game becomes an

important feature in the player’s plus-minus. If the

team lost and the player’s previous game minutes

were over 25, the player’s current game performance

had a plus-minus in the ﬁrst category, which is ex-

tremely low. However, if their minutes were less than

25, their current game performance had a plus-minus

in the second category, which is still low but not as

bad. The difference in plus-minus is small; however,

we can see that the excessive use of role players in

the previous game not only negatively affects their

overall performance in the current game but may also

cause their team’s loss. This suggests that role players

should be used cautiously if their previous game was

strenuous.

In the early part of the season, players in groups

1 and 2 are most affected by the gap between games,

while the impact on the rest of the players is mini-

mal. However, as we move into the later part of the

season, players in groups 1 and 2 still experience an

impact on their +/-. This impact is reduced compared

to the early part of the season. Once again, players

in the middle of the pack do not experience a major

impact, while top players see a signiﬁcant impact on

their performance due to the gaps between games.

The results are shown in Figure 5. It is worth noting that the accuracy of these decision trees ranges

from 0.3 to 0.5, representing extremely low accuracy.

This could be attributed to the extensive use of equal-

frequency binning. Increasing the number of bins

from 6 might lead to more accurate results.

As the season begins, players in the lower skill

level groups, speciﬁcally bins 1 and 2, are most af-

fected by the extended breaks between games. This

can have a signiﬁcant impact on their performance

and readiness. However, as the season advances, the

inﬂuence on players in these skill-level groups grad-

ually decreases. Meanwhile, players in the interme-

diate skill level range continue to encounter mini-

mal impact from the gaps between games, while the

top-tier players are notably affected by the extended

breaks, potentially impacting their momentum and

form.

Upon analyzing the impact of the timing of the

season and the duration between games on player performance, we summarize the findings in Figure 6. In the

early stages of the season, underperforming players

are notably more affected by longer breaks between

games. As the season progresses, top-performing

players are increasingly impacted by the duration of

breaks between games, while mid-range players tend

to maintain consistent performance regardless of the

gap. From a strategic perspective, top players should

prioritize rest at the season’s commencement and re-

duce rest as the season advances. Moreover, due to

their versatility, role players can be utilized more fre-

quently. Lastly, players ranked at the bottom in terms

of performance would beneﬁt from consistent game

time, particularly at the season’s onset.
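The two inputs discussed in this subsection, the gap between games and the part of the season, can be derived from a team's schedule. The sketch below is one possible way to do this, not the authors' pipeline: the function `rest_features`, the early/late split at the midpoint of the season, and the six-game schedule are all assumptions made for illustration.

```python
from datetime import date


def rest_features(game_dates, season_games=82):
    """Return (game date, days since previous game, season part) per game.

    The first game is skipped because it has no previous game.
    The season part is 'early' up to the midpoint and 'late' afterwards
    (an assumed definition).
    """
    ordered = sorted(game_dates)
    features = []
    for i in range(1, len(ordered)):
        gap_days = (ordered[i] - ordered[i - 1]).days
        game_number = i + 1  # 1-based position in the season
        part = "early" if game_number <= season_games // 2 else "late"
        features.append((ordered[i], gap_days, part))
    return features


# Hypothetical six-game schedule, used only to exercise the function.
schedule = [
    date(2015, 10, 27),
    date(2015, 10, 28),
    date(2015, 10, 31),
    date(2015, 11, 2),
    date(2015, 11, 4),
    date(2015, 11, 8),
]

features = rest_features(schedule, season_games=6)
for game_date, gap_days, part in features:
    print(game_date.isoformat(), gap_days, part)
```

A gap of one day corresponds to a back-to-back game; larger values indicate longer breaks.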

5.3 Question 3: Player’s Performance

After analyzing the k-means clustering plot, we dis-

covered some fascinating results. Previous studies,

such as the one by Berri et al. (Berri et al., 2011),

have indicated that height plays a signiﬁcant role in

the selection of amateur players in drafts. However,

our plot unveiled an intriguing pattern. The players

were effectively categorized into three clusters based

on height, with distinct groups for shorter, medium-

height, and taller players.
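As an illustration of how a three-way height grouping can arise, the following sketch runs k-means (Lloyd's algorithm) on one-dimensional data using only the standard library. It is a simplified stand-in, not the authors' implementation: the heights are hypothetical values in centimetres, the quantile-based starting centroids are a choice made here for determinism, and the mapping of clusters to the labels H1 to H3 is an assumption.

```python
from collections import Counter
from statistics import fmean


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm in one dimension; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: centroids placed at evenly spaced quantiles.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def assign(centers):
        # Each value joins the cluster of its nearest centroid.
        return [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(fmean(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return centroids, assign(centroids)


# Hypothetical player heights in centimetres.
heights_cm = [183, 185, 188, 190, 191, 193, 196, 198, 198,
              201, 203, 203, 206, 206, 208, 211, 213, 216]

centroids, labels = kmeans_1d(heights_cm, k=3)
sizes = Counter(labels)

for j, center in enumerate(centroids):
    # Cluster 0 is the shortest group; labelling it H1 is an assumption.
    print(f"H{j + 1}: centre {center:.1f} cm, {sizes[j]} players")
```

In practice a library implementation such as scikit-learn's `KMeans` would normally be used; the hand-written version is shown only to make the assignment and update steps visible.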

The analysis of the players’ heights in relation to

their performance yielded intriguing ﬁndings. Upon

closer examination, it was noted that the shortest play-

ers tended to exhibit lower offensive win-shares, sug-

gesting a diminished level of performance. How-

ever, this trend was not conﬁned to shorter players,

as taller players also displayed a similar pattern. Inter-

estingly, a performance peak was identiﬁed within the

medium-height range, followed by a decline among

the taller players. These observations point to the pos-

sibility that optimal performance may not necessarily

correlate with extreme heights, but rather lie some-

where within the middle range of heights.

Upon analyzing the height-rebounds graph, a clear pattern emerges indicating that players with higher overall performance also excel in rebounds, in line with expectations. Notably, players in the average height category (H2) exhibit the most impressive rebound performance within each group. Consistently, a trend is evident wherein shorter players tend to underperform compared to their average or tall counterparts across offensive and defensive rebounds within the same or lower performance groups. In summary, it can be inferred that an average-height player is well-suited for positions such as Center or Power Forward, although locating such players may not always be feasible. In such instances, sacrificing some performance in favour of added height can lead to favourable outcomes. However, it is imperative to stress that the primary focus should not solely revolve around acquiring the tallest players; prioritizing the best performers remains crucial. In scenarios where a choice must be made between a shorter top performer and a slightly less skilled player of average or above-average height, the latter should take precedence.

icSPORTS 2024 - 12th International Conference on Sport Sciences Research and Technology Support

Figure 5: Decision tree with plus-minus as target classifiers in bin 4 of the win-share categories.

5.4 Question 4: Team’s Exceptional Performance

In the 2015-2016 season, the Golden State Warriors finished first in the Western Conference with an unprecedented 73-9 record, surpassing the previous record set by the 72-10 Chicago Bulls led by Michael Jordan. The Cleveland Cavaliers finished first in the Eastern Conference with a 57-25 record. In our analysis of the factors most correlated with exceptional achievements, we found that two statistics played significant roles: the ability to score many points per game and the ability to score many 3-pointers. These two aspects emerged as strong indicators of exceptional performance and were closely linked to achieving outstanding results.
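One simple way to quantify how strongly a statistic is linked to success is the Pearson correlation coefficient between that statistic and win percentage. The sketch below is an illustration under assumptions, not the analysis performed in this study: the function `pearson_r` is written here for the example, and the six team rows are made-up numbers rather than real NBA data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up team-level values for six hypothetical teams.
points_per_game = [115, 104, 103, 102, 98, 97]
threes_per_game = [13, 10, 7, 9, 6, 7]
win_pct = [0.89, 0.70, 0.80, 0.60, 0.40, 0.25]

r_points = pearson_r(points_per_game, win_pct)
r_threes = pearson_r(threes_per_game, win_pct)

print(f"points per game vs win pct: r = {r_points:.2f}")
print(f"3-pointers per game vs win pct: r = {r_threes:.2f}")
```

Values close to 1 indicate that teams with more of the statistic tend to win more often; a correlation of this kind does not by itself show that the statistic causes the wins.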

When examining the statistical data of the Cleve-

land Cavaliers and the Golden State Warriors, it is ev-

ident from Figure 7 that the points per game graph

illustrates the signiﬁcant scoring advantage of the

Golden State stars, Stephen Curry and Klay Thomp-

son, over the Cavaliers’ stars, LeBron James and

Kyrie Irving. While the remaining players on both

teams make valuable contributions to their respective

performances, it is noteworthy that Stephen Curry’s

exceptional average of 30.1 points per game stands

out as a rare achievement in the NBA, playing a piv-

otal role in the Warriors’ historic season. Our analy-

sis indicates a strong correlation between the presence

of high-scoring players and the attainment of this re-

markable achievement.

The second highly influential factor contributing to exceptional achievements is the 3-point shooting percentage. In Figure 7B, a significant disparity between the two teams is evident, emphasizing one of the primary reasons for Golden State’s formidable performance. Stephen Curry and Klay Thompson’s shooting percentages far surpass those of the Cavaliers’ starting lineup (excluding centers due to limited data near the basket). Furthermore, Harrison Barnes and Draymond Green demonstrate superior 3-point shooting percentages compared to LeBron James, Kyrie Irving, and Kevin Love. It’s noteworthy that although the Cavaliers’ 3-point percentages were considered good, the Warriors’ dominance completely overshadowed them.

6 LIMITATIONS AND FUTURE WORK

As we move forward, we will focus on a comprehensive analysis of NBA postseason games. It’s important to note that our dataset is primarily focused on the 82-game regular season, which means our insights into team performance during the playoffs will be somewhat limited. However, the best-of-seven playoff series format provides a unique opportunity for in-depth analysis, offering a more streamlined approach compared to the extensive data entries from the regular season. Moreover, we can delve into a thorough examination of individual player performances, aiming to gain insights into their influence on their respective teams, especially those who made significant contributions to their teams’ advancement in the playoffs or those who were eliminated early.

Figure 6: F-regression function visualization with specific bins correlated together.

Figure 7: A. Points per game and B. three-point percentage for the Cleveland Cavaliers (brown) and the Golden State Warriors (blue).

7 CONCLUSION

Our analysis of the factors influencing NBA regular season performance shows that a team’s roster composition significantly affects its success. We’ve identified three clusters based on win percentage and the number of players at different skill levels. The most successful teams tend to have a higher number of top-end players and a significant number of players in the lowest skill level. On the other hand, teams that evenly distribute their talent across the roster tend to be less successful. We’ve also discovered that players of average to above-average skill levels are most affected by excessive playing time in the previous game. If these players have logged significant minutes in the previous game, they are more likely to show a decline in performance and potentially lead the team to a loss in the next game. Taking into account the time of year and the gap between games, we recommend giving priority to the rest and recovery of the top players, particularly in the latter half of the season. It is crucial to ensure that below-average players maintain consistent performance, especially during the first half of the season.

When we consider the impact of height on player

performance, we ﬁnd that the shortest and tallest play-

ers tend to underperform compared to those closer to

average height. The majority of top performers in the NBA are close to average height. However, for players who operate near the basket and encounter many rebound opportunities, an average-height or taller player is preferable to a below-average-height player for slightly better overall performance.

Coaches and management could use this information

to construct and deploy teams more effectively, lead-

ing to an increased win percentage in regular sea-

son games. Additionally, coaches could analyze suc-

cessful seasons, such as the Golden State Warriors

in 2015-2016, to identify important factors leading to

these achievements, such as having players who can

effectively score three-pointers.

REFERENCES

Basketball Reference (2023a). Glossary. https://www.basketball-reference.com/about/glossary.html. Accessed: 10-15-2023.

Basketball Reference (2023b). Nba player stats: Advanced. https://www.basketball-reference.com/leagues/NBA_2022_advanced.html. Accessed: 12-10-2023.

Berri, D. J., Brook, S. L., and Fenn, A. J. (2011). From college to the pros: Predicting the nba amateur player draft. Journal of Productivity Analysis, 35:25–35.

Bishop, E. (2023). 5 ways nba teams use analytics to gain a competitive edge. https://www.sportskeeda.com/basketball/5-ways-nba-teams-use-analytics-gain-competitive-edge. Accessed: 12-10-2023.

Burns, M. (2023). Nba team market size rankings. https://hoop-social.com/nba-team-market-size-rankings/. Accessed: 12-10-2023.

Drakos, M. C., Domb, B., Starkey, C., Callahan, L., and Allen, A. A. (2010). Injury in the national basketball association: a 17-year overview. Sports health, 2(4):284–290.

ESPN (2023). Nba player stats. https://www.espn.com/nba/stats/player/_/season/2020/seasontype/2. Accessed: 12-10-2023.

Fearnhead, P. and Taylor, B. M. (2011). On estimating the ability of nba players. Journal of Quantitative analysis in sports, 7(3).

Franks, A., Miller, A., Bornn, L., and Goldsberry, K. (2015). Counterpoints: Advanced defensive metrics for nba basketball. In MIT Sloan Sports Analytics Conference, Boston, MA. Presented at the MIT Sloan Sports Analytics Conference.

Fromal, A. (2023). Understanding the nba: Explaining advanced offensive stats and metrics. https://bleacherreport.com/articles/1039116-understanding-the-nba-. Accessed: 12-14-2023.

Gonzalez, A. M., Hoffman, J. R., Rogowski, J. P., Burgos, W., Manalo, E., Weise, K., Fragala, M. S., and Stout, J. R. (2013). Performance changes in nba basketball players vary in starters vs. nonstarters over a competitive season. The Journal of Strength & Conditioning Research, 27(3):611–615.

Kaggle (2022). Nba games data. https://www.kaggle.com/datasets/nathanlauga/nba-games?select=games_details.csv. Accessed: 12-10-2023.

McIntyre, A., Brooks, J., Guttag, J., and Wiens, J. (2016). Recognizing and analyzing ball screen defense in the nba. In Proceedings of the MIT sloan sports analytics conference, Boston, MA, USA, pages 11–12.

NBA (2023). About the nba. https://www.nba.com/news/about. Accessed: 12-10-2023.

NBA Stuffer (2023a). Analytics movement in the nba. https://www.nbastuffer.com/analytics101/nba-analytics-movement/. Accessed: 12-10-2023.

NBA Stuffer (2023b). Nba plus-minus and impact metrics in basketball explained. https://www.nbastuffer.com/analytics101/plus-minus/. Accessed: 12-14-2023.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in python.

Raj, A. (2023). An exhaustive guide to decision tree classification in python 3.x. https://towardsdatascience.com/an-exhaustive-guide-to-classification-using-decision-trees-8d472e77223f. Accessed: 12-14-2023.

Sporting Charts (2023). What is offensive win shares. https://www.sportingcharts.com/dictionary/nba/offensive-win-shares-ows.aspx. Accessed: 12-14-2023.

Sports Lingo (2023). Defensive win shares (dws). https://www.sportslingo.com/sports-glossary/d/defensive-win-shares-dws/. Accessed: 12-14-2023.
