subsequently fan attendance dwindle. Therefore from
a revenue point of view, it is in the interest of the
owner and/or manager/coach of a team to improve its
winning record in order to improve fan loyalty or at
least maintain a high level of loyalty.
The objective of this paper is to test if the fan
loyalty of a NFL team is largely determined by its
success in the field. Fan loyalty is being defined here
as the percentage of stadium capacity filled in the
home attendance for each team from 2005-2013. This
definition is most effective for the purpose of this
paper as the attendance data is the most
comprehensive and the easiest to interpret.
Percentage of stadium capacity is being used because
different teams have different sized markets and in
turn different sized stadiums, so it is not in the interest
of this experiment to give a significant advantage to
teams with larger stadiums by simply using the raw
attendance numbers.
The computer science technique of data mining
and predictive modeling and is being used for the
analysis. The data mining process involves
discovering interesting and useful patterns and
relationships in large volumes of data (Marchi, 2010).
By applying predictive modeling techniques to
sports, this research contributes to a better
understanding of underlying factors that govern
human behavior associated with sports followings.
The programming language JAVA is used to create
and run the regression models for the data.
This genre of scientific research falls under the
relatively new area of “Sports Data Mining”. This
area has experienced rapid growth in recent years
(Baker and McHale, 2013; Hamadani, 2006; Stekler,
2007). Sports organizations are keen to find more
practical methods to extract valuable knowledge
using data mining techniques (Lewis, 2003; Silver,
2012). By finding the right ways to make sense of
data and turning it into actionable knowledge, sports
organizations have the potential to secure a
competitive advantage over their peers. Professional
sports organizations are multi-million dollar
enterprises with millions of dollars spent on a single
decision. With this amount of capital at stake, just one
bad or misguided decision has the potential of setting
an organization back by several years. With such a
huge risk at stake and a critical need to make good
decisions, the sports industry is an attractive
environment for applications of data mining (Boulier
and Stekler, 2003; Sinha and others, 2013;
Schumaker and others, 2010).
2 HYPOTHESIS
IF a team has a greater winning record, THEN the
team will have stronger or greater fan loyalty (in this
case, percentage of stadium capacity filled during
home games) BECAUSE the team will be more
enjoyable to watch and will attract greater attention in
its local community, leading to new fans joining the
fan-base, and former fans coming back as well. More
people will tune in to the team on television, more
merchandise will be sold, and teams will have an
increased following on social media, in addition to
more people coming to the games.
3 DATA AND METHODOLOGY
Data for this research comes mainly from websites
such as NFL.com and ESPN.com. JAVA
programming is used for analysis. Eclipse (an
integrated development environment for
programming Java) is used to code the program used
for outputting the results of the experiment. In
correspondence with Eclipse, certain Java libraries
found online were used to help with the coding. These
included Apache POI (used to read data from the
excel file), Apache Commons Mathematics (used to
help create the linear/simple regression), and
Princeton’s Algorithms and Clients (used to help
create the quadratic regression).
Here are the specific steps:
1. The JAVA development kit was downloaded
from Oracle’s website (http://www.oracle.com/) and
then installed
2. The IDE (integrated development
environment) Eclipse was downloaded and installed,
which was used for the actual coding of the regression
models
3. The data was read from the Excel file using the
Apache POI library and was subsequently stored in a
two-dimensional matrix
4. The 2-dimensional matrix was then inputted
into a Simple Regression call, using the Simple
Regression class in Apache Commons Mathematics
library (commons.apache.org/math/)
5. The results of the linear (or simple) regression
model were then outputted.
6. The matrix was split in to 2 separate arrays of
data, one for the x-values (Wins in a season), and one
for the y-values (average percentage of home
attendance in terms of stadium capacity)
7. This was inputted into a Polynomial
Regression call, using the Polynomial Regression