Sports Analytics: Maximizing Precision in Predicting MLB Base Hits

Pedro Alceo, Roberto Henriques


As the world of sports expands to unprecedented levels, so does the need for tools that provide material advantages to organizations and other stakeholders. The main objective of this paper is to build a model that predicts the odds of a baseball player getting a base hit on a given day, with the aim of both winning the game Beat the Streak and providing valuable information to coaching staff. Using baseball statistics, weather forecasts and ballpark characteristics, several models were built following the CRISP-DM methodology. The main factors considered when building the models were balancing, outliers, dimensionality reduction, variable selection and the type of algorithm: Logistic Regression, Multi-layer Perceptron, Random Forest and Stochastic Gradient Descent. The results were positive: the best model was a Multi-layer Perceptron with an 85% correct pick ratio.

