Authors:
Dante Costa
1
;
Joseana Fechine
1
;
José Brito
2
;
João Ferro
2
;
Evandro Costa
2
and
Roberta Lopes
2
Affiliations:
1
Department of Systems and Computing, Federal University of Campina Grande, Campina Grande, Brazil
;
2
Computing Institute, Federal University of Alagoas, Maceio, Brazil
Keyword(s):
Supervised Machine Learning, Feature Selection, Genetic Algorithms, Knowledge Discovery in Databases, White-Box Prediction, Data Mining.
Abstract:
Predictive models in machine learning and knowledge discovery in databases have been used in various application domains, including sports and basketball, in the context of the National Basketball Association (NBA), where one can find relevant predictive issues. In this paper, we apply supervised machine learning to examine historical and statistical data and features from players in the NCAA basketball league, addressing the prediction problem of automatically identifying NCAA basketball players with an excellent chance of reaching the NBA and becoming successful. This problem is not easy to resolve; among other difficulties, many factors and high uncertainty can influence basketball players’ success in the mentioned context. One of our main motivations for addressing this predicting problem is to provide decision-makers with relevant information, helping them to improve their hiring judgment. To this end, we aim to have the advantage of producing an interpretable prediction model r
epresentation and satisfactory accuracy levels, therefore, considering a trade-off between Interpretability and Predictive Accuracy, we have invested in white-box classification methods, such as induction of decision trees, as well as logistic regression. However, as a baseline, we have considered a relevant method as a reference for the black-box model. Furthermore, in our approach, we explored these methods combined with genetic algorithms to improve their predictive accuracy and promote feature reduction. The results have been thoroughly compared, and models exhibiting superior performance have been emphasized, revealing predictive accuracy differences between the best white box and black box models were very small. The pairing of the genetic algorithm and logistic regression was particularly noteworthy, outperforming other models’ predictive accuracy and significant feature reduction, assisting the interpretability of the results. Furthermore, the analysis also highlighted which features were most important in the model.
(More)