A Discretized Enriched Technique to Enhance Machine Learning Performance in Credit Scoring

Roberto Saia, Salvatore Carta, Diego Recupero, Gianni Fenu, Marco Saia


The automated credit scoring tools play a crucial role in many financial environments, since they are able to perform a real-time evaluation of a user (e.g., a loan applicant) on the basis of several solvency criteria, without the aid of human operators. Such an automation allows who work and offer services in the financial area to take quick decisions with regard to different services, first and foremost those concerning the consumer credit, whose requests have exponentially increased over the last years. In order to face some well-known problems related to the state-of-the-art credit scoring approaches, this paper formalizes a novel data model that we called Discretized Enriched Data (DED), which operates by transforming the original feature space in order to improve the performance of the credit scoring machine learning algorithms. The idea behind the proposed DED model revolves around two processes, the first one aimed to reduce the number of feature patterns through a data discretization process, and the second one aimed to enrich the discretized data by adding several meta-features. The data discretization faces the problem of heterogeneity, which characterizes such a domain, whereas the data enrichment works on the related loss of information by adding meta-features that improve the data characterization. Our model has been evaluated in the context of real-world datasets with different sizes and levels of data unbalance, which are considered a benchmark in credit scoring literature. The obtained results indicate that it is able to improve the performance of one of the most performing machine learning algorithm largely used in this field, opening up new perspectives for the definition of more effective credit scoring solutions.


Paper Citation