Stacking Ensemble Learning Approach for Credit Rating of Bank
Customers
Qinyu Guo
College of Computer and Information Science & College of Software, Southwest University, Chongqing, China
Keywords: Bank Credit Scores, Machine Learning, Stacking Model.
Abstract: The banking industry has experienced tremendous growth and change in recent years, creating new challenges
and opportunities for credit assessment and management. In this context, accurately and efficiently assessing
customer credit risks has become the key to the success of the banking business. A financial risk approval
model based on stacking technology is proposed in response to this demand. The model starts by selecting a
data set containing multiple bank user features. After a series of steps, such as data preprocessing, feature
selection, preliminary model training, and model optimization, it finally forms a credit assessment model with
high prediction accuracy. During the model training process, various machine learning algorithms were used
for comparison, including neural networks, random forests, decision trees, and naive Bayes, and the algorithms were further improved through stacking technology to achieve higher accuracy and a higher Area Under the Curve (AUC). In addition, based on the stacked model's prediction results, each customer's credit score is also
calculated, and the distribution of customers with different credit score segments is displayed through
visualization technology. This provides financial institutions with detailed information about their customers'
credit risks, helping them formulate more reasonable lending policies and interest rates. Experimental results show that, compared with other models, the proposed stacking-based risk approval model improves the loan approval rate by about 6% on the actual data set, demonstrating its effectiveness and feasibility in financial risk assessment.
1 INTRODUCTION
The credit rating process assesses the risks of lending money to a person or a business. One of the main
methods financial institutions use to evaluate their
operational risks is credit rating, which tries to detect
applicants with poor credit who may have a high
likelihood of defaulting (Jiang and Packer 2019).
With increasing market uncertainty, investors face greater risks when investing large amounts. It has become an urgent issue to accurately
assess and predict the risks and returns of credit
products. Financial institutions have vast data about
borrowers, including their historical borrowing and
repayment records, economic status, social media
activity, and consumption behavior. However, with
the rapid growth of credit-related financial product
markets, the challenge is accurately extracting
valuable information from this complex and diverse
data (Musdholifah et al 2020). Moreover, the correlation between certain data features and credit outcomes may not be stable and can even change over time.
Against this background, banks play an essential role in assessing credit risk. Before approving a loan, evaluating a borrower's credit history is necessary to identify potential high and low risks
(Kadam et al 2021). Machine learning (ML) algorithms, which enable systems to decipher patterns and make data-driven predictions autonomously, have emerged as a promising solution for assessing the likelihood of loan defaults (Musdholifah et al 2020). However, to predict bank customers' loan risks more accurately, such models need to be further enhanced (Beutel et al 2019).
Therefore, an ensemble learning approach is adopted. Bagging (Bootstrap Aggregating) creates multiple sub-datasets through random sampling and trains a learner independently on each subset (Uddin et al); the final result is based on the learners' average or majority vote (Erdal and Karahanoğlu 2016). Boosting introduces learners step by step and adjusts each one based on the errors of the previous step, with the goal of reducing the overall error and enhancing the model's performance (Carmona et al 2019).
Ensemble learning methods offer a superior
alternative to traditional machine learning techniques.
They frequently achieve improved accuracy and lower
the danger of overfitting by integrating the predictions
of various models. Their intrinsic versatility enables
the incorporation of multiple methods, improving
generalization to unseen data and performance in
high-dimensional settings (Uddin et al). Subsequently,
the efficacy of these models is rigorously assessed
using metrics such as accuracy and AUC, ensuring a
comprehensive evaluation of their predictive
capabilities (Wang et al 2023).
As a result, this research aims to combine the
ensemble learning method and big data analysis to
develop a new, accurate, and reliable credit
assessment model. This model will help financial
institutions and investors better understand
investment opportunities and achieve a balance
between risk and returns in a complex market
environment.
2 RELATED WORK
By combining the predictions of multiple models, ensemble learning offers a distinct advantage over conventional single-model techniques, enhancing accuracy and reducing overfitting. Methods like Voting, Bagging, and Boosting are pivotal in this domain.
Erdal and Karahanoğlu explored the determinants
of profits for Development and Investment Banks in
Turkey using bagging ensemble models (Erdal and
Karahanoğlu 2016). Leveraging three tree-based
machine learning models as base learners, their study
revealed that ensemble models, specifically Bag-
DStump, Bag-RTree, and Bag-REPTree,
outperformed individual models in predicting the
profitability determinants of Turkish banks.
Uddin et al. introduce a machine-learning-based loan prediction system to better identify qualified bank loan applicants (Uddin et al). Nine machine learning algorithms and three deep learning models were used, and ensemble voting techniques achieved enhanced performance.
Carmona, Climent, and Momparler employ extreme gradient boosting to systematically predict bank failures in the U.S. banking industry. Their approach effectively identifies critical indicators related to bank defaults, which is crucial in determining a bank's vulnerability (Carmona et al 2019).
This study aims to exploit the hybrid capabilities of Stacking. Unlike the methods above, Stacking utilizes multiple base models to generate meta-features that are fed into another model for the final prediction.
This approach not only consolidates the strengths of
each model but also compensates for their respective
weaknesses, paving the way for more accurate and
detailed credit assessment models.
The primary dataset was obtained from the UCI Machine Learning Repository, which documents the dataset's attributes (South German Credit 2020). This work highlights the effectiveness of ensemble learning, focusing on Stacking methods. A robust and
comprehensive predictive model has been
implemented using Random Forest and Gradient
Boosting as the base models and Logistic Regression
as the meta-model. A distinctive feature of the
approach is to convert model-predicted probabilities
into standardized credit scores, providing institutions
with intuitive indicators. By combining meticulous
data preprocessing, feature selection, and innovative
applications of stacking and mapping techniques, this
work offers a unique solution to the credit assessment
field.
3 METHODOLOGY
This section describes the construction and evaluation of the bank credit prediction models, explains how the predictive models are optimized and improved through ensemble learning, and provides a breakdown of the evaluation indicators, the comparative analysis, and the final credit scores.
3.1 Data Preprocessing
The data was first inspected to identify potential missing values in the dataset's variables. The dataset turned out to be complete, with no missing values, so no additional imputation step was required.
To keep the model unbiased, the Z-score method is used to detect and address outliers in the data. This method expresses the difference between an observation and the mean in units of standard deviations. With a threshold of 3 standard deviations, any data point whose absolute Z-score exceeds 3 is flagged as an outlier and removed.
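As a minimal sketch of this outlier-handling step (the file name and the purely numeric treatment are illustrative assumptions, not details taken from the paper), the Z-score filter could be written as follows:

```python
import pandas as pd

# Load the dataset (file name is illustrative).
df = pd.read_csv("south_german_credit.csv")

# Express each numeric observation as its distance from the column mean
# in units of standard deviations, then take the absolute value.
numeric_cols = df.select_dtypes(include="number").columns
z_scores = ((df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()).abs()

# Keep only rows whose numeric features all lie within 3 standard deviations.
df_clean = df[(z_scores < 3).all(axis=1)]
```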
3.2 Exploratory Data Analysis
Correlation heatmaps can be used to identify relationships between features in the data. Analyzing these plots makes it possible to discover correlations between different variables, making them a valuable tool for exploratory analysis. Fig. 1 confirms that the relationships between these features are weak, so
multicollinearity problems will not occur in
subsequent model training.
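A brief sketch of how such a heatmap could be produced (continuing from the df_clean frame assumed in the preprocessing sketch; the plotting choices are illustrative, not the paper's actual figure code):

```python
import matplotlib.pyplot as plt

# Pearson correlation matrix of the numeric features.
corr = df_clean.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)  # dark red/blue would flag strong correlations
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```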
3.3 Feature Selection
To ensure that all features are on the same scale, the
data was standardized using StandardScaler. This
ensures that the model weights all features evenly and helps speed up the training process.
Feature selection was then performed, and the
importance of all features was evaluated using a
random forest classifier. Based on the feature
importance results, only the top 10 most important
components are retained to reduce the complexity of
the model and avoid overfitting. These ten features
(Fig. 2) are believed to predict a customer's credit risk
better. The filtered critical feature data is saved to a
new file for subsequent use.
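A possible implementation of this step, continuing from the preprocessing sketch above, is given below; the target column name credit_risk, its 0 = good / 1 = bad encoding, the output file name, and the forest size are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Split features and target (column name and 0 = good / 1 = bad encoding are assumptions).
X = df_clean.drop(columns=["credit_risk"])
y = df_clean["credit_risk"]

# Put all features on the same scale.
scaler = StandardScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Rank features with a random forest and keep the ten most important ones.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_scaled, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
top10 = importances.nlargest(10).index
X_selected = X_scaled[top10]

# Persist the filtered features for subsequent use.
X_selected.assign(credit_risk=y.values).to_csv("selected_features.csv", index=False)
```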
3.4 Preliminary Model Training
The research embraced four distinct supervised learning algorithms to devise an accurate credit scoring model
(Table 1). These encompassed Neural Networks,
known for their prowess in recognizing intricate
patterns; Random Forests, revered for their ensemble-
based approach; Decision Trees, celebrated for their
transparent decision-making structure; and Naive
Bayes, distinguished for its probabilistic foundations.
Each of these algorithms underwent meticulous
parameter tuning to hone their performance. Their
effectiveness was evaluated using two pivotal metrics:
the Area Under the Curve (AUC) and accuracy. While
the AUC offered insights into the model's overall
performance across varied classification thresholds,
accuracy provided a snapshot of the model's success
rate in making correct predictions.
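The sketch below shows how these four models might be trained and scored with accuracy and AUC in scikit-learn; the train/test split, the 0 = good / 1 = bad target encoding, and the hyperparameters are placeholders rather than the tuned values used in the study:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# X_selected and y come from the feature-selection sketch above.
X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Neural Network": MLPClassifier(max_iter=1000, random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Evaluate each model with the two metrics used in the paper.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: accuracy={acc:.4f}, AUC={auc:.4f}")
```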
3.5 Model Improvement: Stacking
Model
Stacking, a form of ensemble learning that combines multiple machine learning algorithms to achieve better predictive performance, is employed to improve the model's predictive accuracy. This study used random forest and gradient boosting as base models. These base models are independently trained on the data and make individual predictions. However, no simple majority vote is taken over these predictions, nor are they averaged.
Instead, the predictions from the random forest and gradient boosting models are used as new "meta-features" that serve as input to a subsequent logistic regression meta-model. Essentially, this meta-model is trained to make the final prediction based on the predictions of the base models. This hierarchical arrangement of
models effectively captures the respective strengths
of the base models while compensating for their
weaknesses.
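One way to realize this arrangement is scikit-learn's StackingClassifier, sketched below; the paper does not state its implementation, so the settings shown (out-of-fold predicted probabilities as meta-features, five folds, default base-learner parameters) are illustrative assumptions:

```python
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

# Base learners generate the meta-features; logistic regression is the meta-model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # feed predicted probabilities to the meta-model
    cv=5,                          # out-of-fold predictions limit information leakage
)
stack.fit(X_train, y_train)
risk_proba = stack.predict_proba(X_test)[:, 1]  # assumed probability of bad credit
```

Building the meta-features from out-of-fold predictions keeps the meta-model from overfitting to predictions the base learners made on their own training data.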
3.6 Credit Score Calculation
The conversion of model-predicted probabilities into
actual credit scores is achieved by applying the
following mapping method. This method utilizes the
prediction results obtained from the aforementioned
stacked model for each individual customer.
CreditScore(P) = A + B × (max(P) − P) / (max(P) − min(P))        (1)
A represents a fixed constant set at 300, which serves as the baseline of the credit score. B, on the other hand, is a scaling constant set to 500, which can be adjusted, for example logarithmically, to accommodate diverse data distributions and specific requirements. P signifies the probability of high credit risk computed by the stacking model. This formula maps the predicted probabilities to a credit score range spanning from 300 to 800, with higher risk corresponding to lower scores. This, in turn, furnishes financial institutions with a more intricate and nuanced assessment of customer credit ratings.
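A minimal sketch of this mapping, based on the reconstructed Eq. (1) and assuming risk_proba holds the stacked model's predicted probabilities of bad credit from the previous sketch:

```python
import numpy as np

def credit_score(p, a=300.0, b=500.0):
    """Map bad-credit probabilities to scores in [a, a + b]; higher risk gives a lower score."""
    p = np.asarray(p, dtype=float)
    normalized = (p - p.min()) / (p.max() - p.min())  # min-max normalize the probabilities
    return a + b * (1.0 - normalized)

scores = credit_score(risk_proba)
```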
3.7 Visualization of Results
A bar chart is used to illustrate the distribution of customers across the different credit score segments. This graphical representation gives a more comprehensive understanding of the model's output, showing the count of customers within each scoring segment and, in particular, how good and bad credit are distributed within each segment.
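A short sketch of such a chart, continuing from the earlier sketches; the 100-point score bands and the 0 = good / 1 = bad label encoding are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Bucket the scores into 100-point segments and pair them with the test labels.
bins = [300, 400, 500, 600, 700, 800]
result = pd.DataFrame({
    "score_band": pd.cut(scores, bins=bins, include_lowest=True),
    "label": y_test.map({0: "Good", 1: "Bad"}).values,  # encoding is an assumption
})
counts = result.groupby(["score_band", "label"]).size().unstack(fill_value=0)

# Grouped bars: one pair of good/bad counts per score segment.
counts.plot(kind="bar", figsize=(8, 4))
plt.xlabel("Credit score segment")
plt.ylabel("Number of customers")
plt.title("Distribution of credit scores (good vs. bad)")
plt.tight_layout()
plt.show()
```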
4 EXPERIMENTAL RESULTS
Within this section, the model's performance is
subjected to a comprehensive comparative
evaluation, utilizing the dataset. Furthermore, the
conclusive bank credit scoring results are delineated
and presented.
4.1 Dataset
The dataset chosen for this research study offers a
comprehensive insight into the credit risk associated
with bank customers. It comprises a total of 1,000
entries, consisting of 700 instances characterized by
good credit and 300 instances associated with bad
credit. This dataset encompasses 20 predictor
variables that encompass a wide range of financial,
professional, and personal background information
about the customers under analysis.
4.2 Result
As depicted in Fig. 1, the correlation diagram reveals
that the majority of variables exhibit weak
correlations, as indicated by the coloration being close
to neutral. This suggests that these variables maintain
a relatively high degree of independence from each
other. This independence is advantageous for model
accuracy, as highly correlated variables can give rise
to multicollinearity problems. Notably, the diagram
does not display any dark red or dark blue blocks,
signifying the absence of strong correlations between
variables.
Figure 1: Correlation Heatmap (Picture credit: Original).
Figure 2: Top 10 Important Features (Picture credit:
Original).
As shown in Fig. 2, outliers were successfully handled and the data was normalized through data preprocessing, which ensures that the model weights all features evenly. Furthermore, during feature selection, the feature importance of the random forest classifier was used to retain only the ten most essential features, reducing the model's complexity and effectively avoiding overfitting.
Model training: the preliminary models provided good results in terms of AUC and accuracy. Stacking techniques were then employed to further improve prediction accuracy, combining random forest and gradient boosting as base models and adopting logistic regression as the meta-model to enhance overall performance (Table 1).
Table 1: Performance of the various algorithms.

Algorithm             Accuracy   AUC
Model Stacking        93.71%     88.33%
Decision Tree         73.45%     70.03%
Random Forest         77.56%     75.45%
Logistic Regression   79.43%     78.74%
Neural Networks       73.45%     73.23%
Figure 3: Distribution of CreditScore (Good vs. Bad)
(Picture credit: Original).
Fig. 3 illustrates the distribution of customer credit
scores, indicating that most customers have good
credit status. However, there is a higher proportion of
bad credit in the low segment of 300-500, and these
customers tend to have higher credit risks. On the
other hand, when the credit score exceeds 600,
customers with good credit dominate, indicating lower
credit risks. In the middle area between 500 and 600,
the distribution of good and bad credit is relatively
balanced, meaning moderate credit risk. This
information is valuable for financial institutions to
develop appropriate loan policies and interest rates for
customers with different credit scores.
In summary, this analysis has revealed weak
correlations among dataset variables, contributing to
enhanced model accuracy. Effective data
preprocessing and feature selection have been
employed, and the adoption of model stacking has
notably improved prediction accuracy. The
distribution of customer credit scores indicates that
most have good credit, with higher credit risk in the
lower score range. These insights are invaluable for
financial institutions when tailoring their policies and
interest rates to accommodate customers with
different credit profiles, thereby enhancing risk
management.
5 CONCLUSION
This research set out to construct a robust model for evaluating the credit risk of bank customers by combining data preprocessing techniques, careful feature selection, diverse model training strategies, and stacking methodologies. The resulting model achieved both a strong AUC and high accuracy. Furthermore, the model can transform predicted probabilities into concrete credit scores, giving financial institutions vital decision-making insights. This combination of technical methods and financial interpretation forms the core of this research's contributions. Data visualization also played an important role: Fig. 1 and Fig. 2 provide a detailed perspective on the relationships between features and their relative importance, while Fig. 3 portrays the distribution of customers across credit score brackets. These insights enable financial institutions to craft sound loan policies and finely tuned interest rate structures, thereby optimizing their risk management strategies. In essence, this study provides banks and other financial institutions with a tool to assess credit risks with greater precision. As technology advances and data repositories grow, considerable potential remains for further refining this model. Future work could explore new feature engineering approaches and leverage more advanced modeling techniques, ensuring that the prediction framework keeps pace with the evolving demands of the dynamic financial sector. This research marks a foundational step toward better credit risk assessment.
REFERENCES
J. X. Jiang and F. Packer, "Credit ratings of Chinese firms
by domestic and global agencies: Assessing the
determinants and impact," Journal of Banking &
Finance, vol. 105, pp. 178-193, 2019.
M. Musdholifah, U. Hartono, and Y. Wulandari, "Banking
crisis prediction: emerging crisis determinants in
Indonesian banks," International Journal of Economics
and Financial Issues, vol. 10, no. 2, pp. 124, 2020.
A. S. Kadam, S. R. Nikam, A. A. Aher, et al., "Prediction
for loan approval using machine learning algorithm,"
International Research Journal of Engineering and
Technology (IRJET), vol. 8, no. 04, 2021.
J. Beutel, S. List, and G. von Schweinitz, "Does machine
learning help us predict banking crises?" Journal of
Financial Stability, vol. 45, pp. 100693, 2019.
N. Uddin, M. K. U. Ahamed, M. A. Uddin, et al., "An Ensemble Machine Learning Based Bank Loan Approval Predictions System with a Smart Application," Available at SSRN 4376481.
H. Erdal and İ. Karahanoğlu, "Bagging ensemble models
for bank profitability: An empirical research on Turkish
development and investment banks," Applied soft
computing, vol. 49, pp. 861-867, 2016.
P. Carmona, F. Climent, and A. Momparler, "Predicting
failure in the US banking sector: An extreme gradient
boosting approach," International Review of
Economics & Finance, vol. 61, pp. 304-323, 2019.
G. Wang, S. W. H. Kwok, D. Axford, et al., "An AUC-
maximizing classifier for skewed and partially labeled
data with an application in clinical prediction
modeling," Knowledge-Based Systems, vol. 278, pp.
110831, 2023.
"South German Credit (UPDATE)," UCI Machine Learning Repository, 2020. https://doi.org/10.24432/C5QG88.