further improving the success rate of telemarketing.
Hyeon used Bayesian network models to predict
customer responses to bank telemarketing and
designed a decision system to provide real-time
decision support (Hurst 2008, Ahn & Ezawa 1997).
Elsalamony applied logistic regression, a naive
Bayesian algorithm, and a neural network model to
analyze marketing effectiveness (Elsalamony 2014).
Kim grouped customers into classes and used
convolutional neural network, decision tree, and
logistic regression models to predict the probability
of marketing success within each class, in order to
find the best model for predicting telemarketing
effectiveness (Kim et al.
2016). Jiang compared the predictive performance of
various models on bank deposit data and found that
the random forest model is among the best performers
on this type of data (Jiang 2021). Liu
proposed a fuzzy support vector machine (SVM)
model and showed that it achieved better prediction
performance than traditional SVM models (Liu et al.
2017).
Jiang used various classification models such as
Bayesian and logistic regression to predict the
optimal consumer group for telemarketing and
provided some suggestions for refined management
and services of banks (Jiang 2018). Chun improved
the unsupervised learning Kohonen network and
proposed a Kohonen-supervised learning network
model for telemarketing prediction (Yan et al. 2020).
2 METHODOLOGY
2.1 Data Sources
The data in this article come from the bank fixed
deposit direct marketing data on the Kaggle website.
They comprise two datasets: a training set with
45,211 records and a test set with 4,521 records. The
data cover direct telemarketing activities carried out
by Portuguese banking institutions, together with a
range of customer information. This article fits a
logistic regression model on the training set and then
uses the test set to assess how accurately the model
predicts whether customers will choose to engage in
fixed deposit business.
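The workflow described above, fitting a logistic regression on the training set and scoring it on the test set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the toy data, the function names (`fit_logistic`, `predict`, `accuracy`), the learning rate, the epoch count, and the 0.5 threshold are all hypothetical, and plain batch gradient descent stands in for whatever solver was actually used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    """Return 0/1 predictions: 1 when the estimated probability >= threshold."""
    probs = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]
    return [int(p >= threshold) for p in probs]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (NOT the Kaggle data): one standardized feature,
# label 1 = customer took a fixed deposit, 0 = customer declined.
X_train = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[-1.2], [-0.3], [0.4], [1.8]]
y_test = [0, 0, 1, 1]

w = fit_logistic(X_train, y_train)          # determine the model on the training set
test_accuracy = accuracy(y_test, predict(w, X_test))  # assess it on the test set
print(test_accuracy)
```

In practice the same two steps would be applied to the encoded customer and telemarketing variables rather than to a single toy feature.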
2.2 Variable Selection
This study aims to predict whether customers will
engage in fixed deposit business. The dependent
variable y therefore indicates whether a customer
chooses a fixed deposit, and it is a binary (0-1)
variable. The proportion of each value of y in the two
datasets is shown in Figure 1.
Figure 1: The proportion of fixed deposits (Picture credit:
Original).
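The class proportions summarized in Figure 1 can be computed with a few lines of code. The sketch below is illustrative only: the label lists are hypothetical stand-ins for the y column of the training and test sets, and `class_proportions` is a helper name introduced here, not part of the original analysis.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class label in a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels for illustration only (1 = fixed deposit, 0 = declined).
train_y = [0] * 14 + [1] * 2
test_y = [0] * 7 + [1] * 1

train_share = class_proportions(train_y)
test_share = class_proportions(test_y)
print(train_share, test_share)
```

Comparing the two dictionaries is a quick way to check that the training and test sets have the same class balance before a model is fitted.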
Figure 1 shows that the majority of customers decline
to make fixed deposits and that the proportions are
consistent between the training and test sets. The
independent variables are divided into two parts. The
first part is customer-related data, comprising 8 items
such as age, work, and marital status, as shown in
Table 1.
Table 1 shows that, in terms of age, middle-aged
customers between 30 and 40 are the largest group. In
terms of year-end deposit balance, there are both
large deposits and large liabilities, but overall
customers have a positive year-end balance. In terms
of work, blue-collar and management are the most
common occupations, each exceeding 20%, and
almost all customers have a stable source of income.
In terms of marital status, more than half of the
customers are married, suggesting a relatively happy
family situation overall. In terms of education, there
is almost no illiteracy, and more than half of the
customers have reached the secondary level,
indicating a relatively high level of education. In
terms of credit, only a few customers have default
records. The numbers of customers with and without
housing loans are roughly balanced, while the
majority of customers do not have personal loans.
The second part is the telemarketing data for each
customer, comprising 8 items such as the customer's
contact information and the month and day of the
last contact of the year, as shown in Table 2. A value
of -1 for the interval between the previous two
marketing activities indicates that the customer has
not been contacted before.
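Because -1 is a sentinel value and not a real interval, feeding it directly into a regression would treat "never contacted" as a very short interval. One common way to handle this, shown here as a hedged sketch and not as the preprocessing used in this study, is to split the variable into a contact indicator and a cleaned interval. The function name `encode_contact_interval` and the sample values are hypothetical.

```python
def encode_contact_interval(values, sentinel=-1):
    """Split a sentinel-coded interval into (previously_contacted, interval).

    previously_contacted is 1 if the customer was contacted before, else 0.
    interval keeps the original value, or 0 when the sentinel is present,
    so the sentinel is never interpreted as a real number of days.
    """
    encoded = []
    for v in values:
        if v == sentinel:
            encoded.append((0, 0))
        else:
            encoded.append((1, v))
    return encoded


# Hypothetical sample values for illustration only.
intervals = [-1, 5, 180, -1]
encoded_intervals = encode_contact_interval(intervals)
print(encoded_intervals)
```

With this encoding the indicator carries the "not contacted before" information, and the interval coefficient is estimated only from customers who actually have a previous contact.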