An Analysis of Customer Churn Prediction in Different Business Industries

Zhengyang Zhao

Electronic Information Engineering, South China Agricultural University, Guangzhou City, China

Keywords: Artificial Intelligence, Machine Learning, Deep Learning.

Abstract: This article reviews currently deployed churn forecasting techniques. Churn prediction is widely used in areas such

as web services, gaming and insurance. However, because it is applied so broadly to improve predictability in various

industries, its definition and usage vary considerably. This paper categorises the traditional

methods of machine learning and deep learning, presents a number of papers related to these two technologies,

and discusses and analyses the papers in order to give more academics a clear understanding of how

these two technologies are used in different industries. The paper brings together definitions of churn in

areas such as business management, information and communication technology (ICT) and the newspaper

industry, and explains the differences between them. On this basis, churn loss, feature engineering

and predictive modelling are categorised and explained. This study can be used to integrate fragmented

churn studies across industrial domains and to select the churn definitions and relevant models best suited to the fields of most interest to

researchers.

1 INTRODUCTION

The term "customer churn" is commonly used to

describe a customer's tendency to stop working with

an organisation for a specific period of time or

contract (Chandar, 2006). Preventing customer churn

is critical when operating a service. In the past,

acquiring new customers was efficient relative to

retaining repeat customers.

However, with the globalisation of services and

intense competition leading to market saturation,

customer acquisition costs are rising rapidly

(Verbraken, 2014).

For technology companies, customer persona

profiling is a major challenge in the contemporary

business environment (Ebiaredoh-Mienye, 2021).

These companies often suffer heavy losses due to

customer churn. Early identification of customer

personality traits is important to minimise customer

churn and develop loyal customers, especially in cases

of misinformation (Awan, 2022). Many studies have

been conducted in the past to analyse customer churn

and develop strategies to reduce it. Online shopping

platforms in particular have the advantage of being

easily accessible through PC web pages or mobile

apps, but conversely, this advantage can also be a

disadvantage in terms of being easily seen and

quickly left (Seo, 2023). Therefore, even a slight

decrease in customer churn can lead to higher

conversion rates, which can result in huge profits

(Ahmad, 2024). For these reasons, predicting

customer churn can be used as a way to increase the

value of the company.

ORCID: https://orcid.org/0009-0006-7441-2384

Customer Relationship Management (CRM)

initially emerged as a business management approach

to improve efficiency in areas such as marketing,

sales and business administration, as well as to

enhance organisational efficiency and customer value

functions (Parvatiyar, 2001). It has been used to

develop marketing strategies using personal and

behavioural data of customers, particularly to meet

individual and unique consumer needs (Shaw, 2001).

Since then, a number of companies, taking full

advantage of information technology (IT), have

begun to apply specialised techniques for customer

acquisition, retention and selection (Komenar, 1996).

With the integration of IT and CRM technologies, an

increasing number of organisations are adopting these

technologies in areas as diverse as data warehousing,

online platforms and finance (Bose, 2002). Owing to

developments in big data, many data mining and

machine learning solutions are now available to analyse

customer data and discover the

underlying causes of customer churn. Moreover, they

can be used to design customer retention strategies to

minimise customer churn (Ullah, 2019). Nowadays,

churn analysis has become an important strategy for

personalised customer management, and studies have

shown that improving retention of specific groups of

existing customers is more beneficial than attracting

new customers (Jahromi, 2010). Many studies have

applied several deep learning model-based churn

analysis techniques to services in the field of

computer science (Lee, 2019; Zhang, 2017).

Zhao, Z.

An Analysis of Customer Churn Prediction in Different Business Industries.

DOI: 10.5220/0012972800004508

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence (EMITI 2024), pages 781-785

ISBN: 978-989-758-713-9

The rest of the paper is organised as follows.

Section 2 introduces methods proposed by other

researchers and discusses how they use machine

learning and deep learning to address customer churn

in different industries. In Section 3, the paper

discusses the results of researchers in various

industries who have used these methods to detect

customer churn in recent years. In Section 4, the paper

presents its conclusions.

2 METHOD

2.1 Traditional Machine Learning-Based Algorithms

This paper explores the application of machine

learning and deep learning to customer churn in non-

contractual settings over the past decade. Machine

learning offers robust capabilities for capturing

nonlinear relationships among features, allowing it to

discern varied effects based on different

characteristics (Qiu, 2020). Deep learning, a

sophisticated extension of machine learning and

neural networks, has become increasingly popular for

predicting customer churn. It differs significantly

from traditional models and is often considered a

distinct category. Deep learning models typically

involve training on condensed sparse customer data

or using fully connected neural networks. These

networks often incorporate latent vectors extracted

from autoencoders, linking these vectors with static

data to predict churn effectively (Ahn, 2020).

2.1.1 Random Forest

Random Forest (RF) is an ensemble learning

technique that enhances model performance by

integrating multiple decision trees. This method

divides the data into smaller subsets, with each subset

being used to train an individual decision tree (Nath,

2003). Each decision tree in RF is trained on

data points selected randomly with replacement,

a technique called bootstrap aggregating (bagging). In addition,

at each split in a decision tree, only a random subset

of the features is considered instead of all

features. This improves the generalisation of the model and

reduces overfitting.
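
The two sources of randomness described above, bootstrap sampling of rows and random selection of features, can be illustrated with a minimal sketch. It is not the implementation used in any cited study: for brevity each tree is a one-split stump that sees a single feature subset, whereas a full random forest redraws the subset at every split. The toy data (months inactive, support tickets) and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows, feature_ids):
    """Fit a one-split tree, searching only the given feature subset.

    Each row is (features, label), label 0 = stays, 1 = churns.
    Returns (feature, threshold, label_if_le, label_if_gt).
    """
    best = None
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, n_trees=25, n_sub_features=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)                         # bagging
        feature_ids = rng.sample(range(n_features), n_sub_features)  # feature subset
        forest.append(fit_stump(sample, feature_ids))
    return forest

def predict(forest, x):
    """Majority vote over all trees."""
    votes = [l_lab if x[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: (months_inactive, support_tickets) -> churn label.
rows = [((1, 0), 0), ((2, 1), 0), ((1, 1), 0), ((3, 0), 0),
        ((8, 5), 1), ((9, 4), 1), ((7, 6), 1), ((10, 5), 1)]
forest = fit_forest(rows)
```

Because every tree sees a different resample and a different feature, individual trees disagree slightly, and the vote averages out their errors.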

To combine customer churn prediction and

segmentation, Ullah et al. proposed a churn prediction

and customer segmentation framework (Ullah, 2019).

They used RF to predict customer churn and, to gain

insight into the pivotal factors that contribute to

customer churn, identified these factors using an

attribute selection classifier. Next, they extracted all

the customer churn data that was correctly predicted

by the RF and performed a customer analysis to

understand the similarities between these churned

customers. Ultimately, based on the results of the

analysis, some retention strategies and

recommendations were made.

2.1.2 Decision Tree

Decision Tree (DT) is a tree structure similar to

a flowchart, where each internal node represents a test on

an attribute, each branch represents the result of the test,

and each leaf node represents a final result or classification.

Decision trees can be used for both classification and

regression, and are created through a process called

recursive partitioning, where data is repeatedly divided

into subsets based on certain attribute values. The

goal is to create a tree that accurately predicts the

target variable, with the most important variables at

the top of the tree. DT algorithms differ in the way

they choose attributes to partition the data and how

they handle missing values and continuous variables.

DT algorithms have many advantages: they are easy to

visualise and understand, can handle numerical data,

use a non-parametric approach and do not require

prior assumptions (Hassouna, 2015). Umayaparvathi

et al. compared Artificial Neural Networks (ANN)

and DT for predicting

churn, and found that the decision tree-based

method was more accurate than the neural network-

based method (Umayaparvathi, 2012).
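
The partitioning process described above can be sketched in a few lines. This is a minimal illustration that assumes numeric features, binary splits and Gini impurity as the split criterion; the algorithms used in the cited studies may use other criteria such as information gain. The toy data (monthly logins, complaints) and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, depth=0, max_depth=3):
    """Recursively partition rows; leaves hold the majority class."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or depth == max_depth:
        return majority
    split = best_split(rows)
    if split is None:
        return majority
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree(left, depth + 1, max_depth),
            "right": build_tree(right, depth + 1, max_depth)}

def classify(tree, x):
    """Follow the tests from the root down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical toy data: (monthly_logins, complaints) -> 1 if churned.
rows = [((20, 0), 0), ((18, 1), 0), ((15, 0), 0), ((3, 0), 1),
        ((2, 2), 1), ((16, 4), 1), ((19, 5), 1), ((4, 1), 1)]
tree = build_tree(rows)
```

On this toy data the root tests the login count and the second level tests the complaint count, which mirrors the idea that the most informative variables sit near the top of the tree.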

In a study by Dahiya and Bhatia (Dahiya, 2015), decision trees were

employed to predict customer churn, demonstrating

superior performance over logistic regression. The

decision trees achieved an impressive accuracy of

99.67% on a large dataset (Pamina, 2019). Another

study found that XGBoost led in terms of accuracy,

reaching 79.8%, and also scored the highest Area

Under the Curve (AUC) at 58.2% for predicting

customer churn.
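
Accuracy and AUC measure different things, which is why a model can report a high accuracy alongside a modest AUC. The sketch below computes AUC as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. It assumes binary labels (1 = churned) and scores where higher means more likely to churn; the function name and the toy data are illustrative and are not taken from the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    Ties between a churner and a non-churner count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A model that ranks most churners above non-churners.
ranked_auc = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.3, 0.1])

# A model that gives everyone the same score on an imbalanced dataset:
# predicting "no churn" for all is 90% accurate, yet the AUC is only 0.5.
constant_auc = roc_auc([0] * 9 + [1], [0.0] * 10)
```

The second case shows why churn studies on imbalanced data often report AUC in addition to accuracy.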

In the study conducted by (Ullah, 2019), several

machine learning techniques were used to classify

customer data using annotated datasets. The aim was

to evaluate which algorithm best categorises

customers into frequent and non-frequent customer

categories. The DT algorithm was used for

classification. It was classified as an eager

learning algorithm, in which the training data is

generalised to classify new samples. This algorithm

is an improved version of the original ID3 and C4.5,

and is widely used in the literature to analyse data.

2.1.3 Support Vector Machine

Support Vector Machine (SVM) is a powerful

machine learning model for classification and

regression tasks. It is based on the concept of a

decision plane that defines decision boundaries.

It works by mapping the input data into a high-

dimensional feature space where hyperplanes can be

used to discriminate between different classes. The

hyperplane is chosen in such a way that the margin

between two classes is maximised. SVM has

several advantages, including the ability to handle

high-dimensional data, efficiency in dealing with

small datasets, and relative robustness to outliers.
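
The margin-maximisation idea can be made concrete with a minimal linear sketch. It assumes a soft-margin formulation trained by sub-gradient descent on the hinge loss and omits the kernel mapping into a higher-dimensional space mentioned above. The hyperparameters `lam`, `lr` and `epochs`, the function names and the centred toy data are assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + hinge loss by sub-gradient descent.

    Labels must be +1 or -1.
    """
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regulariser shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, already centred features; +1 = churn, -1 = no churn.
points = [(-2, -2), (-1, -2), (-2, -1), (2, 2), (1, 2), (2, 1)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

Only points with a margin below 1 change the direction of the weight vector, which is the sense in which the solution depends on the support vectors rather than on every training example.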

For example, De Andrade Moral et al.

developed and implemented a hierarchical joint

Bayesian model to predict the intervals between events

and the number of customer events using YouView

data (Moral, 2022). When they attempted to classify

customer status ("YES" vs. "NO" subscription

customers), the results obtained using hierarchical joint

Bayesian model parameter estimates outperformed

the results obtained using the raw data for all

machine learning methods. In terms of accuracy,

the SVM approach performed best overall,

with 92% accuracy, a 100% true positive rate and a

14% false positive rate.
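
The three figures quoted above are all derived from the same confusion matrix. The sketch below shows how they relate, using a made-up set of ten "YES"/"NO" labels that is not data from the cited study; the function name is also an assumption.

```python
def classification_rates(y_true, y_pred, positive="YES"):
    """Accuracy, true positive rate and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),   # share of actual positives that were found
        "fpr": fp / (fp + tn),   # share of actual negatives flagged by mistake
    }

# Illustrative labels only: every "YES" is found, one "NO" is misclassified.
y_true = ["YES"] * 4 + ["NO"] * 6
y_pred = ["YES"] * 4 + ["YES"] + ["NO"] * 5
rates = classification_rates(y_true, y_pred)
```

A classifier can therefore reach a perfect true positive rate while still producing false alarms, so the two rates should be read together.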

2.2 Deep Learning

Deep Learning (DL) is a more recent analytical

approach to predicting churn. According to Goodfellow et

al., it is part of machine learning (Goodfellow, 2016).

Owing to its increasing importance in industry in recent

times, it has become a separate academic field. This

also holds for building models for churn prediction

analyses. In their 2019 study, Lee et al.

demonstrated that DL models were more effective in

predicting game churn than traditional

methods (Lee, 2019). They enhanced prediction

accuracy by integrating deep learning with traditional

machine learning techniques, utilising feature

modification strategies such as memorisation and

generalisation. Zhang et al. also explored the

effectiveness of deep learning versus traditional

machine learning in predicting customer churn in the

insurance sector (Zhang, 2017). They processed

features specifically for deep learning applications

and merged these insights with traditional models.

Their findings showed that the deep learning-based

churn prediction method outperformed conventional

machine learning algorithms in terms of accuracy.

2.2.1 Artificial Neural Networks

ANN is an artificial intelligence system inspired by

the human brain. It is composed of interconnected

units known as nodes or neurons, which collaborate

to process information. Each neuron receives input

signals from other neurons and produces output

signals to pass on to other neurons in the network. The

input layer receives the data to be processed while the

output layer produces the final result. The

intermediate hidden layers perform various

calculations and transformations on the data. ANNs have

been highly successful in solving complex problems

that are difficult for traditional algorithms to handle.

However, they require large amounts of data and

computational resources for training, and their

performance may depend heavily on the quality of the

data used for training.
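
The flow of signals from the input layer through a hidden layer to the output layer can be sketched as a forward pass. The weights below are hand-picked, hypothetical values rather than trained parameters, and the choice of ReLU for the hidden layer and a sigmoid for the output is an assumption made for this illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer; each row of `weights` belongs to one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, params):
    """Input layer -> hidden layer -> single output neuron (churn probability)."""
    hidden = dense(x, params["W1"], params["b1"], relu)
    (churn_probability,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return churn_probability

# Hypothetical parameters for a network with 2 inputs, 2 hidden neurons, 1 output.
params = {
    "W1": [[1.0, -1.0], [-1.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[2.0, -2.0]], "b2": [0.0],
}
p_high = forward([1.0, 0.0], params)
p_low = forward([0.0, 1.0], params)
p_mid = forward([0.5, 0.5], params)
```

Training would adjust `W1`, `b1`, `W2` and `b2` from data, which is the step that demands the large datasets and computing resources noted above.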

The research by Arokia Panimalar and

Krishnakumar is centered on creating a robust

customer churn prediction model known as DFE-

WUNB, designed to operate within cloud computing

frameworks. This model leverages ANN for deep

feature extraction, effectively addressing the intricate

non-linear patterns found in telecommunications

customer churn datasets. The DFE-WUNB model

demonstrates superior accuracy in predicting

customer churn compared to conventional methods

(Panimalar, 2023).

2.2.2 Convolutional Neural Network

Convolutional Neural Network (CNN) is a deep

learning algorithm primarily used for image

recognition and classification tasks. Inspired by the

structure and function of the human visual system, it

is highly effective in recognising objects within

images. A CNN is typically built from the following

layers. First, convolutional layers:

these layers apply a series of filters to the input image

to create feature maps that capture different aspects

of the image, such as edges, corners, or specific

patterns. Second, activation layers: after each

convolutional layer, an activation layer is usually

applied to introduce nonlinearity to the network,

allowing it to learn more complex patterns; Rectified

Linear Unit (ReLU) is a common activation function

used in CNNs. Third, pooling layers: these layers

reduce the spatial size of the feature map, helping to

reduce computational complexity, prevent

overfitting, and make feature detection less sensitive

to small changes in position, size and orientation.

Fourth, fully connected layers: after several

convolutional and pooling layers, the final set of

layers in a CNN is usually one or more fully

connected layers that perform high-level inference

and categorise the input image into predefined classes.

Finally, the output layer: the last layer of the

network produces the output. The

output can be a probability distribution over classes in a

classification task or a bounding box in an object

detection task.
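
The first three layer types can be traced on a tiny example. The sketch below assumes a single-channel image stored as a list of rows, a hand-written 2 x 2 vertical-edge filter instead of learned filters, and non-overlapping max pooling; the image values and function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Activation layer: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fit are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(w)]
            for i in range(h)]

# A 5 x 5 image with a bright vertical stripe in columns 2 and 3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Filter that responds positively to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)   # 4 x 4 map, rows of [0, 2, 0, -2]
activated = relu(feature_map)         # the -2 responses become 0
pooled = max_pool(activated)          # 2 x 2 summary of where the edge is
```

The pooled map keeps the information that a dark-to-bright edge occurs in the left half of the image while shrinking the data passed on to later layers.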

CNNs have revolutionised various fields such as

computer vision and medical imaging. They are

particularly effective because they are able to

automatically learn and extract relevant features from

raw data without the need to manually engineer

features. In short, CNNs

are powerful deep learning models that excel at

processing grid-like data such as images, and they

are the cornerstone of modern image analysis and

recognition systems. Ahmed et al.

(Ahmed, 2019) used a DL approach

and proposed a method called "TL-DeepE", which

starts with transfer learning (TL) by tuning several

pre-trained deep CNNs. They converted the telecom

dataset into a 2D image format. They then used these

CNNs as base classifiers and Genetic Programming

(GP) and AdaBoost as meta-classifiers. The accuracy

of their method on the Orange and Cell2Cell datasets

was 75.4% and 68.2%, with an area under the curve

of 0.83 and 0.74, respectively.

3 DISCUSSIONS

The reviewed studies indicate that different business

domains favour different modelling techniques.

Companies in the gaming, social media and telecoms

industries, which rely heavily on log data and have

easy access to customer information, use deep

learning techniques, which have relatively more

applications for big data, and this is a fast-growing

trend. For example, in the area of image recognition

for social media platforms, many social media

companies want to automatically tag user-uploaded

images with relevant keywords to improve

searchability and provide better content

recommendations. They use deep learning,

specifically CNNs, to recognise and categorise the

content of these images. A CNN is trained on large

datasets of tagged images and can learn complex

patterns and features directly from the data. When a

new image is uploaded, the model is able to predict

what is depicted in the image with high accuracy,

even if the image is slightly different from the one in

the training set. This deep learning approach enables

social media platforms to optimise image tagging

systems, leading to improved user experience and

engagement.

The financial and insurance industries, in contrast,

tend to use traditional machine learning models or analyses

owing to the relatively small volume of log data and the

small degree of variation in the information obtained

from customers. For example, in the area of credit risk

assessment, some financial firms (e.g., banks or credit

unions) want to optimise their processes for assessing

the creditworthiness of potential borrowers.

Accurately predicting whether a borrower is likely to

default on a loan is critical for financial institutions to

manage risk and maintain profitability. Companies

have historical data on loans, including various

attributes of borrowers (e.g., income, employment

status, credit history) and whether they ultimately

defaulted on the loan. By using such a model to assess credit

risk, financial firms can reduce the number of non-

performing loans and improve the health of their

overall loan portfolio.

New technology companies have to deal with large

amounts of complex data, so they tend to opt for high-

cost deep learning models to facilitate data processing,

which offer them higher expected returns.

Traditional financial firms, by contrast, are more

likely to choose lower-cost models to minimise costs,

as there is less variation in customer information. In

addition, the developed algorithms should rely on

more advanced hardware or transmission

mechanisms to achieve higher processing speeds and

more accurate identification capabilities (Deng, 2023;

Sugaya, 2019).

4 CONCLUSIONS

This paper compares techniques for predictive

analysis of customer churn using log data. In recent

years, methods that use deep learning algorithms to

predict customer churn have

emerged. In the studies reviewed, deep learning algorithms

often outperform other algorithms. Unlike other modelling techniques, they

are able to learn customer behavioural patterns from

massive amounts of data through layers of stacked

neuron structures. Therefore, applying deep learning

algorithms to log data with timestamps and a large

number of observations in order to generate latent features

is expected to perform better than

traditional churn prediction models. Researchers

therefore need to understand the shape of their datasets

and apply the appropriate algorithms to solve the

prediction problem.

REFERENCES

Awan, M. J., Khan, M. A., Ansari, Z. K., Yasin, A., &

Shehzad, H. M. F. 2022. Fake profile recognition

using big data analytics in social media platforms.

International Journal of Computer Applications in

Technology, 68(3), 215.

Ahmad, N., Awan, M. J., Nobanee, H., Zain, A. M.,

Naseem, A., & Mahmoud, A. 2024. Customer

personality analysis for churn prediction using hybrid

ensemble models and class balancing techniques.

IEEE Access, 12, 1865–1879.

Ahn, J., Hwang, J., Kim, D., Choi, H., & Kang, S. 2020. A

survey on churn analysis in various business domains.

IEEE Access, 8, 220816–220839.

Ahmed, U., Khan, A., Khan, S. H., Basit, A., Haq, I. U., &

Lee, Y. S. 2019. Transfer learning and meta

classification based deep churn prediction system for

telecom industry. arXiv.

Bose, R. 2002. Customer Relationship management: key

components for IT success. Industrial Management

and Data Systems, 102(2), 89–97.

Chandar, M., and Krishna, P. A. L. 2006. Modeling churn

behavior of bank customers using predictive data

mining techniques. Proc. Nat. Conf. Soft Comput.

Techn. Eng. Appl. (SCT), pp. 24-26.

Dahiya, K., & Bhatia, S. 2015. Customer churn analysis in

telecom industry. ICRITO.

De Andrade Moral, R., Chen, Z., Zhang, S., McClean, S.,

Palma, G. R., Allan, B., & Kegel, I. 2022. Profiling

television watching behavior using Bayesian

hierarchical joint models for Time-to-Event and Count

data. IEEE Access, 10, 113018–113027.

Deng, X., Oda, S., Kawano, Y., 2023. Graphene-based

midinfrared photodetector with bull's eye plasmonic

antenna. Optical Engineering, 62(9), p. 097102-

097102.

Ebiaredoh-Mienye, S. A., Esenogho, E., & Swart, T. G.

2021. Artificial neural network technique for

improving prediction of credit card default: A stacked

sparse autoencoder approach. International Journal of

Power Electronics and Drive Systems, 11(5), 4392.

Goodfellow, I., Bengio, Y. and Courville, A. 2016. Deep

Learning. Cambridge, MA, USA: MIT Press.

Hassouna, M. S., Tarhini, A., Elyas, T., & AbouTrab, M. S.

2015. Customer churn in Mobile Markets: A

comparison of Techniques. International Business

Research, 8(6).

Jahromi, A. T., Sepehri, M. M., Teimourpour, B., &

Choobdar, S. 2010. Modeling customer churn in a

non-contractual setting: the case of

telecommunications service providers. Journal of

Strategic Marketing, 18(7), 587–598.

Komenar, M. 1996. Electronic marketing.

Lee, E., Jang, Y., Yoon, D., Jeon, J., Yang, S., Lee, S., Kim,

D., Chen, P. P., Guitart, A., Bertens, P., Periáñez, Á.,

Hadiji, F., Müller, M., Joo, Y., Lee, J., Hwang, I., &

Kim, K. J. 2019. Game data mining competition on

churn prediction and survival analysis using

commercial game log data. IEEE Transactions on

Games, 11(3), 215–226.

Nath, S. V. and Behara, R. S. 2003. Customer churn

analysis in the wireless industry: A data mining

approach. Proc. Annu. Meeting Decis. Sci. Inst., vol.

561, pp. 505-510.

Panimalar, S. A., & Krishnakumar, A. 2023. Customer

churn prediction model in cloud environment using

DFE-WUNB: ANN deep feature extraction with

Weight Updated Tuned Naïve Bayes classification

with Block-Jacobi SVD dimensionality reduction.

Engineering Applications of Artificial Intelligence,

126, 107015.

Parvatiyar, A. and Sheth, J. N. 2001. Customer relationship

management: Emerging practice process and

discipline. J. Econ. Social Res., vol. 3, no. 2.

Pamina, J., Raja, J., Bama, S. S., Soundarya, S., Sruthi, M.

S., Kiruthika, S., Aiswaryadevi, V. J., & Priyanka, G.

2019. An effective classifier for predicting churn in

telecommunication. Journal of Advanced Research in

Dynamic and Control Systems, 11, 221–229.

Qiu, Y., Chen, P., Lin, Z., Yang, Y., Zeng, L., & Fan, Y.

(2020, June). Clustering Analysis for Silent Telecom

Customers Based on K-means++. In 2020 IEEE 4th

Information Technology, Networking, Electronic and

Automation Control Conf. (ITNEC) (Vol. 1, pp. 1023-

1027). IEEE.

Seo, D., & Yoo, Y. 2023. Improving shopping mall revenue

by Real-Time Customized digital coupon issuance.

IEEE Access, 11, 7924–7932.

Shaw, M. J., Subramaniam, C., Tan, G. W. and Welge, M.

E. 2001. Knowledge management and data mining for

marketing. Decis. Support Syst., vol. 31, no. 1, pp.

127-137.

Sugaya, T., Deng, X., 2019. Resonant frequency tuning of

terahertz plasmonic structures based on solid

immersion method. 2019 44th International

Conference on Infrared, Millimeter, and Terahertz

Waves, p.1-2.

Umayaparvathi, V., & Iyakutti, K. 2012. Applications of

data mining techniques in telecom churn prediction.

International Journal of Computer Applications,

42(20), 5–9.

Ullah, I., Raza, B., Malik, A. K., Imran, M., Islam, S. U., &

Kim, S. W. 2019. A Churn Prediction Model using

Random Forest: Analysis of machine learning

techniques for churn prediction and factor identification

in telecom sector. IEEE Access, 7, 60134–60149.

Verbraken, T., Verbeke, W., & Baesens, B. 2014. Profit

optimizing customer churn prediction with Bayesian

network classifiers. Intelligent Data Analysis (Print),

18(1), 3–24.

Zhang, R., Li, W., Tan, W. M., & Mo, T. 2017. Deep and

shallow model for insurance churn prediction service.

SCC.
