system, traditionally, the workload is the largest, and
the problem in daily operation is the work of
extracting, transforming and integrating data from
business databases into the data warehouse. This is
because data from different kinds and forms of
business must be extracted, transformed, and
integrated before it is finally stored in the data
warehouse, and its quality must then be maintained
and managed.
4.2 Establishment of Prediction Model
The churn prediction model predicts customer churn in
the performance market. It sets different prediction
targets for different groups of users, which makes the
prediction results more forward-looking. When the
model is first built, it is necessary to study the data
in the data warehouse, obtain the data related to loss
prediction and analysis (Szafrański, Zieja, Wójcik, et
al. 2018), organize them according to the time
granularity to be studied, and explore them. At the
same time, discussion with the bureau should be
strengthened. By exploring the data and understanding
the demand, we should be able to preliminarily
determine the time window structure of the forecasting
model, the definition of the group to be predicted, and
the forecasting target, and to select the index set
that is sensitive to the forecasting target.
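As an illustration of what a time window structure might look like, the following minimal sketch aggregates dated usage records to monthly granularity and splits them into an observation window and a later prediction window. The record layout, the window lengths, and the function names are assumptions made for this example; the paper does not specify them.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so that windows are easy to compare."""
    return d.year * 12 + (d.month - 1)


def aggregate_monthly(records):
    """Sum usage per (customer, month). Each record is (customer_id, date, amount)."""
    totals = defaultdict(float)
    for customer_id, day, amount in records:
        totals[(customer_id, month_index(day))] += amount
    return dict(totals)


def split_windows(monthly, first_month, observe_months, predict_months):
    """Split monthly totals into an observation window and a prediction window.

    The observation window covers `observe_months` months starting at
    `first_month`; the prediction window covers the `predict_months` after it.
    """
    obs_end = first_month + observe_months
    pred_end = obs_end + predict_months
    observation, prediction = defaultdict(dict), defaultdict(dict)
    for (customer_id, month), total in monthly.items():
        if first_month <= month < obs_end:
            observation[customer_id][month] = total
        elif obs_end <= month < pred_end:
            prediction[customer_id][month] = total
    return dict(observation), dict(prediction)


# Hypothetical usage records: (customer_id, date, amount).
records = [
    ("u1", date(2021, 1, 5), 40.0),
    ("u1", date(2021, 1, 20), 10.0),
    ("u1", date(2021, 2, 3), 30.0),
    ("u1", date(2021, 4, 9), 25.0),
    ("u2", date(2021, 1, 15), 60.0),
    ("u2", date(2021, 3, 1), 5.0),
]
monthly = aggregate_monthly(records)
start = month_index(date(2021, 1, 1))
observation, prediction = split_windows(
    monthly, start, observe_months=3, predict_months=1
)
```

Here customer u2 has activity in the observation window but none in the prediction window, which is the kind of pattern a churn target could be defined on.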
Data understanding is an iterative process. Choosing
an appropriate time window structure, group to be
predicted, prediction target, and index set is half the
success of loss prediction, so the data must be
scrutinized carefully at this stage (Bimonte, Billaud,
Fontaine, et al. 2021). The data exploration stage
yields the index set that is most valuable for loss
prediction. Derived variables are then constructed so
that the data reflect changes in customer behavior
more fully. Once the forecast target is defined, it is
divided into several loss types, and a priority is
assigned to each type to ensure that every customer
falls into exactly one loss type. Finally, each user is
marked with a churn type using the data in the time
window.
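The labeling step can be sketched as follows. Each loss type has a rule and a priority, and a customer who matches several rules receives only the highest-priority type, so every customer ends up in exactly one state. The loss types 1 to 3, their rules and thresholds, and the field names are hypothetical, since the paper does not enumerate them. Type 0 is assumed to mean "not lost".

```python
def usage_change(previous: float, current: float) -> float:
    """Derived variable: relative change in usage between two periods."""
    if previous == 0:
        return 0.0
    return (current - previous) / previous


# (loss_type, rule) pairs listed in priority order; the first matching rule wins.
# The types and thresholds below are hypothetical examples.
LOSS_RULES = [
    (1, lambda c: c["account_closed"]),      # explicit cancellation
    (2, lambda c: c["current_usage"] == 0),  # silent: no activity at all
    (3, lambda c: usage_change(c["previous_usage"], c["current_usage"]) <= -0.5),
]


def label_loss_type(customer: dict) -> int:
    """Return exactly one loss type per customer, honoring rule priority."""
    for loss_type, rule in LOSS_RULES:
        if rule(customer):
            return loss_type
    return 0  # assumed meaning: the customer is not lost


customers = {
    "u1": {"account_closed": True, "previous_usage": 100.0, "current_usage": 0.0},
    "u2": {"account_closed": False, "previous_usage": 80.0, "current_usage": 0.0},
    "u3": {"account_closed": False, "previous_usage": 100.0, "current_usage": 40.0},
    "u4": {"account_closed": False, "previous_usage": 100.0, "current_usage": 90.0},
}
labels = {cid: label_loss_type(c) for cid, c in customers.items()}
```

Customer u1 satisfies all three rules but is labeled only with type 1, because that rule has the highest priority.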
In the data preparation stage, once the analysis
tables for time period A and time period B have been
made, model building can begin, starting with the
design of the number of samples to be extracted into
the sample table. Assume that 30,000 samples of loss
type 0 and 1,000 samples of other loss types need to
be drawn. Create an empty table, called the sample
table for short, to store the samples extracted next.
Be sure to tick the option "Output should be attached
to the specified table" in the figure. The data
preparation process is shown in Figure 3.
Figure 3: Data preparation process.
The data for the loss prediction model mainly come
from detailed accounting lists, user data, and similar
sources. If the system's data warehouse has already
been built, it can provide these data; otherwise, a
separate data mart can be set up to provide data for
data mining.
First, take the samples with loss type 0. When
extracting, first define the source table: on the input
data page of bivariate statistics, select a time period
analysis table in the available input data box. In the
filter record condition box, enter the selection
condition "loss type flag = 0". On the sample page,
select "create sample", set the sampling technique to
"random selection of N records", and enter "30000" as
the number of records.
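Outside a graphical mining tool, the same extraction can be approximated in a few lines. The sketch below filters an analysis table by loss type and draws a fixed number of random records per type into a sample table. The field name `loss_type`, the helper name `build_sample_table`, the fixed seed, and the reading of the quota as "per loss type" are assumptions for illustration. The quotas are scaled down from the 30,000 and 1,000 in the text so that the example runs on toy data.

```python
import random


def build_sample_table(analysis_table, quotas, seed=0):
    """Draw up to quotas[loss_type] random records for each loss type.

    analysis_table: list of dicts, each with a "loss_type" key (assumed name).
    quotas: mapping from loss type to the number of records to extract.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_table = []          # the initially empty sample table
    for loss_type, n in quotas.items():
        # Equivalent of the filter condition "loss type flag = <loss_type>".
        candidates = [row for row in analysis_table if row["loss_type"] == loss_type]
        # "Random selection of N records"; take all rows if fewer than N exist.
        chosen = rng.sample(candidates, min(n, len(candidates)))
        sample_table.extend(chosen)  # attach the output to the sample table
    return sample_table


# Toy analysis table: 900 records of type 0, 60 of type 1, 40 of type 2.
analysis_table = (
    [{"customer_id": i, "loss_type": 0} for i in range(900)]
    + [{"customer_id": 900 + i, "loss_type": 1} for i in range(60)]
    + [{"customer_id": 960 + i, "loss_type": 2} for i in range(40)]
)
sample_table = build_sample_table(analysis_table, {0: 300, 1: 50, 2: 50})
```

Sampling each loss type separately keeps the rare loss types from being swamped by type 0, which is the point of designing the sample counts before training.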
Input the sample table into the decision tree model
to start training. After training, check the confusion
matrix output by the model to judge how well the
model has been trained. If the result is
unsatisfactory, retrain the model after adjusting the
number of samples, the proportion of samples, the
input fields of the model, the weights of the fields,
or the parameters of the decision tree algorithm.
Among the decision tree parameters, setting the cost
matrix can improve the prediction effect of the model.
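To make the evaluation step concrete, the sketch below builds a confusion matrix from true and predicted loss types and then shows one common way a cost matrix can change predictions: instead of taking the most probable class, it picks the class with the lowest expected cost. This is a generic illustration rather than the mining tool's implementation; the class probabilities, the cost values, the two-class encoding, and the function names are all assumptions.

```python
def confusion_matrix(actual, predicted, classes):
    """Count outcomes; rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def min_cost_class(probabilities, cost, classes):
    """Pick the prediction with the lowest expected misclassification cost.

    probabilities[i] is the estimated probability of classes[i];
    cost[i][j] is the cost of predicting classes[j] when the truth is classes[i].
    """
    expected = [
        sum(probabilities[i] * cost[i][j] for i in range(len(classes)))
        for j in range(len(classes))
    ]
    return classes[expected.index(min(expected))]


classes = [0, 1]  # assumed encoding: 0 = retained, 1 = lost
actual = [0, 0, 0, 1, 1, 0, 1, 0]
# Hypothetical per-customer class probabilities, e.g. from decision tree leaves.
probs = [
    [0.90, 0.10], [0.70, 0.30], [0.60, 0.40], [0.40, 0.60],
    [0.70, 0.30], [0.80, 0.20], [0.65, 0.35], [0.95, 0.05],
]

# Without a cost matrix: predict the most probable class.
plain = [classes[p.index(max(p))] for p in probs]

# With a cost matrix: missing a lost customer costs 3x a false alarm (assumed).
cost = [[0, 1],
        [3, 0]]
weighted = [min_cost_class(p, cost, classes) for p in probs]

plain_cm = confusion_matrix(actual, plain, classes)
weighted_cm = confusion_matrix(actual, weighted, classes)
```

On this toy data the cost matrix trades two extra false alarms for catching every lost customer, which is the usual motivation for penalizing missed churners more heavily.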
4.3 Solution Based on Data Warehouse Platform
The performance market operation system generally
adopts a four-tier structure, as shown in Figure 4. The
operation system establishes a unified enterprise
data information platform for the performance
industry. This paper uses advanced data warehouse
technology, together with the system analysis and
mining tools that have become popular in the market
in recent years, to extract useful information from
the enterprise
historical data, provide services for the enterprise