by (1) preserving the previously learned knowledge
and (2) dynamically adjusting the network architec-
ture for each new chunk to achieve the highest per-
formance during training. IFL expands the network
topology by adding new hidden layers and new units
during each adaptation phase. Determining the most suitable model architecture, the one that leads to the best performance, is a complex problem. Our IFL approach adds hidden units one at a time until the model no longer improves. However, since we keep changing the network architecture to increase performance, the resulting model may over-fit. Hence,
we utilize a validation chunk during training to avoid
over-fitting after each extension. More precisely, only
the weights of the new hidden units are updated each
time and the previous units are frozen to store the pre-
vious knowledge. Thus, less computational time is
needed to conduct learning for each new chunk. In
this way, our IFL approach is expected to outperform other incremental learning approaches, as it continuously adapts its architecture to reach optimal accuracy. The architecture of most past incremental learning approaches is permanently fixed (Anowar and Sadaoui, 2021), so their accuracy may not improve when new chunks are fed to the model. Implementing the IFL algorithm is challenging, requiring a deep investigation of the building blocks and libraries of MLA toolkits, such as creating a new hidden unit, adding a new connection, and freezing the weights of an old connection (so that it is not re-optimized). Moreover,
using a real CCF dataset, we create training, testing,
and validation data chunks and handle the highly im-
balanced chunks. For a robust validation of our IFL
algorithm, we develop four fraud classifiers, trained
on different chunks, and then compare their predic-
tive performances on unseen data. The experimental
results show the efficiency of the proposed learning
approach.
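As an illustrative sketch only, the following PyTorch code shows one possible way to realize the extension-and-freezing step described above; the class, function, and variable names (GrowingFraudNet, grow_until_no_gain, X_tr, y_tr, X_val, y_val) are hypothetical rather than taken from our implementation, and the labels are assumed to be float columns of 0/1 values.

import torch
import torch.nn as nn

class GrowingFraudNet(nn.Module):
    # Each hidden unit is its own Linear block so that old units can be frozen individually.
    def __init__(self, n_features):
        super().__init__()
        self.n_features = n_features
        self.units = nn.ModuleList([nn.Linear(n_features, 1)])   # first hidden unit
        self.out = nn.Linear(1, 1)                                # fraud / legitimate logit

    def forward(self, x):
        h = torch.cat([torch.relu(u(x)) for u in self.units], dim=1)
        return self.out(h)

    def add_hidden_unit(self):
        # Freeze the existing hidden units so the previously learned knowledge is preserved.
        for u in self.units:
            for p in u.parameters():
                p.requires_grad = False
        self.units.append(nn.Linear(self.n_features, 1))         # new trainable unit
        old_out = self.out
        self.out = nn.Linear(len(self.units), 1)                  # output layer with one more input
        with torch.no_grad():                                     # carry over the learned output weights
            self.out.weight[:, :-1] = old_out.weight
            self.out.bias.copy_(old_out.bias)

def grow_until_no_gain(model, X_tr, y_tr, X_val, y_val, epochs=200, tol=1e-3):
    # Add hidden units one by one; stop when the validation loss stops improving.
    loss_fn = nn.BCEWithLogitsLoss()
    best_val = float("inf")
    while True:
        trainable = [p for p in model.parameters() if p.requires_grad]
        opt = torch.optim.Adam(trainable, lr=1e-3)
        for _ in range(epochs):                                   # fit the current architecture
            opt.zero_grad()
            loss = loss_fn(model(X_tr), y_tr)
            loss.backward()
            opt.step()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        if val_loss > best_val - tol:                             # no real gain on the validation chunk
            break                                                 # (a full version would also roll back the last unit)
        best_val = val_loss
        model.add_hidden_unit()                                   # try one more hidden unit
    return model

Note that, in this simplified sketch, the output layer stays trainable after every extension; restricting the updates strictly to the new unit, as described above, would additionally require masking the gradients of the older output weights.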
The study of (Guan and Li, 2001) was the first to develop an IFL approach. This elementary approach adopted an uncommon network topology in which the input layer is directly connected to the output layer. During training, it freezes all the weights connected to the output unit. Although freezing these weights can speed up training, it nevertheless reduces the model's performance. Also, that paper does not provide details on the algorithm design and its implementation. Our paper presents all the
stages of our IFL algorithm that learns progressively
from newly available chunks. Through a concrete ex-
ample, based on constructive neural networks, we il-
lustrate step by step the sophisticated behavior of our
adaptive approach.
2 RELATED WORK
We review recent research on detecting CCF and
highlight its weaknesses. The majority of studies con-
ducted batch learning, such as (Hassan et al., 2020)
that explored deep learning models, like BiLSTM and BiGRU, and classical learning algorithms, such as Decision Tree, AdaBoost, Logistic Regression, Random Forest, Voting, and Naive Bayes. Since the fraud dataset is highly imbalanced, the authors adopted random under-sampling, over-sampling, and SMOTE. The hybrid of over-sampling, BiLSTM, and BiGRU led to the highest accuracy. Another work (Nguyen et al.,
2020) also assessed several MLAs, including LSTM,
2-D CNN, 1-D CNN, Random Forest, ANN and
SVM, using different data sampling methods on three
credit-card datasets. LSTM and 1-D CNN combined
with SMOTE returned the best results. We believe
LSTM can be a good option for incremental learning since this algorithm can remember past data and therefore make predictions using both the current inputs and past data, leading to a better response to environmental changes. In both papers, the LSTMs and the other models were trained on very large datasets, which requires storing sensitive information indefinitely. Moreover, since user transactions become available incrementally, conventional MLAs are inappropriate for streaming data. Our proposed method aims to address the real CCF classification context.
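For illustration, the short Python sketch below (our own example, not taken from the cited studies) shows how a transaction dataset can be sliced into fixed-size chunks to simulate a stream, and how one imbalanced training chunk can be rebalanced with SMOTE from the imbalanced-learn library; the names X_all, y_all and the chunk size are assumptions.

import numpy as np
from imblearn.over_sampling import SMOTE

def make_chunks(X, y, chunk_size=10000):
    # Simulate incrementally arriving data by slicing the dataset into fixed-size chunks.
    n_chunks = int(np.ceil(len(X) / chunk_size))
    return [(X[i * chunk_size:(i + 1) * chunk_size],
             y[i * chunk_size:(i + 1) * chunk_size]) for i in range(n_chunks)]

# Rebalance a single, highly imbalanced training chunk: SMOTE synthesizes new
# minority (fraud) samples by interpolating between existing minority neighbours.
X_chunk, y_chunk = make_chunks(X_all, y_all)[0]            # X_all, y_all: hypothetical feature/label arrays
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_chunk, y_chunk)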
In (Anowar and Sadaoui, 2020), the authors first
utilized SMOTE-ENN to handle a highly imbalanced
CCF dataset and then divided the dataset into multiple
training chunks to simulate incoming data. They pro-
posed an ANN-based incremental learning approach
that learns gradually from new chunks using an in-
cremental memory model. For adjusting the model
each time, the memory consists of one past chunk (so
that data are not forgotten immediately) and one re-
cent chunk (to conduct the model adaptation). The
authors demonstrated that incremental learning is su-
perior to static learning. However, using two chunks each time can be computationally expensive. Also, since the ANN topology is fixed, the model cannot adapt to significant changes in the chunk patterns. In our study, the ANN architecture is dynamic so as to build an optimal fraud detection model. Instead of using two chunks simultaneously, which requires storing more data, we use only one chunk; with transfer learning, we take advantage of the previous chunk without storing it.
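To make the contrast concrete, a minimal sketch of this single-chunk adaptation loop is given below; it reuses the hypothetical GrowingFraudNet, grow_until_no_gain, and make_chunks helpers from the earlier sketches and assumes an 80/20 split of each chunk into training and validation parts.

# Hypothetical outer loop: each arriving chunk is split into training and validation
# parts, the model carried over from the previous chunk is extended, and the raw
# chunk is then discarded (only the frozen weights transfer the past knowledge).
model = GrowingFraudNet(n_features=X_all.shape[1])
for X_chunk, y_chunk in make_chunks(X_all, y_all):
    split = int(0.8 * len(X_chunk))
    X_tr = torch.as_tensor(X_chunk[:split], dtype=torch.float32)
    y_tr = torch.as_tensor(y_chunk[:split], dtype=torch.float32).view(-1, 1)
    X_val = torch.as_tensor(X_chunk[split:], dtype=torch.float32)
    y_val = torch.as_tensor(y_chunk[split:], dtype=torch.float32).view(-1, 1)
    model = grow_until_no_gain(model, X_tr, y_tr, X_val, y_val)
    # No past chunk is kept in memory; the frozen units carry the transferred knowledge.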
In (Bayram et al., 2020), the authors introduced
a Gradient Boosting Trees (GBT) approach, which is an ensemble of decision trees, to minimize the
loss function gradually. The ensemble is updated for