Lifelong Learning Needs Sleep: Few-Shot Incremental Learning
Enhanced by Sleep
Yuma Kishimoto and Koichiro Yamauchi
https://orcid.org/0000-0003-1477-0391
Department of Computer Science, Chubu University, Kasugai, Japan
Keywords:
Sleep, Lifelong Learning, Incremental Learning, Catastrophic Forgetting, Few-Shot Learning, Incremental
Few-Shot Learning.
Abstract:
Catastrophic forgetting due to incremental learning in neural networks is a serious problem. We demonstrate
that introducing a sleep period can address this issue from two perspectives. First, it provides a learning period
for re-learning old memories. Second, it allows time to process new learning. We apply a VAE, enhanced
by an adapter, to incrementally learn new samples and to generate valid samples from only a few training
examples. These generated samples are used for the neural network's re-learning, which also contributes to
improving its generalization ability. The experimental results suggest that this approach effectively realizes
few-shot incremental learning.
1 INTRODUCTION
Artificial intelligence (AI) has rapidly advanced with
improvements in computing power and storage capac-
ity. Deep learning, a core AI technology, achieves
high accuracy through offline learning by layering
neural networks.
Some applications, however, require incremental
learning to handle new tasks. Mixing past training
data with new data demands substantial memory re-
sources, making it impractical in many real-world
scenarios. Lifelong learning models have been devel-
oped to address this by enabling incremental learning
and recognition.
Lifelong learning models have evolved signifi-
cantly. However, they often struggle with catas-
trophic forgetting, which is a critical issue in incre-
mental learning. Existing lifelong learning models
employ various techniques to mitigate this problem,
such as replay mechanisms (e.g., replay buffers, naive rehearsal (Hsu et al., 2018), VQ-based methods
(Hayes et al., 2019), GANs (Shin et al., 2018; Lesort et al., 2019), VAEs (van de Ven et al., 2020), and
FearNet (Kemker and Kanan, 2018)), spontaneous firing (Golden et al., 2022), and weight consolidation
strategies such as EWC (Kirkpatrick et al., 2017), progressive networks (Rusu et al., 2016), and
PackNet (Mallya and Lazebnik, 2018).
Despite progress, no model has simultaneously
achieved incremental end-to-end learning, knowledge
reformulation, and immediate response in a short
time. To address this, we propose a novel approach
introducing sleep periods in lifelong learning. These
periods enable relearning old memories and reorga-
nizing existing knowledge.
Our model uses an adapter-enhanced VAE to
record past samples and generate pseudosamples for
data augmentation. The VAE encoder also serves as
the backbone for classification tasks, ensuring model
compactness and quick responses.
This method offers an integrated solution that en-
ables effective incremental learning, enhances knowl-
edge reformulation, and ensures rapid inference after
observing new instances.
Although lifelong learning methods based on self-
organization (e.g., Parisi et al., 2017) display proper-
ties similar to ours, they do not use knowledge reor-
ganization. In our model, the sleep process realizes
knowledge reorganization, which contributes to the
acquisition of high generalization capability.
2 RELATED WORKS
This section compares the proposed method with ex-
isting methods related to few-shot learning, incremen-
tal learning, and sleep-based learning.
2.1 Incremental Learning
Incremental or continual learning methods allow new
samples to be learned by an already-trained network.
However, this is always accompanied by catastrophic
forgetting (French, 1999). Various methods have been
proposed to solve this problem.
2.1.1 Replay Model
Naive Rehearsal (Hsu et al., 2018) and REMIND (Hayes
et al., 2019) address catastrophic forgetting by replay-
ing past data alongside new data. Methods utilizing
VQ improve efficiency by compressing data and re-
ducing memory usage, but increasing past data raises
memory consumption, making them unsuitable for
resource-limited environments. To solve this prob-
lem, generative replay models were developed (Shin et al., 2018; Kemker and Kanan, 2018; Lesort et al.,
2019; Liu et al., 2020; van de Ven et al., 2020).
These methods, however, require training a neural
network as a replay buffer. To prevent catastrophic
forgetting, many past samples must be mixed during
training, leading to high computational costs.
2.1.2 Weight Consolidation
Catastrophic forgetting occurs when modifying
weight connections critical to past knowledge. Freez-
ing these connections during new learning retains
memories. EWC (Kirkpatrick et al., 2017) and synaptic intelligence (Zenke et al., 2017) minimize the
impact of new tasks by restricting changes to important weights, while PackNet (Mallya and Lazebnik,
2018) partitions network capacity by masking criti-
cal weights for each task, enabling continuous learn-
ing. This method preserves past knowledge by fixing
task-specific weights. However, these approaches are
unsuitable for environments with incrementally pro-
vided datasets because fixed weights cannot be up-
dated, limiting their adaptability.
2.2 Methods that Mimic Sleep
Many organisms have sleep periods, often remaining
inactive during them. Inspired by this, several meth-
ods introduce periodic learning into machine learning,
enabling continuous incremental learning.
2.2.1 Incremental Learning with Sleep (ILS)
The ILS model utilizes two radial basis function (RBF) networks (Yamauchi and Hayami, 2007). During
daytime learning, one network incrementally learns new samples by adding RBF units, while the other
prunes redundant units using pseudo-samples. After nighttime learning, all daytime RBF parameters are
replaced with those of the night-learned network. This cycle repeats
daily. The system showed improved generalization
after nighttime learning compared to daytime learn-
ing. However, its simple network structure limits its
recognition accuracy.
2.2.2 Sleep Models Using Spiking Neuron
Models
The spiking neuron model mimics brain sleep pro-
cesses, reproducing synaptic firing to prevent catas-
trophic forgetting during continuous learning. The
multilayer spiking neural network (Golden et al.,
2022) uses spontaneous firing during sleep to form
shared synaptic weights, ensuring existing tasks are
preserved when new tasks are added. By integrating
a pseudo-sleep process for re-learning, this approach
reviews past tasks and mitigates catastrophic forget-
ting. While still in its early stages, this model has
demonstrated memory retention during sleep, but the
evolution of learning outcomes requires further anal-
ysis.
2.2.3 FEARNet
FEARNet (Kemker and Kanan, 2017) is a continual
learning approach that combines short- and long-term
memory networks to adapt to new tasks while retain-
ing prior knowledge. It uses ResNet as the back-
bone for feature extraction. As new data are intro-
duced, features are pre-extracted, and knowledge ac-
cumulates in both memory networks, leveraging re-
play mechanisms for learning.
While ResNet provides strong feature extraction,
FEARNet relies on pre-extracted features, making it
less flexible for adapting to diverse datasets and tasks.
This reliance on ResNet limits its scalability despite
its strengths in continuous learning.
2.3 Positioning of the Proposed Method
As detailed in Section 3.1, this paper proposes a continual few-shot learning method using a VAE and
k-nearest neighbors (k-NN). Features are extracted by the VAE, with an
adapter layer mitigating catastrophic forgetting. Un-
like FearNet, VAE handles feature extraction flexibly,
enabling efficient task adaptation.
The method employs a two-stage learning pro-
cess: daytime learning for new tasks and nighttime
learning for reviewing and reconstructing knowledge.
This approach efficiently adapts to continual tasks
while preserving past knowledge.
Table 1 compares the proposed method with major few-shot continual learning methods, highlighting their characteristics.
Table 1: Comparison of Methods. Columns: Catastrophic Forgetting Prevention, Scalability, Knowledge Re-organization, Computational Resource Efficiency, and Response Time. Rows: Generative Replay models, Naive Rehearsal, VQ-based Methods, Multi-layer Spiking Neural Network, EWC, PackNet, FEARNet, and the Proposed Method.
These methods are evaluated based on network size efficiency, suppression of catastrophic forgetting, scalability, and computational resource efficiency. The proposed method excels in scalability, knowledge re-organization, and computational efficiency in a continual learning environment. In the table, the symbols indicate superior, average, and poor (×) performance, respectively.
3 PROPOSED METHOD
3.1 Overview
Figure 1: Learning Stages and Flow.
Environments in which learning systems are used require that known samples be recognized while new
samples are learned incrementally. When new samples
arise, rapid incremental learning is needed to up-
date the recognition system and adapt it to subse-
quent tasks. Few-shot incremental learning faces the
dual challenges of catastrophic forgetting and overfit-
ting. Combining data augmentation with incremental
learning can mitigate these risks.
Figure 1 outlines the proposed method, which al-
ternates between two stages: daytime (learning new
tasks) and nighttime (reviewing past tasks and recon-
structing knowledge). This model combines a VAE
with NNs as recognition heads, with the VAE encoder
serving as the backbone. During daytime, NNs learn
few-shot samples as feature-label pairs from the back-
bone. The Adapter, introduced by (Houlsby et al.,
2019), is attached to the encoder’s output to adapt
and reconstruct new samples, mitigating catastrophic
forgetting without updating the original VAE param-
eters.
At night, by contrast, copies of the NNs and the decoder of the original VAE are used to generate
training samples that reproduce previously memorized samples, and the VAE is retrained. In this
retraining process, a contrastive loss is used to train the metric.
In the following sections, we explain the recogni-
tion head (NNs and VAE) with the attached Adapter,
followed by a detailed description of the two learning
periods, daytime and nighttime.
3.2 VAE with an Adapter
The model uses a variational autoencoder (VAE) (Kingma and Welling, 2014) for feature extraction and
data expansion in both the day and night phases. The VAE, composed of a CNN-based encoder and decoder,
outputs a latent variable z from an input x, which is assumed to follow the normal distribution:
q_\phi(z|x) = \mathcal{N}(\mu_\phi(x), \sigma_\phi^2(x))    (1)
Here, φ denotes the parameter vector of the encoder.
By contrast, the decoder reconstructs the input vector
from z. That is,
p_\theta(x|z) = \mathcal{N}(\mu_\theta(z), \sigma_\theta^2(z))    (2)
Here, θ denotes the parameter vector of the decoder.
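As a concrete illustration of Eqs 1 and 2, the following PyTorch sketch shows one way such a CNN-based encoder/decoder pair could be structured. It is for illustration only: the layer sizes, latent dimension, and the 28x28 single-channel input are assumptions chosen to match the MNIST setting of Section 4, and the encoder outputs log sigma^2 rather than sigma (a common parameterization); this is not necessarily the exact architecture used in the evaluated implementation.

import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    """Minimal CNN-based VAE sketch: the encoder outputs the parameters of
    q_phi(z|x) (Eq 1) and the decoder parameterizes p_theta(x|z) (Eq 2)."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Encoder: 1x28x28 image -> flattened feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # -> 32x14x14
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # -> 64x7x7
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)      # mu_phi(x)
        self.fc_logvar = nn.Linear(64 * 7 * 7, latent_dim)  # log sigma_phi^2(x)
        # Decoder: latent vector -> reconstructed image in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (64, 7, 7)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x14x14
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # -> 1x28x28
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)  # z ~ N(mu, sigma^2)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar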
The parameter vectors are optimized through learning
by minimizing the following Kullback-Leibler diver-
gence (KLD) loss function:
L(\theta, \phi; x) = D_{KL}(q_\phi(z|x) \| p(z)) - E_{q_\phi(z|x)}[\log p_\theta(x|z)]    (3)
The VAE originally minimizes Eq 3 using a fixed
dataset. During daytime, incremental learning is
needed as new samples are provided. Minimizing Eq
3 with only new samples risks catastrophic forgetting,
losing prior information.
To address this, an Adapter (Houlsby et al., 2019)
is added to the encoder. The adapter, a single linear
layer, is inserted as a bypass in the output of the en-
coder’s final layer.
\mu_\phi(x) \leftarrow \mu_\phi(x) + f_{Adapter,\mu}[W_{a\mu}^T \mu_\phi(x) + b_\mu]    (4)

\sigma_\phi(x) \leftarrow \sigma_\phi(x) + f_{Adapter,\sigma}[W_{a\sigma}^T \sigma_\phi(x) + b_\sigma]    (5)
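A minimal sketch of how such a bypass could sit on top of the encoder outputs is given below. The choice of a tanh nonlinearity for f_Adapter and the square (latent_dim x latent_dim) adapter weights are our own assumptions for illustration; in practice the sigma branch may instead be applied to log sigma^2 so that the variance stays positive.

import torch.nn as nn

class LatentAdapter(nn.Module):
    """Single-linear-layer bypass added to the encoder outputs (Eqs 4-5).
    Only these parameters (theta_a) are trained during daytime learning."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.lin_mu = nn.Linear(latent_dim, latent_dim)     # W_{a mu}, b_mu
        self.lin_sigma = nn.Linear(latent_dim, latent_dim)  # W_{a sigma}, b_sigma
        self.act = nn.Tanh()  # assumed form of f_Adapter

    def forward(self, mu, sigma):
        mu_adj = mu + self.act(self.lin_mu(mu))              # Eq 4
        sigma_adj = sigma + self.act(self.lin_sigma(sigma))  # Eq 5
        return mu_adj, sigma_adj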
Figure 2: Adapter Update Flow.
Eq 3 is calculated as follows. The VAE mainly learns during nighttime. Let an input image from the set of new data be x_i ∈ D_{new}(date), the latent variable obtained through the encoder be z_i, and the image reconstructed through the decoder, \mu_\theta(z_i), be \hat{x}_i. The reconstruction error representing the difference between x_i and \hat{x}_i is calculated using the binary cross-entropy (Kingma and Welling, 2014) in Eq 6.

L_{recon}(D_{new}(date)) = - \sum_{i \in D_{new}(date)} [ x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) ]    (6)
Additionally, the KL divergence between the latent variable z and its prior distribution is calculated using Eq 7 and included in the loss function.

L_{KL}(D_{new}(date)) = -\frac{1}{2} \sum_{i \in D_{new}(date)} ( 1 + \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2 )    (7)

where \mu_i and \sigma_i^2 are the mean and variance output by the encoder for input x_i. From Eqs 6 and 7, the total loss of the VAE is expressed as shown in Eq 8.
L_{total}(D_{new}(date)) = L_{recon}(D_{new}(date)) + L_{KL}(D_{new}(date))    (8)
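A sketch of this loss in PyTorch, assuming the reconstruction x_recon and the input x both lie in [0, 1] and that the encoder returns log sigma^2:

import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """L_total = L_recon + L_KL (Eqs 6-8), summed over the batch."""
    # Eq 6: binary cross-entropy reconstruction error
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Eq 7: KL divergence between N(mu, sigma^2) and the standard normal prior
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld  # Eq 8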
When updating the Adapter, the parameters of the encoder and decoder are fixed, and the update is performed using Eq 9 only on the set of new samples from that day, D_{new}(date) ≡ {(x_t, y_t)}_{t ∈ Daytime(date)}, as shown in Figure 2.

\Delta\theta_a \propto -\nabla_{\theta_a} L_{total}(D_{new}(date)),    (9)

where \theta_a = [W_{a\mu}, W_{a\sigma}, b_\mu, b_\sigma]^T.
In this way, by training only the Adapter layer,
the weights of the existing encoder and decoder re-
main fixed, preserving the information from previ-
ously learned tasks, thereby preventing catastrophic
forgetting.
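The daytime update of Eq 9 could be sketched as follows, reusing the vae_loss helper above. The optimizer, learning rate, and epoch count are assumptions, and for simplicity the adapter bypass is applied here to (mu, log sigma^2) rather than to sigma itself as in Eq 5.

import torch
from torch.optim import Adam

def daytime_adapter_update(vae, adapter, loader, epochs=10, lr=1e-3):
    """Eq 9: update only the Adapter parameters theta_a on D_new(date)
    while the BaseVAE parameters (phi, theta) stay frozen."""
    for p in vae.parameters():
        p.requires_grad_(False)  # freeze encoder and decoder
    opt = Adam(adapter.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _y in loader:  # labels are used only by the NN head
            mu, logvar = vae.encode(x)
            mu_a, logvar_a = adapter(mu, logvar)  # adapted outputs (Eqs 4-5)
            z = vae.reparameterize(mu_a, logvar_a)
            x_recon = vae.decoder(z)
            loss = vae_loss(x, x_recon, mu_a, logvar_a)  # Eq 8 on the new samples
            opt.zero_grad()
            loss.backward()
            opt.step()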
3.3 Recognition Head
Nearest neighbors (NNs) perform both learning and recognition as the recognition head. The NN is composed of the feature vectors \mu_\phi(x_j) and variance vectors \sigma_\phi(x_j) obtained from the adapted encoder outputs in Eqs 4 and 5, together with the corresponding labels y_j. This set is referred to as the support set S.

S ≡ {(\mu_\phi(x_j), \sigma_\phi(x_j), y_j) | j = 1, ..., M}    (10)
During recognition, the label y_{i^*} of the template vector closest to the mean vector \mu_\phi(x) is output. That is,

i^* = \arg\max_j \frac{ \mu_\phi(x)^T \mu_\phi(x_j) }{ \|\mu_\phi(x)\| \, \|\mu_\phi(x_j)\| }    (11)
The \sigma_\phi(x_j) will be used for data augmentation, as explained in Section 3.4.1. During daytime learning, the Adapter is configured for the new samples, and the feature and variance outputs are adjusted accordingly. The adjusted feature outputs are then used to add new entries to the existing support set S. That is,
S = S ∪ {(\mu_\phi(x_j), \sigma_\phi(x_j), y_j) | j ∈ D_{new}(date)}    (12)
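A minimal sketch of such a support set with cosine-similarity nearest-neighbor recognition (Eqs 10-12); the class name and the in-memory list storage are our own choices for illustration.

import torch
import torch.nn.functional as F

class CosineNNHead:
    """Support set S of (mu, sigma, label) triples (Eq 10) with
    cosine-similarity 1-NN recognition (Eq 11)."""
    def __init__(self):
        self.mus, self.sigmas, self.labels = [], [], []

    def append(self, mu, sigma, label):
        # Eq 12: add one prototype per new sample
        self.mus.append(mu)
        self.sigmas.append(sigma)
        self.labels.append(label)

    def classify(self, mu_query):
        # Eq 11: return the label of the template with the largest cosine similarity
        templates = torch.stack(self.mus)                             # (M, d)
        sims = F.cosine_similarity(mu_query.unsqueeze(0), templates)  # (M,)
        return self.labels[int(sims.argmax())]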
After daytime learning, recognition results are de-
termined by nearest neighbors. Inputs include known
and newly learned samples, so cosine similarities of
feature outputs are compared, with and without the
adapter, to select the most probable labels. During
nighttime learning, the VAE and support sets are re-
built because the encoder’s feature outputs change.
S_{new} ≡ {(\mu_{\phi'}(x_j), y_j) | j ∈ \chi}    (13)

where \phi' denotes the encoder of the VAE re-constructed during nighttime learning. Therefore, on the next day, the learning process is started after the following initialization:

S = S_{new},  \phi = \phi'    (14)
3.4 Entire Learning Process
Here, the entire learning process is briefly explained (see Algorithm 1). The learning process is divided into two phases: daytime and nighttime learning. Note that an initial training of the VAE must be performed beforehand so that data augmentation can be realized successfully. Two VAEs are prepared, one for each learning phase: BaseVAE (\phi, \theta) is used for daytime learning, and SubVAE (\phi', \theta') for nighttime learning.
3.4.1 Daytime Learning
During daytime learning, the BaseVAE learns from the new samples D_{new}(date) ≡ {(x_t, y_t) | t ∈ Daytime(date)}. First, the Adapter is trained using Eq 9. Subsequently, the NN appends new prototypes using Eq 12.
3.4.2 Nighttime Learning (Sleeping)
During nighttime learning, data augmentation and the training of the SubVAE are executed simultaneously. In this process, latent vectors z are generated using the templates stored in the NN.

z ∼ \mathcal{N}(\mu_\phi(x_j), \sigma_\phi(x_j)^2)    (15)

The proposed system generates M samples for each template vector. The decoder then generates an input vector from each z. Let R denote the set of generated input vectors:

R = R ∪ {(\mu_\theta(z), y_j)}    (16)
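Using the VAE and nearest-neighbor head sketched above, the augmentation of Eqs 15-16 could look roughly as follows; the number of samples per template is a hyperparameter and its default value here is an assumption.

import torch

def generate_replay_set(vae, head, m_per_template=20):
    """Eqs 15-16: for each stored template, sample M latent vectors from
    N(mu, sigma^2) and decode them into pseudo-samples, forming the set R."""
    replay = []
    with torch.no_grad():
        for mu, sigma, label in zip(head.mus, head.sigmas, head.labels):
            for _ in range(m_per_template):
                z = mu + sigma * torch.randn_like(sigma)  # Eq 15: z ~ N(mu, sigma^2)
                x_gen = vae.decoder(z.unsqueeze(0))       # Eq 16: mu_theta(z)
                replay.append((x_gen.squeeze(0), label))
    return replay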
The relearning process uses the SubVAE (\phi', \theta'), whose parameters are initialized with those of the BaseVAE. During nighttime learning, the SubVAE attempts to minimize not only the standard loss defined in Eq 3 but also a contrastive loss on the latent features. The contrastive loss is defined by
L_{contrast} = \frac{1}{N} \sum_{(i,j) \in R, i \neq j} [ y_{ij} \|\mu_{\phi'}(x_i) - \mu_{\phi'}(x_j)\|^2 + (1 - y_{ij}) \max(0, m - \|\mu_{\phi'}(x_i) - \mu_{\phi'}(x_j)\|^2) ]    (17)

where y_{ij} = 1 if x_i and x_j belong to the same class and 0 otherwise, m is a margin, and N is the number of pairs.
Therefore, the SubVAE performs learning using the following loss function during nighttime learning.

L_{nighttime} = L_{total} + L_{contrast},    (18)

where L_{total} is defined by Eq 8.
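A sketch of this pairwise contrastive term over a batch of latent means with integer class labels; the margin value and the normalization over off-diagonal pairs are assumptions.

import torch

def contrastive_loss(mu_batch, labels, margin=1.0):
    """Eq 17: pull latent means of same-class pairs together and push
    different-class pairs apart until their squared distance exceeds m."""
    n = mu_batch.size(0)
    d2 = torch.cdist(mu_batch, mu_batch).pow(2)             # squared pairwise distances
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    off_diag = 1.0 - torch.eye(n, device=mu_batch.device)
    pos = same * d2                                          # same-class term
    neg = (1.0 - same) * torch.clamp(margin - d2, min=0.0)   # different-class term
    return ((pos + neg) * off_diag).sum() / off_diag.sum()

# Nighttime objective (Eq 18), reusing vae_loss from Eq 8:
#   loss = vae_loss(x, x_recon, mu, logvar) + contrastive_loss(mu, y)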
After training the SubVAE, the nearest neighbors are rebuilt from scratch. The template vectors are therefore generated by the encoder of the SubVAE without its adapter, \mu_{\phi'}(x_j), where x_j ∈ R, and the label of each template is set to the corresponding label y_j in R.
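The nighttime phase can then be wired together roughly as below, reusing the helpers sketched earlier (generate_replay_set, CosineNNHead, vae_loss, contrastive_loss); the use of copy.deepcopy, full-batch training, and the optimizer settings are assumptions rather than the evaluated implementation.

import copy
import torch
from torch.optim import Adam

def nighttime_learning(base_vae, head, epochs=20, lr=1e-3, m_per_template=20):
    """Nighttime (sleep) phase: build R (Eqs 15-16), retrain a copy of the
    BaseVAE with L_nighttime (Eq 18), and rebuild the NN head (Eq 13)."""
    replay = generate_replay_set(base_vae, head, m_per_template)
    sub_vae = copy.deepcopy(base_vae)  # phi' = phi, theta' = theta
    for p in sub_vae.parameters():
        p.requires_grad_(True)         # the copy is trained without the Adapter
    opt = Adam(sub_vae.parameters(), lr=lr)
    xs = torch.stack([x for x, _ in replay])
    ys = torch.tensor([y for _, y in replay])
    for _ in range(epochs):
        x_recon, mu, logvar = sub_vae(xs)
        loss = vae_loss(xs, x_recon, mu, logvar) + contrastive_loss(mu, ys)  # Eq 18
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Rebuild the support set from the SubVAE encoder (without the Adapter)
    new_head = CosineNNHead()
    with torch.no_grad():
        mu_all, logvar_all = sub_vae.encode(xs)
        for mu_j, lv_j, y_j in zip(mu_all, logvar_all, ys):
            new_head.append(mu_j, torch.exp(0.5 * lv_j), int(y_j))
    return sub_vae, new_head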
4 EXPERIMENTS
This section presents preliminary benchmark results
using the MNIST dataset to evaluate the proposed
system. Ablation tests were conducted to assess the
impact of each technique introduced in the method.
4.1 Initial Setting
This experiment used the MNIST dataset, consisting
of grayscale images of handwritten digits (0–9) with
60,000 training and 10,000 test samples, each 28 × 28
pixels. Data were divided for incremental learning:
classes [0–4] for the 1st daytime learning and [5–9]
for the 2nd. The experiment followed two phases:
”daytime” and ”nighttime. In daytime learning, the
model learns from new data (30 batches of [0–4] for
the 1st and [5–9] for the 2nd). During nighttime learn-
ing, the VAE reconstructs data by decoding feature
representations from both newly learned and previ-
ously learned data.
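As a sketch of how this class-incremental split could be constructed (assuming torchvision is used for data loading, which is not stated in the original setup):

import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def class_incremental_mnist(root="./data"):
    """Split MNIST into the two incremental tasks used in the experiment:
    classes 0-4 for the 1st daytime learning and 5-9 for the 2nd."""
    train = datasets.MNIST(root, train=True, download=True,
                           transform=transforms.ToTensor())
    targets = train.targets
    task1 = Subset(train, torch.where(targets <= 4)[0].tolist())
    task2 = Subset(train, torch.where(targets >= 5)[0].tolist())
    return task1, task2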
4.2 Results
4.2.1 Entire Behavior of the Proposed Model
Table 2 presents the accuracies after the 1st daytime
learning, 1st sleep learning, 2nd daytime learning, and
2nd sleep learning. We can see from the table that the accuracy obtained after daytime learning, particularly for the newly added classes, increased substantially after the subsequent sleep learning.
Data: BaseVAE (\phi, \theta), SubVAE (\phi', \theta'), Adapter \theta_a, M, \chi_{init}, D_{new}
Initialize BaseVAE (\phi, \theta) by minimizing L_{total}(\chi_{init})
while True do
    Receive new samples D_{new} = {(x_p, y_p)}_{p=1}^{n}
    Optimize the Adapter \theta_a to minimize L_{total}(D_{new}) while freezing BaseVAE (\phi, \theta)
    Append prototypes to the nearest neighbors (NNs):
    for (x_t, y_t) ∈ D_{new} do
        S = S ∪ {(\mu_\phi(x_t), \sigma_\phi(x_t), y_t)}
    end
    Expand D_{new} and old memories:
    for each (\mu_\phi(x_i), \sigma_\phi(x_i), y_i) ∈ S do
        for n = 1 to M do
            Sample z ∼ \mathcal{N}(\mu_\phi(x_i), \sigma_\phi(x_i)^2)
            R = R ∪ {(\mu_\theta(z), y_i)}
        end
    end
    Initialize the new VAE parameters: \phi' = \phi, \theta' = \theta
    Optimize \phi', \theta' without the Adapter by minimizing L_{total}(R)
    Rebuild the nearest neighbors:
    S = ∅
    for each (x_j, y_j) ∈ R do
        S = S ∪ {(\mu_{\phi'}(x_j), \sigma_{\phi'}(x_j), y_j)}
    end
    \phi = \phi', \theta = \theta'
end
Algorithm 1: Learning Flow Algorithm.
4.2.2 Ablation Test
An ablation study evaluated the proposed method by
removing individual components. Removing the VAE
Adapter reduced recognition accuracy by 5%, while
removing contrastive loss reduced it by 7%. These
results highlight the importance of each module for
learning performance.
Ablation Test: Effect of Removing the Adapter.
The Adapter layer enables the VAE to retain past
knowledge while adapting to new data efficiently dur-
ing daytime, without significant parameter updates.
Its purpose is to prevent catastrophic forgetting and
support learning new knowledge alongside existing
knowledge.
Experimental results show that removing the
Adapter layer reduced recognition accuracy by 5%
(Table 3). This highlights significant catastrophic
forgetting, where previously learned knowledge was
rapidly lost. The Adapter improves long-term perfor-
mance by balancing knowledge retention and updates
with new data.
Figure 3: Left: without adapter, Right: with adapter.
Subjective evaluation assessed the visual quality
of generated images, focusing on classes [0, 4, 8].
Without the Adapter, results were poor. As shown
in Figure 3, images generated with the Adapter were
clearer, more stable, and less noisy. This aligns with
the decline in label prediction accuracy, highlighting
the Adapter’s role in improving model learning and
image generation.
These findings confirm the Adapter’s effective-
ness in maintaining flexibility and preventing catas-
trophic forgetting, crucial for balancing past and new
data.
Ablation Test: Effect of Removing the Contrastive
Loss. Contrastive loss enhances the separation of
data points in latent space, improving interclass dis-
criminability. It brings same-class points closer together and pushes different-class points apart by at least a margin, increasing the intraclass density of latent vectors and easing
interclass discrimination. Additionally, it improves
template quality, boosting nearest-neighbor prototype
accuracy.
Removing contrastive loss reduced recognition
accuracy by 7% (Table 4), indicating insufficient class
separation in latent space and degraded discrimina-
tion performance. With contrastive loss, class dis-
tances in latent space are maintained, improving clas-
sification accuracy.
In conclusion, contrastive loss is essential for en-
hancing the proposed method’s discriminative ability,
enabling clear class separation and improved perfor-
mance.
4.2.3 Sleep Learning Effectiveness
Nighttime learning was introduced to prevent accu-
racy decline after daytime learning, and its effective-
ness was verified. As shown in the "Day 2" columns of Table 2,
Table 2: Recognition Accuracy by Class After Daytime and Nighttime Learning.
Class Day 1 (Daytime) Day 1 (Nighttime) Day 2 (Daytime) Day 2 (Nighttime)
0 99.18 98.88 98.37 89.08
1 99.47 99.47 99.47 92.51
2 82.85 81.20 80.72 67.54
3 89.90 85.64 85.15 74.52
4 89.92 90.22 89.21 67.68
5 0.00 0.00 46.86 77.91
6 0.00 0.00 62.73 91.96
7 0.00 0.00 50.97 81.32
8 0.00 0.00 35.73 84.29
9 0.00 0.00 16.55 78.30
Table 3: Accuracy Comparison Between Proposed Method
and Without Adapter after the 2nd sleep learning.
Class    Proposed Method (%)    Without Adapter (%)
0 89.08 77.86
1 92.51 93.30
2 67.54 78.88
3 74.52 74.75
4 67.68 39.51
5 77.91 76.91
6 91.96 92.59
7 81.32 82.49
8 84.29 67.66
9 78.30 70.17
Average Accuracy    80.51    75.41
Table 4: Accuracy Comparison Between Proposed Method
and Without Contrastive Loss after the 2nd sleep learning.
Class    Proposed Method (%)    Without Contrastive Loss (%)
0 89.08 80.61
1 92.51 96.04
2 67.54 78.10
3 74.52 76.04
4 67.68 57.23
5 77.91 70.18
6 91.96 68.48
7 81.32 76.26
8 84.29 67.56
9 78.30 63.83
Average Accuracy    80.51    73.43
the recognition accuracy for the newly added classes 5–9 was still low after daytime learning but improved remarkably after nighttime learning, while the accuracy for the previously learned classes 0–4 declined slightly. This suggests that focusing only on new data during daytime learning degrades past knowledge and leaves the new classes insufficiently consolidated, lowering accuracy.
Nighttime learning uses both newly acquired data
and reconstructed past data, reinforcing prior knowl-
edge and preventing catastrophic forgetting. The
VAE’s reconstructed data supplements unseen pat-
terns, balancing past memory retention with new
knowledge acquisition.
5 DISCUSSION
Ablation studies verified the contribution of each
module in the proposed method to learning perfor-
mance. Removing elements like the Adapter and con-
trastive loss demonstrated their quantitative roles in
the method’s core mechanisms.
The proposed framework introduces a novel per-
spective by achieving efficient learning with limited
data and preventing catastrophic forgetting through
daytime and nighttime learning cycles. This high-
lights its potential for sequential data processing in
real-world applications.
In the future, to further broaden the scope of the
proposed method, comparative experiments should be
conducted with other state-of-the-art methods to con-
firm its relative superiority. However, at this stage, the
significance lies in the fact that the proposed method
offers a new direction.
6 CONCLUSIONS
This study proposed a VAE-based method with an
Adapter layer and Contrastive Loss to tackle catas-
trophic forgetting in sequential learning. A two-stage
learning process adapts to new data during daytime
and integrates knowledge at night, enabling efficient
learning while retaining past knowledge.
Ablation studies showed that removing the
Adapter and Contrastive Loss reduced accuracy by
5% and 7%, respectively, highlighting their roles in
class separation and knowledge retention.
The Adapter, analogous to the hippocampus in the
biological brain, plays a crucial role in learning new
samples, reflecting the benefits of mimicking biologi-
cal brain strategies in AI development.
Future work includes applying this method to
complex datasets, optimizing hyperparameters, and
comparing it with state-of-the-art sequential learning
methods to validate its effectiveness.
ACKNOWLEDGMENTS
We are grateful to Mr. Naoki Kanemoto for participating in the discussions that contributed to this paper. We also thank the reviewers for their constructive feedback.
REFERENCES
French, R. M. (1999). Catastrophic forgetting in connec-
tionist networks. TRENDS in Cognitive Sciences,
3(4):128–135.
Golden, R., Delanois, J. E., Sanda, P., and Bazhenov,
M. (2022). Sleep prevents catastrophic forget-
ting in spiking neural networks by forming a joint
synaptic weight representation. PLoS Comput Biol,
18(11):e1010628.
Hayes, T. L., Kafle, K., Shrestha, R., Acharya, M., and
Kanan, C. (2019). Remind your neural network to
prevent catastrophic forgetting. CoRR.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B.,
de Laroussilhe, Q., Gesmundo, A., Attariyan, M., and
Gelly, S. (2019). Parameter-efficient transfer learn-
ing for nlp. In Proceedings of the 36th International
Conference on Machine Learning (ICML), 2019, page
2790–2799.
Hsu, Y.-C., Liu, Y.-C., Ramasamy, A., and Kira, Z. (2018).
Re-evaluating continual learning scenarios: A cate-
gorization and case for strong baselines. In Ben-
gio, S., Wallach, H., Larochelle, H., Grauman, K.,
Cesa-Bianchi, N., and Garnett, R., editors, Advances
in Neural Information Processing Systems 31. Curran
Associates, Inc.
Kemker, R. and Kanan, C. (2017). Fearnet: Brain-
inspired model for incremental learning. ArXiv,
abs/1711.10563.
Kemker, R. and Kanan, C. (2018). Fearnet: Brain-inspired
model for incremental learning. In International Con-
ference on Learning Representations ICLR2018.
Kingma, D. P. and Welling, M. (2014). Auto-encoding vari-
ational bayes. CoRR.
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J.,
Desjardins, G., Rusu, A. A., Milan, K., Quan, J., and
Ramalho, T. (2017). Overcoming catastrophic forget-
ting in neural networks. Proceedings of the National Academy of Sciences of the United States of America, 114(13):3521–
3526.
Lesort, T. L., Gepperth, A., Stoian, A., and Filliat, D.
(2019). Marginal replay vs conditional replay for con-
tinual learning. In Tetko, I. V., K
˚
urkov
´
a, V., Karpov,
P., and Theis, F., editors, Artificial Neural Networks
and Machine Learning – ICANN 2019, volume 11728
of Lecture Notes in Computer Science, pages 466–
480, Munich, Germany. Springer.
Liu, X., Wu, C., Menta, M., Herranz, L., Raducanu, B.,
Bagdanov, A. D., Jui, S., and van de Weijer, J. (2020).
Generative feature replay for class-incremental learn-
ing. IEEE, pages 915–924.
Mallya, A. and Lazebnik, S. (2018). Packnet: Adding mul-
tiple tasks to a single network by iterative pruning. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 7765–
7773.
Parisi, G. I., Tani, J., Weber, C., and Wermter, S. (2017).
Lifelong learning of human actions with deep neural
network. Neural Networks, 96:137–149.
Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer,
H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R.,
and Hadsell, R. (2016). Progressive neural networks.
CoRR.
Shin, H., Lee, J. K., Kim, J., and Kim, J. (2018). Con-
tinual learning with deep generative replay. In Ben-
gio, S., Wallach, H., Larochelle, H., Grauman, K.,
Cesa-Bianchi, N., and Garnett, R., editors, Advances
in Neural Information Processing Systems 31. Curran
Associates, Inc.
van de Ven, G. M., Siegelmann, H. T., and Tolias, A. S.
(2020). Brain-inspired replay for continual learning with artificial neural networks. Nature Communica-
tions, 11(4069):1–14.
Yamauchi, K. and Hayami, J. (2007). Incremental learning
and model selection for radial basis function network
through sleep. IEICE TRANSACTIONS on Informa-
tion and Systems, E90-D(4):722–735.
Zenke, F., Poole, B., and Ganguli, S. (2017). Continual
learning through synaptic intelligence. In Proceed-
ings of the 34th International Conference on Machine
Learning, Sydney, Australia, PMLR 70, 2017.