Enhancing Railway Obstacle Detection System Based on Incremental

Learning

Qiushi Guo, Bin Cao, Dehao Hao, Cheng Wang, Lijun Chen and Peng Yan

China SWJTU Railway Development Co., Ltd (CSRD), China

{guoqiushi, caobin, haodehao, wangcheng, chenlijun, yanpeng}@csrd.com

Keywords:

Obstacle Detection, Railway Security, Deep Learning.

Abstract:

Obstacle detection systems face challenges related to the Catastrophic forgetting problem, where old obstacles

may be misclassiﬁed when training new unseen obstacles. Re-training a model from scratch for every new

obstacle is often impractical. In this work, we propose a continual learning-based approach to efﬁciently

update the model without repeatedly retraining on previous data, while simultaneously mitigating catastrophic

forgetting. Experimental results demonstrate the effectiveness of our proposed method.

1 INTRODUCTION

With the advancement of high-speed rail systems,

their safety and reliability have garnered signiﬁcant

public attention in recent years. Obstacles within rail-

way zones represent a major threat to railway safety.

Potential obstacles encompass a wide variety of cate-

gories, including rocks, animals, pedestrians, and tree

branches. A core challenge in railway obstacle detec-

tion is the inherent difﬁculty of predeﬁning all possi-

ble obstacles. Furthermore, the dynamic environmen-

tal conditions(Guo, 2024) surrounding railway areas

add additional complexity to this task.

Deep learning-based approaches have achieved

signiﬁcant success in domains such as mobile pay-

ment(Guo et al., 2023), disaster detection(Sazara

et al., 2019), and remote sensing(Bischke et al.,

2019). Obstacle detection in railway areas can be

framed as an Out-of-Distribution (OOD) problem

(Yang et al., 2024). In real-world scenarios, obsta-

cles may not be encountered during training, present-

ing a signiﬁcant challenge. Traditional deep learning-

based approaches struggle with this issue, as models

trained on speciﬁc datasets can only detect objects

represented within the training data. One potential so-

lution is to update existing models when new obstacle

categories emerge. For example, consider an obsta-

cle detection model at stage T that is trained to detect

pedestrians. At stage T+1, the model must be updated

to detect new obstacles, such as rocks. Consequently,

the training dataset must also be expanded to include

both categories—pedestrians and rocks.

However, this approach presents several chal-

stage 1

stage2

stageT

train

predict

fine-tune

Figure 1: Demonstration of the Catastrophic Forgetting

Problem: In the early stages, both previous and current ob-

stacle categories are accurately predicted (denoted by the

green box). However, as time progresses, the likelihood

of misprediction for objects from previous stages increases,

leading to a higher probability of incorrect classiﬁcations

(denoted by the red box).

lenges. First, model updating is time-consuming

and costly, as it requires retraining the model. Sec-

ond, training a model necessitates a large volume

of data, and storing previous object images becomes

increasingly burdensome over time. Finally, when

a model is ﬁne-tuned on new data, it often experi-

ences a loss of performance on previously learned

tasks, a phenomenon known as catastrophic forget-

ting, which can be detrimental in our scenario. As

illustrated in ﬁg. 1, obstacles such as pedestrians and

rocks may be misclassiﬁed during later stages, under-

scoring the critical nature of mitigating catastrophic

forgetting(Kirkpatrick et al., 2017) in obstacle detec-

tion tasks.

In this work, we propose a incremental learning-

Guo, Q., Cao, B., Hao, D., Wang, C., Chen, L. and Yan, P.

Enhancing Railway Obstacle Detection System Based on Incremental Learning.

DOI: 10.5220/0013268300003941

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 11th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS 2025), pages 413-420

ISBN: 978-989-758-745-0; ISSN: 2184-495X

413

based approach to address the aforementioned chal-

lenges. To alleviate storage pressure, we introduce

an object-level memory bank that retains information

about obstacles and their corresponding priority in-

dices. The obstacle information represents previously

encountered objects, which can be fused with current

images, while the priority index determines the like-

lihood of an object being selected for inclusion in the

model. The selected objects are then overlaid onto the

current image, which incorporates both previous and

new obstacles into a single frame.

To identify previous objects that are more likely

to be misclassiﬁed, we employ two strategies. First,

we design a policy to update the memory bank. At

the end of each training stage, the priority index is

updated based on the sum of the absolute gradient

values of the pasted images. Second, we update the

shallow layers of our model at the beginning of each

stage. This targeted modiﬁcation is based on the

premise that shallow layers capture shape-related fea-

tures(Geirhos et al., 2018), which are critical to the

accuracy of our task. The details of our methods are

provided in Section 3.

In this work, our contributions can be summarized

as follow:

• We propose a data-efﬁciency approach to fuse

previous obstacles with current new emerging ob-

stacles without increasing the number of total

training images.

• We propose a memory bank and corresponding

update policy. The memory bank is updated based

on the sum of absolute gradient of fused images.

• We update the shallow layer of current network

with previous one to retain previous learned fea-

tures.

2 RELATED WORKS

2.1 Railway Obstacles Detection

In recent years, several approaches have been ap-

plied in obstacle detection in railway areas. Rahman

et.al. (Rahman et al., 2022) propose a classiﬁcation

based method which leverage MobileNetV2 to clas-

sify the obstacled images. They claim that proposed

model outperforms other approaches; Brucker et.al.

(Brucker et al., 2023) propose utilizing a shallow net-

work to learn railway segmentation from normal rail-

way images, besides they explore the controlled in-

clusion of global information by learning to hallu-

cinate obstacle-free images; Zhang et. al. (Zhang

et al., 2023)propose an intelligent obstacle detection

system based on deep learning to improve the safety

of train operation. To further improve the robust-

ness and reliability of the system, signals of other

modal signals are introduced, LiDAR, for instance.

Bai et al .(Bai et al., 2024) fuse visual and Lidar data

in real time railway obstacle detection task. How-

ever, LiDAR is expensive compared to camera and

is easily affected by temperature, which is impracti-

cal to be widely deployed in practical scenarios. Wen

et. al.(Wen et al., 2024) propose a multi-contrastive

learning strategy to improve point cloud segmentation

in complex weather for rail-obstacle detection.

2.2 Incremental Learning

Incremental learning, also known as lifelong learning

or continual learning, has been applied across various

computer vision tasks (Douillard et al., 2021; Si et al.,

2025). The primary objective of incremental learning

is to address catastrophic forgetting (Robins, 1995),

wherein a model gradually loses its ability to recog-

nize objects from earlier stages. Some approaches

retain a limited number of previous data samples or

features for integration with current data (Zhu et al.,

2023). Other methods focus on modifying network

structures to accommodate new tasks (Frankle and

Carbin, 2018), designing effective sample selection

policies for task adaptation (Prabhu et al., 2020), or

generating pseudo-labels and synthetic images to mit-

igate class imbalance (Douillard et al., 2021). To the

best of our knowledge, incremental learning has not

yet been applied to railway obstacle detection. We

believe that a well-designed incremental learning sys-

tem could improve efﬁciency while simultaneously

addressing catastrophic forgetting.

3 METHODS

In this section, we describe the components of our

system in detail. As illustrated in ﬁg. 2, our system

comprises the following key components: an Object-

based Memory Bank, an Absolute Gradient Value Up-

date Policy, and Shallow Layer Replacement. These

components work together to integrate past informa-

tion with current images, effectively addressing the

catastrophic forgetting problem.

3.1 Reformulation

Railway Obstacle Detection

Given an Image I ∈ R

C×H×W

, a sub-area M ⊂

I, which indicates that there exists a railway area

VEHITS 2025 - 11th International Conference on Vehicle Technology and Intelligent Transport Systems

414

Memory Bank

Update Module

Priority index

obj-1

obj-2

...

obj-n

（a）Object-Level fusion

（b）Model Training stage

（c）Memory Bank update

Obstacle Gallery

Memory Bank

Figure 2: (a) Data Fusion Module: In this step, objects are retrieved from the Object Memory Bank based on their priority

index. These objects are then overlaid onto the images in the current stage to integrate both past and present obstacle infor-

mation. (b) Training Process: During the training phase, the shallow layers of the convolutional neural network (CNN) are

replaced with the corresponding layers from the previous stage at the start of each new training cycle. (c) Memory Bank

Update: The memory bank is updated at the end of each training stage, with updates determined by the absolute gradient

change of each fused image.

in the image; an obstacle set with n classes

O =

{

,...,o

}

, The Railway obstacles detec-

tion system aims to obtain a CNN model ϕ : I 7−→

{

, p

}

, where B

{

,h,w

}

. p

∈ [0, 1].

y =

(

0, if p

> H and ∃i,s.t.,B

M ̸=

1, else

(1)

where y is the prediction of the system, 1 is obstacles

detected and 0 is no obstacles; H is the threshold of

conﬁdence of detected candidates, which can be ad-

justed based on different scenarios.

Incremental Learning

Assuming we are given a data stream D =

{

,...

}

, D

∩ D

0. θ

is the parameters of

model f

(·), which is updated at stage i. The model is

initialized based on D

(·) = ψ

) (2)

where ψ

is training strategy in stage 0. At stage T ,

the model is updated in following way:

= φ

bank

, f

T −1

). (3)

Where D

bank

is memory bank, which saves informa-

tion of previous D

i,...,T −1

. φ(·) is the update strategy,

which takes previous model, memory bank and cur-

rent data as input. The updated strategies may vary

along with the change of stages.

3.2 Memory-Bank Module

Object-Level Memory Bank

Previous continual learning based approach save

whole image in Memory-Bank(Qu et al., 2021). With

the accumulation of time, the memory bank will suf-

fer from storage burden. To alleviate this issue, we

propose a object-level memory bank. We extract tar-

get obstacle objects in each stage. The extract object

patches are no bigger than the original images. In this

way, one training image contain both previous obsta-

cles and current obstacles. To simulate the practical

scenarios, the objects need to be re-scaled according

to their pasted positions.

k =

∑

patch

(4)

where s

is the size of original image. We iterate our

dataset, calculate the average k in different categories.

as illustrated in 1, object-level can save 90% space

Table 1: Object-image ratios in our scenario.

Pedestrian rocks Branch board avg.

k 0.12 0.07 0.09 0.11 0.10

compared to image-level approach.

Extracted patches are then fed into Segment-

Anything-Model(SAM) to obtain corresponding

Enhancing Railway Obstacle Detection System Based on Incremental Learning

415

mask, which will be used in our copy paste step.

Copy-Paste Object Fusion

Traditional approaches also suffer from the burden of

data accumulating. Assuming we have N stages, in

each stages we have M images. Training time of an

batch of images(B) is P. The estimated time

T is cal-

culated as follow:

T =

M × N × P

(5)

The Training costs become prohibitively expen-

sive as N increases, Not to mention the impact caused

by class imbalance.

To alleviate this problem, we introduce a copy-

paste approach to fuse history object with current cat-

egories. The procedures are as follow: ﬁrstly, pre-

vious obstacle object are sampled based on priority

index in memory bank. As illustrated part a in 2, the

sampled objects are pasted on the current stage im-

ages according to Mask. In this approach, the training

image remain the same in each stage. This approach

simultaneously reduces training time and maintains

data class balance.

I = M ⊗ P + (1 − M) ⊗ I (6)

Update Policy

The core component of the Object Memory Bank

module is the Update Policy, which serves two main

functions. First, it identiﬁes objects or categories

that are more likely to be misclassiﬁed; second,

it refreshes the Memory Bank to prevent excessive

growth. In deep neural networks, changes in gra-

dient values can indicate the model’s sensitivity to

speciﬁc input data (Selvaraju et al., 2017). Given

a trained model and an input image, gradient val-

ues are obtained through an inference process, where

higher values indicate greater sensitivity to that ob-

ject class. Additionally, objects within the same cat-

egory contribute differently to model training. To as-

sess the priority of objects in the Memory Bank, we

introduce the Absolute Gradient Value Update Pol-

icy. Given a Model f (·) and fuse image samples

{

,...,I

}

. At the end of each stage, the pri-

ority index of each image can be calculated as follow:

∑

i=1

∂ f (I

)

∂βW

(7)

where n is the number of kernels in model, W

is the

corresponding weights in kernels. β

is a parameter

to adjust the importance of each channel, which is be-

tween 0.1 to 0.8, shallow layer is with a larger value

compared to deeper layer. At stage K, a priority queue

is formed based p

, 1/k items in L

will be replaced

randomly by current stage obstacles.

3.3 Shallow-Channels Fusion

Another contributor to catastrophic forgetting is the

shift in kernel weights.(Jin et al., 2021) During model

ﬁne-tuning, the weights in kernels responsible for

recognizing previous objects are overwritten by new

weights, leading to an irreversible shift. This weight

shift can worsen over time, compounding the issue.

To address this, we propose a kernel-level informa-

tion fusion mechanism.

Previous studies have shown that shallow CNN

layers are more effective at capturing shape-based

features, while deeper layers tend to capture texture-

based features(Hermann et al., 2020). Since shape-

based features are robust and crucial for obstacle-

related tasks, we enhance our model’s resilience

against catastrophic forgetting by replacing the shal-

low CNN layers, enabling better retention of earlier

learned features. The target channel k is determined

as follow:

= arg max

∂ f (I

)

∂W

(8)

channels = [k

,.., k

] (9)

Here, I

represents a randomly sampled fused im-

age with a high priority index. At the start of each

stage, channels from the ﬁne-tuned model in the pre-

vious stage randomly replace the corresponding chan-

nels in the current model. Through experimenta-

tion, we observed that fusing channels from multiple

prior models can slightly improve robustness. How-

ever, this approach also increases storage require-

ments and inference time. As a trade-off, we choose

to replace channels using only the model from the

immediately preceding stage. yellow box), respec-

tively. On the right: after training, intra-class dis-

tances become smaller, while inter-class distances be-

come larger. (ViT) (Dosovitskiy, 2020) as its back-

bone, whereas the light-type employs a modiﬁed ver-

sion of MobileNetV2 (Sandler et al., 2018). To re-

duce the model size further, we compress the original

MobileNetV2 by reducing its width, i.e., decreasing

the number of channels. Both ViT and the modiﬁed

MobileNetV2 replace the ﬁnal layer with a fully con-

nected layer of size n × 128.

3.4 Pseudo Code

The pseudo code of our update policy is illustrated as

algorithm 1.

VEHITS 2025 - 11th International Conference on Vehicle Technology and Intelligent Transport Systems

416

(a) ballasted track

(b) ballastless track

Figure 3: Experiment sites and obstacles in our work. ballasted track(a) is around 70 meters ling and 8 meters wide; b)

ballastless track is around 100 meters long and 6 meters wide. Obstacles from left to right: rocks(20cm), rocks(30cm),

rocks(40cm), parcels, pedestrians, steel board and branches.

Algorithm 1: Update policy.

Input: Model f (·), Memory Bank M

T −1

image set in stage T

{

,...,I

}

Result: Updated Memory Bank M

Data: Priority index P = [...]

foreach I

// copy-paste to generate fused

image

f use

= m ⊗ P + (1 − m) ⊗ I

;

// Calculate absolute gradient

across layers

∑

i=1

∂ f (I

f use

)

∂βW

;

P.append(p

);

end

sort(P);

for i = 0 to n do

// Replace Memory bank objects

according to priority index

M .Patch[i] = ranndom(ob j

,P[i])

end

4 EXPERIMENTS

4.1 Conﬁguration

Our method is implemented using the PyTorch frame-

work and the model is trained on an RTX 3090Ti, with

24 GB memory. CPU processor is Intel i7-12700F

with 20 cores. We select Adam(Kingma, 2014) as the

optimizer. Starting learning rate is set to 0.001. The

batch size is set to 16 and the number of epochs to

25. Albumentation(Buslaev et al., 2020) is utilized

to perform data augmentation. Data transformations

include horizontal ﬂip, coarse dropout, and random

brightness contrast adjustments.

4.2 Dataset & Policy

Dataset

The images used in this study were captured at our

experimental site in Chengdu, China. As shown in

ﬁg. 3, the site includes two types of railway scenarios:

ballastless track and ballasted track. The obstacles

we prepared include rocks of varying sizes (20 cm,

30 cm, and 40 cm), branches, pedestrians, parcels,

and steel boards. For each stage, we collected 2,000

images—1,200 on ballastless track and 1,200 on bal-

lasted track. Obstacles were positioned at varying dis-

tances between 10 m and 50 m. The dataset was split

randomly into training and test sets, with an 80:20 ra-

tio.

Policy

The categories of obstacles—rocks (20 cm, 30 cm,

40 cm), parcels, pedestrians, steel boards, and

branches—are designated as A through G, respec-

tively. In each training stage, only data from one

Enhancing Railway Obstacle Detection System Based on Incremental Learning

417

Table 2: Results w/wo our approach across popular detection architectures.

D/A → A D/B → B D/C → C D/D → D D /E → E D/F → F D/G → G

Yolo v5(Jocher, 2020) 0.824 0.837 0.842 0.791 0.917 0.684 0.635

Yolov5+Ours 0.831 0.849 0.857 0.811 0.934 0.713 0.651

Yolo v10(Wang et al., 2024) 0.819 0.847 0.837 0.811 0.924 0.714 0.672

Yolov10+Ours 0.827 0.853 0.859 0.836 0.928 0.733 0.691

Faster Rcnn(Ren et al., 2016) 0.807 0.825 0.831 0.776 0.909 0.723 0.667

Faster Rcnn+Ours 0.813 0.834 0.844 0.791 0.924 0.737 0.681

nanodet(RangiLyu, 2021) 0.821 0.826 0.847 0.808 0.916 0.696 0.659

nanodet+Ours 0.833 0.829 0.857 0.831 0.931 0.717 0.683

Swin(Liu et al., 2021) 0.827 0.841 0.847 0.822 0.938 0.735 0.649

Swin+Ours 0.839 0.857 0.862 0.831 0.946 0.749 0.661

Table 3: Ablation study on sub-component.

copy-paste Layer-fusion GMBU mIoU

0.813

✓ 0.847

✓ 0.821

✓ 0.871

✓ ✓ ✓ 0.889

Table 4: Ablation results on pasted objects fused on images.

# objs 1 2 3 4 5 6

mIoU 0.864 0.875 0.894 0.889 0.896 0.890

category and the corresponding entries in the Object

Memory Bank are accessible. The notation D/A → A

indicates that the current task is to detect category A,

where D represents the union of all categories. We set

the training steps to 2, meaning that two categories are

trained in each step.

For example, in the stage D/A → A, the model

is ﬁne-tuned using data from category A along with

fused data from other categories. The test set consists

of all images except those from category A, as our

objective is to assess the effectiveness of the method

in addressing catastrophic forgetting.

4.3 Results

Quantitative Evaluation

The results of quantitative evaluation is illustrated

as below. We deploy our approach on several pop-

ular detection architectures(YOLO-V5, YOLO-V10,

Faster-RCNN, Swin, nanodet) to verify the effec-

tiveness of it. The results indicates that our pro-

posed methods are effective across all architectures

we listed. The improvement of mIoU range from

0.006 to 0.024. Besides, it is noticeable that the ef-

fects of Catastrophic Forgetting is different across cat-

egories. It is severe in D/F −→ F setting, which in-

dicates that the learned features of steel board may

overlap the previous ones. The setting D/A −→

A,D/B −→ B and D/C −→ C show subtle improve-

ments indicates that cubes share similar features in

terms of contours and shapes.

Ablation Study

To verify the effectiveness of each module, we con-

duct an ablation study in terms of Copy-Paste, Layer

fusion and Gradient-Memory-Bank update module.

The results are illustrated in table 3. One point that

needs to be clariﬁed is, when GMBU is not activated,

we adapt a random update policy to update the mem-

ory bank. The results indicate that each part con-

tribute to improve the whole system, while GMBU

contributes the most.

We also conduct experiments to determine the best

number of pasted objects on current images, the re-

sults are illustrated as table 4. The results indicate

that the performance increase when number of objects

range from 1 to 4 and become saturate after that.

5 DISCUSSION

In this work, we have proposed a continual learning-

based approach to mitigate the catastrophic forget-

ting problem in obstacle detection for railway scenar-

ios. The proposed Object-Memory Bank reduces both

storage and training burdens simultaneously. The

memory bank update policy, along with modiﬁcations

to the CNN layers, facilitates the retention of previ-

ously learned features in later stages. Future work

could focus on optimizing the object fusion process,

such as incorporating light and shadow information

during the image generation stage, which would en-

hance the realism of the data. Additionally, further

VEHITS 2025 - 11th International Conference on Vehicle Technology and Intelligent Transport Systems

418

research on feature-based methods is needed to com-

press the memory bank more effectively.

REFERENCES

Bai, R., Wu, Z., and Xu, T. (2024). A lightweight camera

and lidar fusion framework for railway transit obsta-

cle detection. In Proceedings of the 2024 3rd Asia

Conference on Algorithms, Computing and Machine

Learning, pages 303–308.

Bischke, B., Helber, P., Folz, J., Borth, D., and Dengel, A.

(2019). Multi-task learning for segmentation of build-

ing footprints with deep neural networks. In 2019

IEEE International Conference on Image Processing

(ICIP), pages 1480–1484. IEEE.

Brucker, M., Cramariuc, A., Von Einem, C., Siegwart,

R., and Cadena, C. (2023). Local and global infor-

mation in obstacle detection on railway tracks. In

2023 IEEE/RSJ International Conference on Intelli-

gent Robots and Systems (IROS), pages 9049–9056.

IEEE.

Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A.,

Druzhinin, M., and Kalinin, A. A. (2020). Albumen-

tations: Fast and ﬂexible image augmentations. Infor-

mation, 11(2).

Dosovitskiy, A. (2020). An image is worth 16x16 words:

Transformers for image recognition at scale. arXiv

preprint arXiv:2010.11929.

Douillard, A., Chen, Y., Dapogny, A., and Cord, M.

(2021). Plop: Learning without forgetting for con-

tinual semantic segmentation. In Proceedings of the

IEEE/CVF conference on computer vision and pattern

recognition, pages 4040–4050.

Frankle, J. and Carbin, M. (2018). The lottery ticket hypoth-

esis: Finding sparse, trainable neural networks. arXiv

preprint arXiv:1803.03635.

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wich-

mann, F. A., and Brendel, W. (2018). Imagenet-

trained cnns are biased towards texture; increasing

shape bias improves accuracy and robustness. arXiv

preprint arXiv:1811.12231.

Guo, Q. (2024). A universal railway obstacle detection sys-

tem based on semi-supervised segmentation and opti-

cal ﬂow. arXiv preprint arXiv:2406.18908.

Guo, Q., Chen, Y., Yao, Y., Zhang, T., and Ma, J. (2023).

A real-time chinese food auto billing system based on

instance segmentation. In 2023 IEEE Region 10 Sym-

posium (TENSYMP), pages 1–5. IEEE.

Hermann, K., Chen, T., and Kornblith, S. (2020). The ori-

gins and prevalence of texture bias in convolutional

neural networks. Advances in Neural Information

Processing Systems, 33:19000–19015.

Jin, X., Sadhu, A., Du, J., and Ren, X. (2021). Gradient-

based editing of memory examples for online task-free

continual learning. Advances in Neural Information

Processing Systems, 34:29193–29205.

Jocher, G. (2020). Ultralytics yolov5.

Kingma, D. P. (2014). Adam: A method for stochastic op-

timization. arXiv preprint arXiv:1412.6980.

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J.,

Desjardins, G., Rusu, A. A., Milan, K., Quan, J.,

Ramalho, T., Grabska-Barwinska, A., et al. (2017).

Overcoming catastrophic forgetting in neural net-

works. Proceedings of the national academy of sci-

ences, 114(13):3521–3526.

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,

S., and Guo, B. (2021). Swin transformer: Hierar-

chical vision transformer using shifted windows. In

Proceedings of the IEEE/CVF international confer-

ence on computer vision, pages 10012–10022.

Prabhu, A., Torr, P. H., and Dokania, P. K. (2020). Gdumb:

A simple approach that questions our progress in con-

tinual learning. In Computer Vision–ECCV 2020:

16th European Conference, Glasgow, UK, August 23–

28, 2020, Proceedings, Part II 16, pages 524–540.

Springer.

Qu, H., Rahmani, H., Xu, L., Williams, B., and Liu,

J. (2021). Recent advances of continual learning

in computer vision: An overview. arXiv preprint

arXiv:2109.11369.

Rahman, F. U., Ahmed, M. T., Hasan, M. M., and Jahan,

N. (2022). Real-time obstacle detection over railway

track using deep neural networks. Procedia Computer

Science, 215:289–298.

RangiLyu (2021). Nanodet-plus: Super fast and high accu-

racy lightweight anchor-free object detection model.

https://github.com/RangiLyu/nanodet.

Ren, S., He, K., Girshick, R., and Sun, J. (2016). Faster

r-cnn: Towards real-time object detection with re-

gion proposal networks. IEEE transactions on pattern

analysis and machine intelligence, 39(6):1137–1149.

Robins, A. (1995). Catastrophic forgetting, rehearsal and

pseudorehearsal. Connection Science, 7(2):123–146.

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and

Chen, L.-C. (2018). Mobilenetv2: Inverted residu-

als and linear bottlenecks. In Proceedings of the IEEE

conference on computer vision and pattern recogni-

tion, pages 4510–4520.

Sazara, C., Cetin, M., and Iftekharuddin, K. M. (2019). De-

tecting ﬂoodwater on roadways from image data with

handcrafted features and deep transfer learning. In

2019 IEEE intelligent transportation systems confer-

ence (ITSC), pages 804–809. IEEE.

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,

Parikh, D., and Batra, D. (2017). Grad-cam: Visual

explanations from deep networks via gradient-based

localization. In Proceedings of the IEEE international

conference on computer vision, pages 618–626.

Si, C., Wang, X., Yang, X., and Shen, W. (2025). Tendency-

driven mutual exclusivity for weakly supervised incre-

mental semantic segmentation. In European Confer-

ence on Computer Vision, pages 37–54. Springer.

Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J.,

and Ding, G. (2024). Yolov10: Real-time end-to-end

object detection. arXiv preprint arXiv:2405.14458.

Wen, L., Peng, Y., Lin, M., Gan, N., and Tan, R. (2024).

Multi-modal contrastive learning for lidar point cloud

Enhancing Railway Obstacle Detection System Based on Incremental Learning

419

rail-obstacle detection in complex weather. Electron-

ics, 13(1):220.

Yang, J., Zhou, K., Li, Y., and Liu, Z. (2024). Generalized

out-of-distribution detection: A survey. International

Journal of Computer Vision, pages 1–28.

Zhang, Q., Yan, F., Song, W., Wang, R., and Li, G. (2023).

Automatic obstacle detection method for the train

based on deep learning. Sustainability, 15(2):1184.

Zhu, L., Chen, T., Yin, J., See, S., and Liu, J. (2023). Con-

tinual semantic segmentation with automatic memory

sample selection. In Proceedings of the IEEE/CVF

Conference on Computer Vision and Pattern Recogni-

tion, pages 3082–3092.

VEHITS 2025 - 11th International Conference on Vehicle Technology and Intelligent Transport Systems

420