Towards Resource-Efficient Deep Learning for Train Scene Semantic Segmentation
Marie-Claire Iatrides 1,2, Petra Gomez-Krämer 1, Olfa Ben Ahmed 3 and Sylvain Marchand 1
1 L3i Laboratory, La Rochelle University, La Rochelle, France
2 Association Ferrocampus, Saintes, France
3 Xlim Institute of Research, Poitiers University, Poitiers, France
{marie-claire.iatrides, petra.gomez, sylvain.marchand}@univ-lr.fr, olfa.ben.ahmed@univ-poitiers.fr
Keywords:
Lightweight CNN, Resource-Efficient ML, Semantic Segmentation, Deep Learning, Train Environment.
Abstract:
In this paper, we present a promising application of scaling techniques for segmentation tasks in a railway environment context to highlight the advantages of task-specific models tailored for on-board train use. Smaller convolutional neural networks (CNNs) do not focus on accuracy but on resource efficiency. Our models are scaled using skip connections as well as quantization in order to form lightweight models trained specifically for our context. The proposed models have been evaluated both in terms of segmentation performance and efficiency on state-of-the-art scene segmentation datasets, namely RailSem19 and Cityscapes. We have obtained models with fewer than 3.5M parameters and a minimum of 78.4% segmentation accuracy, showing that lightweight models can effectively segment the railway surroundings.
1 INTRODUCTION
Autonomous systems are the next step towards opti-
mizing transportation systems. The railway industry
is one of the most developed transportation modes in
France with over 27,483 km of rail tracks operated
by the SNCF (Société Nationale des Chemins de Fer Français). Because of its direct interactions with na-
ture, the railroad network requires considerable main-
tenance. Another challenge is that a lot of the technol-
ogy involved in our railroad system has not undergone
major renovations and therefore presents numerous
possibilities for innovation. For instance, some ac-
tive regional express railways date back to the 1980s
and little improvement has been made on parts of the
infrastructure. In this context, there are several major
challenges to tackle the development of both practi-
cal and environment friendly systems in the railway
industry. One of the major concerns of railway oper-
ations is the maintenance of the vegetation in the rail
surroundings. Vegetation has multiple impacts on the
rail traffic: it creates obstacles on the rails, endangers
ballast stability or can even cause derailments. As a
result, it is one of the main concerns for smaller re-
gional lines that are not isolated like most high speed
rails and are in direct interaction with nature.
We present in this article a study of resource-
efficient models for the detection and segmentation
of the train’s environment. We aim to develop an on-
board system that uses solely images as input to elab-
orate an analysis of the train’s surrounding. We fo-
cus on lightweight models to limit energy consump-
tion, CO2 emissions and optimize memory space on-
board. With this in mind, semantic segmentation of
the scene allows for the detection of the different ele-
ments of the environment. As an embedded system, it
should follow a certain set of constraints, mainly re-
garding memory consumption, computing power and inference time. We therefore explored scaling techniques
in order to reduce the impact of our models. Scal-
ing can be approached through multiple angles such
as limited input information, model size and mem-
ory consumption. The aim is to determine whether
images are sufficient to perform predictive mainte-
nance on vegetation in the environment of trains with
a frugal deep learning approach. To that end, the
contributions of our work are a comparative study
of resource-efficient convolutional neural networks
(CNNs) for semantic segmentation, an adaptation of
scaling techniques from classification tasks to seg-
mentation, and then, a study of the effects of quanti-
zation on these CNNs and their limitations with more
complex datasets.
The remainder of this article is structured as fol-
lows. Firstly, we explore previous research related to
our task. Secondly, we define the perimeter we have
chosen for our work and its application. We then describe our approach and present our results. Lastly, we discuss the implications of these results and conclude with directions for future work.
2 RELATED WORK
In this section, we briefly review related work with
respect to autonomous trains and the maintenance of
their infrastructure. Furthermore, we discuss frugal
techniques in machine learning applications and im-
age segmentation methods.
2.1 Context
In recent years, there has been much progress in au-
tonomous driving surveillance systems (Zakaria et al.,
2022), but most are limited to either regular road ve-
hicles or trains operating in controlled environments
such as subways (Singh et al., 2021). While similar in nature, operating conditions for outdoor trains are much more complicated and require more frequent maintenance. In a previous work (Skibicki and Licow,
2022), researchers working with the Swedish railway
industry developed a vision-based method to detect
weed presence on tracks to evaluate the infestation
level. To the best of our knowledge, the research
community has yet to produce a complete system to
ensure maintenance of the surrounding vegetation in
the railway environment. Many works focus solely on
the application of autonomous systems for high speed
rail (Yin et al., 2020). On the other hand, research on
open-environment lines is still very much lacking.
There are multiple ways to implement autonomous
systems in trains (Singh et al., 2021), each utilizing
different sources of information and applying a wide
variety of processing operations. A systematic review
of open-source datasets for railroad applications (Pap-
paterra et al., 2021) clearly shows that, while data has been made available for many well-explored domains, there is still a lack of data for image segmentation. In his thesis work (Duquene, 2023),
Duquene explores the use of imitation and reinforce-
ment learning architectures to create an autonomous
driving system for trains, primarily for speed control.
2.2 Image Segmentation
Previous works, such as RailNet (Wang et al., 2019),
introduced a deep-learning-based model for the bi-
nary semantic segmentation of railway environments
designed for railroad detection. This model shows
a high detection rate, but a relatively low accuracy.
The introduction of the SqueezeNet (Iandola et al.,
2016) architecture brought about the development of
light and fast semantic segmentation networks like
SqueezeUNet (Beheshti and Johnsson, 2020). Built around the fire module, this network both reduces the model size compared to a regular UNet and has a 17-times lower inference time. Another comparable
work is the Squeeze-SegNet (Nanfack et al., 2017)
that fuses the enhancing methods brought by SegNet
(Badrinarayanan et al., 2017) with those of the fire
module.
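To make the mechanism concrete, the following is a minimal PyTorch sketch of a fire module, assuming the usual squeeze/expand structure; the channel sizes are illustrative and do not correspond to the exact Squeeze-UNet configuration.

```python
import torch
import torch.nn as nn

class FireModule(nn.Module):
    """Fire module (Iandola et al., 2016): a 1x1 "squeeze" convolution
    followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs
    are concatenated along the channel axis."""
    def __init__(self, in_channels, squeeze_channels, expand_channels):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(s)),
                          self.relu(self.expand3x3(s))], dim=1)

# e.g. a 64-channel feature map squeezed to 16 channels, expanded back to 2*64
x = torch.randn(1, 64, 60, 80)
y = FireModule(64, 16, 64)(x)   # -> shape (1, 128, 60, 80)
```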
2.3 Scaling of Resources
Following an introduction to frugality in machine
learning (Evchenko et al., 2021), there are three main
aspects of frugality in this domain: cost of data, com-
putation process requirements and model character-
istics. There can be multiple approaches to this ob-
jective, but the most evident is to reduce the size of
the model as a whole, without modifying its struc-
ture. We call this method quantization. It converts
stored model weights into a smaller format. In a pre-
vious work, the authors achieve a 4 to 8 times reduc-
tion of a CNN’s size through quantization and prun-
ing (Ahamad et al., 2022) with minimal drop in accu-
racy. In another thesis work, the author explored the
three types of quantization: data-free, gradient-based
post-training quantization (GPTQ) and quantization-
aware training (Yvinec, 2023). These methods show a
significantly unstable trade-off between accuracy and
speed. Another method to reduce the size of a model
is through its architecture, for example, skip connec-
tions.
Before the introduction of skip connections, mod-
els were limited in their depth by the vanishing gra-
dient. ResNet (He et al., 2016) was one of the first
works to find a solution to this challenge with the
introduction of the residual connection. EfficientNet
(Tan and Le, 2019) took this idea even further through
balance and optimization of smaller models for object
classification. Another significant work is the UNet
(Ronneberger et al., 2015) which implements long
skip connections between the encoder and decoder
blocks. In the field of classification, SqueezeNet (Ian-
dola et al., 2016) also paved the way for small model
architectures by introducing the fire module. And
lastly, DenseNet (Huang et al., 2017) introduced an-
other type of skip connection with the dense block.
It allows for better performance with smaller models
and makes better use of the existing parameters. In
our work, we have modified such models to be used
and compared in a semantic segmentation task. To do
so, we adapt the methods discussed in this section to
a specific application, the railroad environment. This
context brings strict requirements in model, input and
memory size. As such, these models and techniques
show clear potential for our application. Our ap-
proach also aims to determine the limit of model size
reduction using the aforementioned techniques and to
test the performances of our models under hard re-
source constraints.
3 PERIMETER OF STUDY
In railway tasks, there is still a lack of open-source
datasets. Existing image datasets for scene un-
derstanding are mainly limited to the detection of
signs, i.e. FRSign (Harb et al., 2020), of rails, i.e.
RailSet (Zouaoui et al., 2022) or Rail-DB (Li and
Peng, 2023), and thorough semantic segmentation,
i.e. RailSem19 (Zendel et al., 2019). We describe
in the following the task we address in this article as
well as the dataset we used.
Segmentation can serve to detect rails, but also
provides a gateway to infrastructure surveillance and
predictive maintenance. Analyzing what surrounds
the train allows for a more comprehensive system that
can adapt to its surroundings. As such, this task can
lead the system to prevent obstacle collision, infras-
tructure degradation or even critical failure of the rail-
way system. RailSem19 (Zendel et al., 2019) is intro-
duced as a complete dataset with 8,500 images con-
taining dense pixel-wise annotations for 19 classes.
Varying weather and lighting conditions are repre-
sented as well as speed associated blur effects and
lighting artifacts due to tunnels and nighttime capture.
Images are extracted from video sequences with an
ego-vehicle point of view with a 1920 × 1080 pixel
resolution.
Figure 1: Segmentation of a railway section following
SNCF regulations (© SNCF Réseau, modified).
This dataset is inspired by the Cityscapes dataset
both in terms of classes and labeling policy. Some im-
ages of trams and city railways are even taken from
Cityscapes to form RailSem19. RailSem19 contains
a thorough analysis of the elements contained in the railroad environment with a detailed breakdown of its signaling system. All in all, these elements are
essential in a global analysis of the environment of
trains, but we are not interested in the elements that
provide traffic information in our task. For the anal-
ysis considered in this work, the original 19 classes
were modified to fit into 8 classes (Table 1). These
classes represent the zones of interest defined by the
SNCF for the analysis and evaluation of the vegeta-
tion in the railroad vicinity (Figure 1). Henceforth,
the modified dataset will be referred to as RailSem8.
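For illustration, the remapping of Table 1 can be expressed as a simple lookup table over class ids, as in the following sketch; the framework and function names are our own, the id-to-name ordering is assumed to come from the dataset's metadata, and any unmatched class falls into Background.

```python
import numpy as np

# Grouping of RailSem19 class names into RailSem8 ids, following Table 1.
# Any name not listed (road, sidewalk, construction, fence, pole,
# traffic-light, traffic-sign, void) falls into Background (id 7).
RS19_TO_RS8 = {
    "rail-raised": 0, "rail-embedded": 0,             # Rails
    "tram-track": 1, "rail-track": 1,                 # Track
    "track-bed": 2,                                   # Track-bed
    "terrain": 3,                                     # Terrain
    "vegetation": 4,                                  # Vegetation
    "sky": 5,                                         # Sky
    "human": 6, "car": 6, "truck": 6, "on-rails": 6,  # Obstacles
}

def remap_mask(mask_rs19, rs19_names):
    """Map a dense RailSem19 mask (H x W array of class ids) to RailSem8 ids.
    `rs19_names` lists the RailSem19 class names in id order (taken from the
    dataset's metadata)."""
    lut = np.full(len(rs19_names), 7, dtype=np.uint8)  # default: Background
    for class_id, name in enumerate(rs19_names):
        lut[class_id] = RS19_TO_RS8.get(name, 7)
    return lut[mask_rs19]
```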
4 SEMANTIC APPROACH
This work originates from the need for an industrial application for predictive maintenance and environment perception in trains. To solve this task, we focus on the surveillance of vegetation and its impact on the region of circulation. To perceive the environment, this work implements semantic segmentation of images taken from the ego-perspective of trains. In this section, we present the different models that were selected following the energy, storage
and computation power constraints necessary for on-
board systems. We also explore the implementation
of scaling techniques to the models for the develop-
ment of resource-efficient deep learning techniques.
Semantic segmentation provides a pixel-wise classification of the image into classes. As a result, it
provides a detailed description of the entities that con-
stitute the environment of trains. While this task is
relatively costly in terms of computation, results can
provide fine-grained information on the input content.
4.1 Scaling Techniques
Scaling techniques were explored to limit the model’s
size and parameters without compromising results.
This work mainly focuses on the use of skip connec-
tions to limit the degradation of gradients and con-
serve high details. The downside of this architec-
ture is that the number of parameters remains too high to be considered resource-efficient. For
example, a ResNet-18, the smallest publicly avail-
able network using this type of architecture, has a
total of 11.4M parameters. As a result, this type of
skip connection is not further explored in this work.
Dense connection-based architectures like DenseNet
(Huang et al., 2017) networks have a relatively low
number of parameters with similar performances. For
example, DenseNet-56, with 56 layers as opposed to
the original DenseNet-121, only counts 1.5M parameters.
Table 1: RailSem19 dense labels remapped to RailSem8; In Frames = percentage of frames with pixels in the corresponding class.
Label RS8  | Label RS19                                                                   | In Frames
Rails      | rail-raised, rail-embedded                                                   | 100.0%
Track      | tram-track, rail-track                                                       | 100.0%
Track-bed  | track-bed                                                                    | 87.6%
Terrain    | terrain                                                                      | 61.2%
Vegetation | vegetation                                                                   | 83.3%
Sky        | sky                                                                          | 94.5%
Obstacles  | human, car, truck, on-rails                                                  | 13.8–24.4%
Background | road, sidewalk, construction, fence, pole, traffic-light, traffic-sign, void | 72.1–100.0%
As a result, this technique seems very promising
for the task at hand in terms of model scaling. The
last method is the long skip. UNets are comparably
smaller models by design and are optimized for fast
processing, performance in image analysis and gener-
alization purposes. In order to profit from these char-
acteristics, this type of models was also explored in
this work.
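To make the long-skip mechanism concrete, the sketch below shows one UNet-style decoder step in PyTorch, following the ConvTranspose2x2 and [Conv3x3 + BatchNorm + ReLU] x2 structure of Table 2; the channel arguments are placeholders, not the exact configuration used here.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One UNet-style decoder step: upsample, concatenate the encoder feature
    map over a long skip connection, then refine with two 3x3 convolutions."""
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_channels + skip_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # restore spatial resolution
        x = torch.cat([x, skip], dim=1)  # long skip: reuse encoder details
        return self.conv(x)
```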
4.2 Models
Four networks were implemented for this work:
UNet, Squeeze-UNet, DenseNet-67 and DenseNet-
56. All of these models are built with 4 en-
coder blocks and decoder blocks except DenseNet-
67, which has 5 encoder and decoder blocks as de-
scribed in Table 2. Based on the UNet (Ronneberger
et al., 2015) architecture an original version was im-
plemented for this project. While being relatively
big, UNet has a much faster inference time. The
model was adapted from the aforementioned meth-
ods and modified to a UNet-like architecture in or-
der to compare it to the other architectures explored
in this work. We implemented our own Squeeze-
UNet based on the description of the SqueezeSegNet,
a modified SqueezeNet. On the other hand, DenseNet
was originally developed for classification with 121
layers (Huang et al., 2017). In this work, we modi-
fied and implemented two versions, much smaller in
scale, DenseNet-56 and DenseNet-67. These models
have much fewer operations and parameters (Tables
3 and 4) and thus show promise in terms of mem-
ory efficiency. All selected models were chosen as archetypes of scaling techniques in terms of inference speed and performance improvements, and they have shown very promising results in similar applications.
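As a sketch of the dense connectivity that keeps these models small, the following minimal PyTorch block chains [BatchNorm + ReLU + Conv3x3 + Dropout] layers as in Table 2; the growth rate and dropout probability are illustrative assumptions, not the exact values of DenseNet-56/-67.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block (Huang et al., 2017): each layer receives the concatenation
    of all previous feature maps, so a small growth rate already yields rich
    features with few parameters."""
    def __init__(self, in_channels, growth_rate, num_layers, p_drop=0.2):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
                nn.Dropout2d(p_drop),
            ))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # concatenate all previous maps
            features.append(out)
        return torch.cat(features, dim=1)
```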
In our work, we pre-processed data with size re-
duction in order to test the limits of input resolution
towards performance. The memory footprint of the training process is directly proportional to the size of the data. When reducing the image resolution, we inevitably reduce the model's capacity to recognize fine details, which then impacts the results and performance. On the other hand, smaller images allow for larger batches during training. Increasing the batch size allows for better generalization during training, which then improves performance. All in all, the preprocessing reduces the training footprint and improves generalization.
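A minimal preprocessing sketch, assuming torchvision is used for the resizing; the interpolation choices are our own illustration (masks must use nearest-neighbour interpolation so that class ids are not blended).

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Downscale from 1920x1080 to 320x240 (Resize expects (height, width)).
image_tf = transforms.Compose([
    transforms.Resize((240, 320), interpolation=InterpolationMode.BILINEAR),
    transforms.ToTensor(),
])
# Nearest-neighbour keeps the dense label values intact.
mask_tf = transforms.Resize((240, 320), interpolation=InterpolationMode.NEAREST)
```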
5 EXPERIMENTS AND RESULTS
All models in this work were trained from scratch on an NVIDIA RTX A6000 GPU, with 2.45 GHz AMD EPYC 7763 CPUs and 1 TB of RAM. During initial tests, the models were trained
for 100 epochs and then for 200 epochs. All training
was done with a learning rate of 1e-5 and a batch size of 8 for training and 4 for validation. The loss function is a cross-entropy loss minimized with the Adam opti-
mizer. These parameters were chosen through empir-
ical analysis in order to find the best combination of
input parameters and evaluation processes. For our
experiments, we divided the datasets into 60-20-20
proportions for training, validation and testing sets
with the random-split seed set to 0 for reproducibility. As for data pre-processing, we have worked to
enforce input efficiency by reducing the resolution of
the image from its original 1920×1080 to 320×240.
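The setup above can be summarized by the following sketch, assuming a PyTorch implementation; the data, the stand-in model and the single-epoch loop shown here are placeholders for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, random_split, DataLoader

# Placeholder data and model standing in for RailSem8 and one of our networks.
images = torch.randn(100, 3, 240, 320)
masks = torch.randint(0, 8, (100, 240, 320))
dataset = TensorDataset(images, masks)
model = nn.Conv2d(3, 8, kernel_size=1)        # stand-in segmentation head

# 60-20-20 split with a fixed seed, as described above.
g = torch.Generator().manual_seed(0)
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val], generator=g)

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
val_loader = DataLoader(val_set, batch_size=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

for imgs, targets in train_loader:            # one epoch of the training loop
    optimizer.zero_grad()
    logits = model(imgs)                      # (B, 8, H, W)
    loss = criterion(logits, targets)         # targets: (B, H, W) of class ids
    loss.backward()
    optimizer.step()
```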
5.1 Model Comparison
As presented in Table 4, most models studied are
much smaller in terms of size compared to the state-
of-the-art shallow network, EfficientNet (Tan and Le,
2019), which counts 11.3M parameters. Squeeze-UNet, counting 8.1M parameters (Table 4), was found to perform poorly (Table 5), with a 0.691 Dice score compared to a minimum of 0.78 for the others, and returns masks that visually reveal a heavy error rate (Figure 4); it was therefore excluded from our analysis. The results in Figure 4 show a weakness on night-view images, but the models visually perform well even under rain or fog.
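For reference, the sketch below computes a per-class Dice score averaged over the classes present in the image; this is one common convention, assumed here for illustration, with `pred` and `target` being dense masks of class ids.

```python
import torch

def dice_score(pred, target, num_classes=8, eps=1e-6):
    """Mean Dice over the classes present in either mask; `pred` and `target`
    are (H, W) integer tensors of class ids."""
    scores = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        if p.sum() == 0 and t.sum() == 0:
            continue                                  # class absent from both masks
        inter = (p & t).sum().float()
        scores.append((2 * inter + eps) / (p.sum() + t.sum() + eps))
    return torch.stack(scores).mean() if scores else torch.tensor(1.0)
```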
We also performed a convergence analysis of our
models over 100 epochs (Figure 2). The model was
tested at each modification of the weights after the
validation step. From this graph, we can see a rapid increase in performance according to the Dice score in the early stages of training; after the 20th iteration, the system stabilizes on a linear convergence curve with R² = 0.85.
Table 2: Model architectures by block; each model is implemented with 4 or 5 encoder and decoder blocks.
Blocks     | UNet                            | Squeeze-UNet   | DenseNet-56/-67
Encoder    | [Conv3x3 + BatchNorm + ReLU] x2 | [FireLayer] x2 | [BatchNorm + ReLU + Conv3x3 + Dropout] x4 (x5)
           | MaxPool2x2                      | Conv4x4        | [BatchNorm + ReLU + Conv1x1 + Dropout + MaxPool]
Bottleneck | Conv1x1                         | [FireLayer] x2 | [BatchNorm + ReLU + Conv3x3 + Dropout]
Decoder    | ConvTranspose2x2                | TransFireLayer | ConvTranspose3x3
           | [Conv3x3 + BatchNorm + ReLU] x2 | [FireLayer] x2 | [BatchNorm + ReLU + Conv3x3 + Dropout] x4 (x5)
Table 3: Model size and operation counts.
Model  | UNet       | Squeeze-UNet | DenseNet-56 | DenseNet-67
Params | 31.0 M     | 8.1 M        | 1.4 M       | 3.5 M
FLOPs  | 118.2×10^9 | 111.5×10^9   | 27.6×10^9   | 64.5×10^9
MACs   | 59.0×10^9  | 55.6×10^9    | 13.6×10^9   | 31.9×10^9
Table 4: Model characteristics.
Model        | Inference time (ms) | Checkpoint file (MB) | Process (GB) | Parameters
UNet         | 5.5                 | 124.3                | 1.49         | 31.0 M
Squeeze-UNet | 10.0                | 32.6                 | 1.84         | 8.1 M
DenseNet-67  | 17.8                | 14.2                 | 9.50         | 3.5 M
DenseNet-56  | 12.2                | 6.2                  | 3.17         | 1.4 M
Figure 2: DenseNet-67 convergence graph over 100 epochs.
Later results in Dice score, after 200 epochs of training (Table 5), suggest a flattening of the learning curve, with a reduced impact of additional epochs on the model's performance. Between 100 and 200 epochs for DenseNet-67, the Dice score increases by only 0.042, which gives a linear coefficient of R² = 0.42 after 100 epochs, thus further supporting this conclusion. It also shows a risk of over-fitting with further training for this model.
Table 5: Model performance after 200 epochs of training.
Model        | Accuracy | Dice  | F1-Score | Jaccard | Precision | Recall
UNet         | 0.784    | 0.784 | 0.781    | 0.659   | 0.784     | 0.784
Squeeze-UNet | 0.691    | 0.691 | 0.658    | 0.539   | 0.635     | 0.691
DenseNet-67  | 0.820    | 0.820 | 0.816    | 0.705   | 0.817     | 0.820
DenseNet-56  | 0.789    | 0.789 | 0.782    | 0.663   | 0.789     | 0.789
With models trained on 320 × 240 images for 200
epochs, results are very close (Table 5). While UNet
has the smallest process (Table 4), it is still relatively
big. On the other hand, DenseNet-67 obtains the best
performance scores on all metrics. We also notice only a 0.031 drop in Dice score with DenseNet-56, which shows that smaller models are able to perform just as well for semantic segmentation on a dataset with a small number of classes. This shows that, for our task, resource-efficient models have great potential
and could have industrial use. The confusion matrix
(Figure 3) shows an imbalance in the model's performance.
Figure 3: DenseNet-67 normalized confusion matrix.
If cross-referenced with Table 1, we can see
that the model clearly performs much better in classes
that are well represented, but has more trouble with
underrepresented classes like obstacles that are only
in 13.8% to 24.4% of the frames in the dataset. This
highlights the weakness of smaller models in their ca-
pacity to generalize for rare objects.
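The normalized confusion matrix of Figure 3 can be reproduced from predicted and ground-truth masks as in the minimal NumPy sketch below; row normalization per true class is assumed.

```python
import numpy as np

def normalized_confusion_matrix(pred, target, num_classes=8):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of
    pixels of true class i that were predicted as class j."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    counts = np.bincount(num_classes * target + pred, minlength=num_classes ** 2)
    cm = counts.reshape(num_classes, num_classes).astype(np.float64)
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1)   # avoid division by zero for empty rows
```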
5.2 Quantization
In this work, we further explored model scaling
through the quantization of models. Quantization
compresses float32 weights into a smaller format, for instance int8. We tested it under two scenarios: post-training dynamic quantization (PTDQ) and quantization-aware training (QAT). PTDQ is applied after training, truncating the obtained values with a clipping range calibrated for each input. As a result, the model loses precision and performance. On the other hand, QAT intervenes during training in order to obtain a lower-precision representation of the float operations, thus including it in the loss calculations. PTDQ is easily implemented on a GPU, but QAT is not supported for GPU to this day. As a result, it only runs on CPU, and the model's inference time is therefore a concern. For ecological reasons, QAT training was halted after one epoch, as a single epoch took 50 minutes with the resources available. Table 6 shows the results of our different scenarios.
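The principle behind this weight compression can be illustrated with a per-tensor affine quantization sketch; this is our own illustration of the float32-to-int8 mapping, not the exact PTDQ implementation used in our experiments.

```python
import torch

def quantize_per_tensor(w, num_bits=8):
    """Affine per-tensor quantization of a float32 tensor to int8:
    q = round(w / scale) + zero_point, with scale and zero_point chosen so
    that [w.min(), w.max()] maps onto the signed integer range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    w_min, w_max = w.min().item(), w.max().item()
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = int(round(qmin - w_min / scale))
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float32 tensor from its int8 representation."""
    return (q.float() - zero_point) * scale

w = torch.randn(64, 64, 3, 3)        # e.g. a float32 convolution weight
q, s, z = quantize_per_tensor(w)     # int8 storage: 4x smaller than float32
w_hat = dequantize(q, s, z)          # reconstruction with small rounding error
```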
Table 6: DenseNet-56 before/after quantization (100 epochs).
Model          | Accuracy | Dice  | F1-Score | Jaccard | Precision | Recall
DenseNet-56    | 0.753    | 0.753 | 0.744    | 0.617   | 0.754     | 0.753
DNet-56 (PTDQ) | 0.701    | 0.701 | 0.691    | 0.551   | 0.690     | 0.701
DNet-56 (QAT)  | NA       | NA    | NA       | NA      | NA        | NA
We decided to focus mainly on our most lightweight model, DenseNet-56, for quantization. The difference in performance between the regular and PTDQ versions is only 0.052 in Dice score. Again, this shows the potential of quantization, as the drop remains within an acceptable margin of error.
5.3 Task Evaluation
We implemented a comparative study between differ-
ent datasets in order to test the usefulness of our work
for other similar segmentation tasks. As Cityscapes
inspired RailSem19, it was chosen for our case study.
When tested on the original Cityscapes dataset, which contains 35 classes, the DenseNets obtain very poor performance results (Table 7), close to random. From these results, we can infer that a 35-class segmentation task is too complex for the small architectures we have implemented. To verify this, we remapped Cityscapes to match RailSem8 (Table 8) in order to effectively compare performance on these datasets.
Table 7: Dice score on different datasets (200 epochs).
Model       | Cityscapes 35 | Cityscapes 8 | RailSem8
UNet        | NA            | 0.776        | 0.784
DenseNet-56 | 0.556         | 0.741        | 0.789
DenseNet-67 | 0.584         | 0.796        | 0.820
As the represented subject is different, the class proportions also differ greatly. Cityscapes is
built with images taken from a vehicle ego-point
of view, but set in a different context. Where
RailSem8 represents mostly rails surrounded by na-
ture, Cityscapes shows urban areas with a much
higher concentration of humans, cars or other types
of vehicles, all considered as obstacles. After mod-
ifying Cityscapes, we notice a considerable increase
in performance of our models (Table 7). Results on
Cityscapes8 are lower than RailSem8, but with a max-
imum drop of 0.06 in Dice score for the DenseNet-56
and minimal drop of 0.008 for the UNet. The simil-
itude in tasks explains the closeness in results. An-
other aspect is that the label policy put in place for
RailSem19 is based of the one from Cityscapes. As a
result, the dense annotation is similar in style and ap-
proximation. On the other hand, Cityscapes counts
5000 images where RailSem19 has 8500, this ac-
counts for a slightly better generalization of the mod-
els using RailSem19 which, in turn, explains the small
performance drop. As a result, we can conclude that
our models can efficiently perform segmentation for
a low number of classes while being insufficient for
more detailed tasks.
6 DISCUSSION
This work towards efficient and lightweight deep
learning for semantic segmentation of railway envi-
ronments has shown promise, but there are still limi-
tations in terms of performance. While we were able
to demonstrate potential in the use of light models for
segmentation, it is important to note that up-scaling the model size is not the only way to
enhance performance. For instance, a closer look at segmentation results after PTDQ shows a slight increase in performance for the detection of underrepresented classes in our dataset. Moreover, we ob-
served a drop in performance for the detection of ter-
rain, but an increase for the rail class. This observa-
tion shows that there is still room for optimization in
the computing of feature maps that could influence
the training process as a whole in favor of smaller
classes. Future work might include the exploration
of the effects of quantization on these values and the
model’s behavior regarding feature map construction
for these classes.
For resource-efficient processing, smaller models do not always mean that fewer resources are needed (Table 4). As a result, a model with more FLOPs and parameters, like UNet, can show a lower memory footprint and faster inference while performing similarly, if not slightly better. For an industrial application, the system needs to be implementable on-board trains. As a result, a slightly bigger model like UNet could still prove better suited to commercial use, as it is closer to real-time use with
less costly equipment. Therefore, with the available
data, we consider UNet to be the most appropriate so-
lution for industrial use in a constrained system. The
skip connection mechanisms employed in the UNet
and DenseNet architectures are at the root of the dif-
ference in operation processes. All in all, UNet is
deeper in terms of number of convolution layers but
faster overall. This trade-off illustrates the challenge of jointly optimizing the frugality of the model and of the learning process.
Our work has shown that a lower resolution can yield promising results without compromising learning efficiency. Naturally, these models would likely perform better on higher-resolution images, but that would considerably increase processing requirements and therefore clash with our constraints. Future work includes the implementation
of attention mechanisms or data augmentation to en-
hance segmentation performance on less represented
classes such as tracks. We also plan to integrate this part of our work into a more global vegetation assessment application for railway maintenance.
Table 8: Cityscapes (CS 35) dense labels modified to Cityscapes8 (CS 8), compared to the RailSem8 (RS8) labels.
Label RS8  | Label CS 8 | Label CS 35
Rails      | Void       | —
Track      | Tracks     | rail track
Track-bed  | Road       | road
Terrain    | Terrain    | terrain
Vegetation | Vegetation | vegetation
Sky        | Sky        | sky
Obstacles  | Obstacles  | person, rider, car, truck, bus, caravan, trailer, motorcycle, bicycle, license plate
Background | Background | unlabeled, ego vehicle, rectification border, out of roi, static, dynamic, ground, sidewalk, parking, building, wall, fence, guard rail, bridge, tunnel, pole, polegroup, traffic light, traffic sign
Figure 4: Output of each model after training for 200 epochs.
7 CONCLUSION
In this paper, we propose a comparative study of resource-efficient CNNs for semantic segmentation.
The models presented consist of the implementation
of known scaling techniques such as skip connec-
tions, and the adaptation of classifiers for segmen-
tation tasks while making the most of the mecha-
nisms developed to reduce model size. In our work,
we constrained our models to smaller architectures
while evaluating their performances on RailSem19
modified to RailSem8 and Cityscapes with 35 and 8
classes. Results show that our models have a very
small drop in performance when quantized and are
capable of detecting the main regions of interest for
industrial use in the detection of vegetation. In the
future, we will focus on optimizing results in order to
achieve real-time performance and raise segmentation
accuracy for under-represented classes.
REFERENCES
Ahamad, A., Sun, C.-C., and Kuo, W.-K. (2022). Quan-
tized Semantic Segmentation Deep Architecture for
Deployment on an Edge Computing Device for Image
Segmentation. Electronics, 11(21):3561.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017).
SegNet: A Deep Convolutional Encoder-Decoder Ar-
chitecture for Image Segmentation. IEEE Trans. Pat-
tern Anal. Mach. Intell., 39(12):2481–2495.
Beheshti, N. and Johnsson, L. (2020). Squeeze U-Net:
A Memory and Energy Efficient Image Segmentation
Network. In 2020 IEEE CVPR Workshops, pages
1495–1504.
Duquene, A. P. (2023). Apprentissage machine pour la décision de conduite autonome de véhicules guidés : Application dans le domaine ferroviaire. PhD thesis, Université Polytechnique Hauts-de-France.
Evchenko, M., Vanschoren, J., Hoos, H. H., Schoenauer,
M., and Sebag, M. (2021). Frugal Machine Learning.
arXiv:2111.03731.
Harb, J., Rébéna, N., Chosidow, R., Roblin, G., Potarusov,
R., and Hajri, H. (2020). FRSign: A Large-
Scale Traffic Light Dataset for Autonomous Trains.
arXiv:2002.05665.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Resid-
ual Learning for Image Recognition. In 2016 IEEE
CVPR, pages 770–778.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger,
K. Q. (2017). Densely Connected Convolutional Net-
works. In 2017 IEEE CVPR, pages 2261–2269.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,
Dally, W. J., and Keutzer, K. (2016). SqueezeNet:
AlexNet-level accuracy with 50x fewer parameters
and <0.5MB model size. arXiv:1602.07360.
Li, X. and Peng, X. (2023). Rail Detection: An Efficient
Row-based Network and A New Benchmark. ACMM
2022 arXiv:2304.05667.
Nanfack, G., Elhassouny, A., and Thami, R. O. H. (2017).
Squeeze-SegNet: A new fast Deep Convolutional
Neural Network for Semantic Segmentation. ICMV
2017.
Pappaterra, M. J., Flammini, F., Vittorini, V., and Bešinović, N. (2021). A Systematic Review of Artificial Intelli-
gence Public Datasets for Railway Applications. In-
frastructures, 6(10):136.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net:
Convolutional Networks for Biomedical Image Seg-
mentation. MICCAI 2015 arXiv:1505.04597.
Singh, P., Dulebenets, M. A., Pasha, J., Gonzalez, E. D.
R. S., Lau, Y.-Y., and Kampmann, R. (2021). Deploy-
ment of Autonomous Trains in Rail Transportation:
Current Trends and Existing Challenges. IEEE Ac-
cess, 9:91427–91461.
Skibicki, J. D. and Licow, R. (2022). A Visual Method
of Measuring Railway-Track Weed Infestation Level.
Metrology, 2(2):230–240.
Tan, M. and Le, Q. V. (2019). EfficientNet: Rethinking
Model Scaling for Convolutional Neural Networks.
ICML 2019, pages 6105–6114.
Wang, Y., Wang, L., Hu, Y. H., and Qiu, J. (2019). Rail-
Net: A Segmentation Network for Railroad Detection.
IEEE Access, 7:143772–143779.
Yin, M., Li, K., and Cheng, X. (2020). A review on artificial
intelligence in high-speed rail. Transportation Safety
and Environment, 2(4):247–259.
Yvinec, E. (2023). Efficient Neural Networks : Post Train-
ing Pruning and Quantization. PhD thesis, Sorbonne
Université.
Zakaria, B., Ben Ahmed, O., Amamra, A., Bradai, A.,
and Beghdad Bey, K. (2022). PSCS-Net: Perception
Optimized Image Reconstruction Network for Au-
tonomous Driving Systems. IEEE ITS, 24(2):1–16.
Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D.,
Abbasi, S., and Beleznai, C. (2019). RailSem19: A
Dataset for Semantic Rail Scene Understanding. In
2019 IEEE CVPR Workshops, pages 1221–1229.
Zouaoui, A., Mahtani, A., Hadded, M. A., Ambellouis, S.,
Boonaert, J., and Wannous, H. (2022). RailSet: A
Unique Dataset for Railway Anomaly Detection. In
2022 IEEE IPAS, pages 1–6.