Gam-UNet for Semantic Segmentation

Rahma Aloui¹ᵃ, Pranav Martini¹ᵇ, Pandu Devarakota²ᶜ, Apurva Gala²ᵈ and Shishir K. Shah¹ᵉ

¹University of Houston, Houston, TX, U.S.A.

²Shell Information Technology International Inc., Houston, TX, U.S.A.

Keywords:

Image Segmentation, UNet Variants, Gabor Filters, Spatial-Channel Squeeze-and-Excitation, Multi-Scale Feature Fusion, Gabor Convolution, Retinal Vessels Images, Seismic Images.

Abstract:

Accurate delineation of critical features, such as salt boundaries in seismic imaging and fine structures in medical images, is essential for effective analysis and decision-making. Traditional convolutional neural networks (CNNs) often face difficulties in handling complex data due to variations in scale, orientation, and noise. These limitations become particularly evident during the transition from proof-of-concept to real-world deployment, where models must perform consistently under diverse conditions. To address these challenges, we propose GAM-UNet, a segmentation architecture that integrates learnable Gabor filters for enhanced edge detection, SCSE blocks for feature refinement, and multi-scale fusion within the U-Net framework. This approach improves feature extraction across varying scales and orientations. Trained using a combined Binary Cross-Entropy and Dice loss function, GAM-UNet improves segmentation accuracy and continuity, achieving higher Dice scores than the compared U-Net variants on all evaluated datasets.

1 INTRODUCTION

Segmentation tasks involving curved lines, such as those in seismic and medical imaging, pose significant challenges for traditional Convolutional Neural Networks (CNNs), as shown in Figure 1. Poor predictions, often discontinuous or imprecise, can lead to suboptimal results. These tasks demand high accuracy in identifying boundaries amidst noisy backgrounds while capturing fine details of varying shapes and sizes. However, CNNs often struggle with variations in the orientation and scale of structures, resulting in inconsistent performance and discontinuous predictions, which are particularly detrimental in fine-grained segmentation. While CNNs are adept at capturing hierarchical spatial features, they inherently lack the ability to handle geometric transformations and often rely on extensive data augmentation instead. Augmentation, however, can lead to overfitting and does not fully address the challenges posed by these tasks.

Previous techniques have explored Gabor filters for their ability to capture multi-scale and multi-orientation features, particularly for edge detection and texture analysis. However, their application in segmentation tasks remains underexplored. In this work, we integrate learnable Gabor filters within the U-Net architecture through a Gabor convolution mechanism, enabling the filters to adapt during training and improving segmentation across varying scales, orientations, and textures.

Figure 1: Examples of applications for curved-line segmentation: (a) Image, (b) Ground Truth, (c) Poor prediction.

ᵃ https://orcid.org/0009-0008-2599-0553

ᵇ https://orcid.org/0000-0001-8871-9068

ᶜ https://orcid.org/0000-0003-1989-0261

ᵈ https://orcid.org/0009-0005-2905-1113

ᵉ https://orcid.org/0000-0003-4093-6906

To complement the Gabor filters, attention mechanisms are incorporated into the U-Net architecture to focus on critical regions, enhancing the detection of relevant features. Additionally, multi-scale feature fusion captures both fine details and broader context, enabling the model to handle objects of varying sizes. Together, these techniques ensure precise boundary detection while maintaining a robust understanding of overall image context.

Aloui, R., Martini, P., Devarakota, P., Gala, A. and Shah, S. K.

Gam-UNet for Semantic Segmentation.

DOI: 10.5220/0013182000003912

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025) - Volume 3: VISAPP, pages 524-531

ISBN: 978-989-758-728-3; ISSN: 2184-4321

Building on these advancements, we propose GAM-UNet, a novel architecture that combines Gabor Feature Extraction Modules (GFEM), attention mechanisms, and multi-scale feature fusion within the U-Net framework. SCSE (Spatial and Channel Squeeze-and-Excitation) blocks dynamically recalibrate feature maps, emphasizing spatial and channel-wise information. This integrated approach improves segmentation continuity and robustness, making GAM-UNet well suited to challenging segmentation tasks. The primary contributions of this paper are as follows:

• Gabor Feature Extraction in UNet. Integration of Gabor Feature Extraction Modules (GFEM) into the UNet architecture to improve edge and texture detection in image segmentation tasks.

• SCSE Implementation. Incorporation of Spatial-Channel Squeeze-and-Excitation (SCSE) blocks to dynamically recalibrate feature maps, enhancing relevant feature extraction.

• Multi-Scale Feature Fusion. Implementation of a multi-scale feature fusion framework, exploring both Cascade and Skip Connection with Concatenation strategies to effectively capture contextual information and reduce the semantic gap.

2 RELATED WORKS

Recent advances in semantic segmentation have focused on improving CNN-based architectures, particularly U-Net variants, using attention mechanisms, Gabor filters, and multi-scale feature fusion. Our approach builds on these techniques, targeting feature recalibration for complex tasks such as seismic and medical image segmentation.

2.1 Gabor Filters in CNN

Gabor filters are widely recognized for their ability to capture texture and edge details at specific orientations and scales. Sarwar et al. (2017) demonstrated their use in CNNs for texture-based classification, while Ozbulak and Ekenel (2018) refined Gabor filters for improved classification on datasets such as CIFAR-10. Luan et al. (2018b) embedded Gabor filters into CNNs to enhance feature extraction and interpretability in segmentation tasks. Trainable Gabor filters, introduced by Alekseev and Bobe (2019), adapt dynamically to input data, improving robustness in texture-sensitive tasks. Yuan et al. (2022) further refined this concept for greater flexibility in various applications. For segmentation, Wang and Alkhalifah (2023) employed adaptable Gabor kernels for seismic data, while AGNet (Luan et al., 2018a) merged Gabor and convolutional features to reduce semantic gaps. Building on these developments, our method incorporates trainable Gabor filters that adjust during training, enabling effective capture of edge and texture features across diverse orientations and scales, particularly in complex seismic and medical images.

2.2 Attention Mechanisms in U-Net

Attention mechanisms enhance U-Net models by enabling them to focus on critical image features. Squeeze-and-excitation (SE) blocks (Roy et al., 2018) recalibrate feature channels, while the Convolutional Block Attention Module (CBAM) (Chen et al., 2020) adds spatial attention for improved foreground-background distinction. Coordinate Attention (CA) (Li et al., 2020) embeds positional cues, enhancing segmentation precision. USE-Net (Rundo et al., 2019) integrates SE blocks for prostate zonal segmentation, and Shen et al. (2022) applied SE transformers to microvessel segmentation. SSA-UNet (Jiang et al., 2024) combines SE blocks with self-attention for brain segmentation, demonstrating the effectiveness of feature recalibration in medical imaging. Our method integrates SCSE blocks within the Gabor convolution layers to emphasize spatial and channel-wise information. This recalibration enables precise focus on important structures, which is crucial for tasks involving seismic and medical data.

2.3 Multi-Scale Feature Fusion

Multi-scale feature fusion combines information from different resolution levels, effectively integrating fine details with broader contextual information. UNet++ (Zhou et al., 2018) improves this process by redesigning skip connections, while U²-Net (Qin et al., 2020) employs Residual U-blocks for better multi-scale feature blending. MA-Unet (Cai and Wang, 2020) incorporates attention mechanisms to aggregate multi-scale features, preserving detailed structures and larger patterns. Our model employs multi-scale feature fusion in the decoder, merging features from various levels to capture small-scale details and overall context. This approach is particularly effective for seismic and medical segmentation, where both precision and contextual understanding are critical.


Figure 2: Model Architecture.

3 METHODOLOGY

3.1 GAM-UNet

GAM-UNet integrates key mechanisms to improve segmentation in seismic and medical images. The encoder features a Gabor Feature Extraction Module (GFEM), combining Gabor convolution (GConv) and SCSE (Spatial and Channel Squeeze-and-Excitation) blocks to capture orientation- and scale-specific details while recalibrating feature importance. The decoder incorporates multi-scale feature fusion (MSFF) to combine features across scales, minimizing information loss and enhancing segmentation performance. Figure 2 illustrates the architecture, showing GFEM in the encoder and MSFF in the decoder.

3.2 Gabor Feature Extraction Module

The Gabor Feature Extraction Module (GFEM) integrates Gabor convolution and standard convolution filters in a dual-branch architecture to enhance feature extraction. The upper branch applies Gabor filters with SCSE blocks to focus on fine texture extraction, while the lower branch uses conventional 3 × 3 convolutions to capture broader spatial features. The outputs of both branches are concatenated, merging fine and general features for an enriched representation. A 1 × 1 convolution followed by a 3 × 3 convolution further integrates these features, ensuring the network effectively combines detailed and general information for accurate segmentation.

3.2.1 Gabor Convolution

At the core of the GFEM is the Gabor convolution (GConv) operation, shown in Figure 3, in which feature maps from the previous encoder layer are convolved channel-wise with a set of Gabor filters. The number of filters is the product of the selected numbers of orientations and scales, ensuring comprehensive coverage of directional and multi-scale features in the data. Here, C is the number of input feature maps and C′ is the number of feature maps generated after applying the Gabor filters. A ReLU activation is applied, followed by a 1 × 1 convolution that projects the feature maps back to C channels, so that the input and output have the same number of feature maps. This design preserves feature map integrity while enabling effective detection of directional textures and edges.

Figure 3: Visualization of the GConv operation.

VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications

The characteristics of the Gabor filters are governed by several key parameters. The aspect ratio (γ) determines the filter’s shape: smaller values (e.g., 0.5) produce elongated filters that excel at detecting directional edges, while values closer to 1 yield circular filters that capture broader textures. The wavelength (λ), which sets the spatial frequency, is initialized to a value greater than 0 and less than the filter size; this constraint keeps λ aligned with the scale of the features in the data and avoids unrealistic frequencies. The orientation (θ), ranging from 0° to 180°, enables the filters to detect features in all directions. The Gaussian size (σ), set within [0.5, 1.5], governs spatial localization, balancing fine and coarse feature detection. The phase offset (ψ) is fixed at 0 to emphasize sharp transitions and improve boundary detection. The trainable parameters are updated during training, allowing the filters to adapt to dataset-specific characteristics and to capture both fine details and broader structures.

During training, λ, θ, σ, and γ are updated through backpropagation. To ensure numerical stability, these parameters are scaled during training and rescaled after each update, preventing extreme values that could degrade performance.
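The text does not spell out the scaling scheme, so the following is only a minimal sketch of one possible reading: after each update, the parameters are projected back into the ranges given above by simple clamping. The clamping itself, the function name `constrain_gabor_params`, the margin `eps`, and the upper bound of 1 on γ are assumptions made for illustration and are not taken from the paper.

```python
import math


def constrain_gabor_params(lam, theta, sigma, gamma, kernel_size=3):
    """Project updated Gabor parameters back into the ranges described in the text.

    Clamping is an assumed stand-in for the paper's unspecified rescaling step.
    """
    eps = 1e-3                                       # assumed safety margin
    lam = min(max(lam, eps), kernel_size - eps)      # 0 < lambda < filter size
    theta = theta % math.pi                          # orientation kept in [0, pi)
    sigma = min(max(sigma, 0.5), 1.5)                # sigma within [0.5, 1.5]
    gamma = min(max(gamma, eps), 1.0)                # assumed: elongated up to circular
    return lam, theta, sigma, gamma


# Example: values that drifted out of range after a hypothetical update step.
print(constrain_gabor_params(lam=4.2, theta=3.5, sigma=0.1, gamma=1.7))
```

A projection of this kind keeps every filter well defined, but it is not differentiable at the bounds; a smooth reparameterization (for example, through a sigmoid) would be an equally plausible choice.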

Gabor filters remain consistent across all network levels, maintaining coherence in feature extraction. Future work will explore varying these parameters across network levels to enhance hierarchical feature extraction.

The filter size significantly impacts feature detection: smaller filters such as 3 × 3 excel at highlighting fine textures and directional edges, while larger sizes (5 × 5 or 7 × 7) capture broader features but may lose finer details. To balance this trade-off, filter sizes are systematically tested, starting with 3 × 3.

The Gabor wavelet function is defined as:

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = e^{-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}} \cdot e^{i\left(2\pi \frac{x'}{\lambda} + \psi\right)}, \qquad (1)$$

where x′ and y′ are rotated coordinates based on θ. The Gaussian envelope provides spatial localization, while the complex exponential captures frequency and phase information.
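As an illustration of Equation (1), the sketch below samples the Gabor function on a small grid and builds a bank with one kernel per orientation and scale pair. It is a minimal standard-library example, not the authors' PyTorch implementation. The rotation convention (x′ = x cos θ + y sin θ, y′ = −x sin θ + y cos θ), the use of the wavelength λ as the "scale", the evenly spaced orientations, and the names `gabor_kernel` and `gabor_bank` are all assumptions.

```python
import cmath
import math


def gabor_kernel(size, lam, theta, psi=0.0, sigma=1.0, gamma=0.5):
    """Return a size x size complex Gabor kernel (list of rows); size must be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sampling grid by theta (assumed convention).
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope: spatial localization.
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
            # Complex exponential: frequency and phase.
            carrier = cmath.exp(1j * (2.0 * math.pi * x_r / lam + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelengths, n_orientations, **kwargs):
    """One kernel per (scale, orientation) pair; orientations evenly spread over [0, pi)."""
    thetas = [k * math.pi / n_orientations for k in range(n_orientations)]
    return [gabor_kernel(size, lam, th, **kwargs) for lam in wavelengths for th in thetas]


bank = gabor_bank(size=3, wavelengths=[1.0, 1.5, 2.0], n_orientations=6)
print(len(bank))        # 18 filters = 6 orientations x 3 scales
print(bank[0][1][1])    # centre tap: envelope 1 times exp(i*psi) with psi = 0 -> (1+0j)
```

In a trainable version, λ, θ, σ, and γ would be learnable tensors and the kernel would be rebuilt from them on every forward pass, so that gradients flow back to the filter parameters rather than to individual kernel taps.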

By incorporating trainable Gabor filters, GAM-UNet dynamically adapts to dataset characteristics, effectively capturing both fine details and broader structures for precise segmentation of complex textures, such as seismic top salt boundaries and thin medical features like retinal vessels.

3.2.2 Spatial Channel Squeeze-and-Excitation

SCSE mechanisms recalibrate both spatial and channel information in the Gabor convolution outputs to enhance feature selectivity. The SE block, illustrated in Figure 2 in Section 3.1, uses global average pooling to "squeeze" feature maps into channel descriptors that summarize their importance. A gating mechanism, comprising fully connected layers and a sigmoid activation, adjusts the channel responses by amplifying important features and suppressing less relevant ones. Additionally, spatial recalibration emphasizes critical regions such as fine textures and edges, improving the model’s ability to capture subtle variations in complex segmentation tasks. Applying SCSE to the Gabor-filtered outputs effectively enhances texture and edge detection, leveraging the strengths of Gabor filters for precise segmentation.
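The recalibration described above can be sketched in plain Python on a feature map stored as nested lists `x[channel][row][col]`. This is a hedged illustration, not the paper's implementation. The bias-free layers, the single hidden layer, the combination of the two branches by element-wise addition, and the names `channel_se`, `spatial_se`, and `scse` are assumptions, since the paper does not state how its SCSE block combines the branches.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_se(x, w1, w2):
    """Channel branch: global average pool -> FC -> ReLU -> FC -> sigmoid -> scale channels."""
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]            # squeeze
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]    # FC + ReLU
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return [[[g * v for v in r] for r in ch] for g, ch in zip(gates, x)]


def spatial_se(x, w):
    """Spatial branch: 1x1 convolution across channels -> sigmoid -> scale every pixel."""
    height, width = len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(w[c] * x[c][i][j] for c in range(len(x))))
             for j in range(width)] for i in range(height)]
    return [[[gate[i][j] * ch[i][j] for j in range(width)] for i in range(height)]
            for ch in x]


def scse(x, w1, w2, w_spatial):
    """Combine both recalibrated maps by element-wise addition (assumed choice)."""
    c_out, s_out = channel_se(x, w1, w2), spatial_se(x, w_spatial)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(c_out, s_out)]


# Two channels of size 2 x 2, hidden width 1; all weights are illustrative values.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [-1.0, 2.0]]]
out = scse(features, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]], w_spatial=[1.0, -1.0])
print(len(out), len(out[0]), len(out[0][0]))   # shape is preserved: 2 2 2
```

Because both gates lie in (0, 1), the block can only rescale activations; it never changes the shape of the Gabor-filtered feature map it is applied to.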

3.3 Multi-Scale Feature Fusion (MSFF)

Figure 4: Multi-scale feature fusion: (a) Cascade connection, (b) Skip Connection with Concatenation.

MSFF, a widely used technique in segmentation models (Zhou et al., 2018; Qin et al., 2020; Cai and Wang, 2020), is essential for capturing both local and global context. Our model employs a multi-scale skip connection scheme that combines features from various semantic levels via upsampling, concatenation, and convolution in the decoder. Two connection strategies are explored: Cascade and Skip Connection with Concatenation, as illustrated in Figure 4.

• Cascade Connection. Features from multiple scales are upsampled and concatenated into a unified representation.

• Skip Connection with Concatenation. Features are upsampled and concatenated with the output of the next block, forming a direct concatenated pathway.

By progressively aggregating features from different decoder stages, our method reduces overfitting and enhances fine detail capture. This structure improves segmentation performance by maintaining robust feature representations across semantic levels.
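The Cascade strategy can be sketched as follows, under stated assumptions: each decoder feature map is a list of 2-D channels, resolutions differ by integer factors, and upsampling is nearest-neighbour. The names `upsample_nearest` and `cascade_fuse` are hypothetical, and the convolution that follows concatenation in the paper is omitted.

```python
def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2-D channel by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in channel for _ in range(factor)]


def cascade_fuse(features):
    """Upsample every feature map to the finest resolution and concatenate channels.

    `features` is a list of feature maps ordered coarse to fine; each map is a
    list of 2-D channels.
    """
    target = len(features[-1][0])             # height of the finest map
    fused = []
    for fmap in features:
        factor = target // len(fmap[0])       # integer scale factor to the target
        fused.extend(upsample_nearest(ch, factor) for ch in fmap)
    return fused


coarse = [[[1, 2], [3, 4]]] * 2               # 2 channels at 2 x 2
fine = [[[0] * 4 for _ in range(4)]]          # 1 channel at 4 x 4
fused = cascade_fuse([coarse, fine])
print(len(fused), len(fused[0]), len(fused[0][0]))   # 3 4 4
```

The channel count of the fused map is the sum of the channel counts of its inputs, which is why a convolution is normally applied afterwards to bring it back to a fixed width.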


3.4 Loss Function

To achieve precise segmentation, GAM-UNet employs a weighted combined loss function that integrates Binary Cross-Entropy Loss ($L_{BCE}$) for pixel-wise accuracy and Dice Loss ($L_{DICE}$) to enhance structural similarity and address class imbalance. This combination balances pixel-level accuracy and structural integrity, making GAM-UNet well suited to segmenting curvilinear structures in seismic and medical images.

$L_{BCE}$ penalizes pixel-wise differences between the predicted probabilities and the ground truth, while $L_{DICE}$ measures the overlap between the predicted segmentation and the ground truth, ensuring structural consistency:

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right], \qquad (2)$$

$$L_{DICE} = 1 - \frac{2\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i}, \qquad (3)$$

where $y_i$ is the ground truth label, $\hat{y}_i$ is the predicted probability for pixel i, and N is the number of pixels.
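A minimal sketch of the combined loss, written with the standard library on flattened pixel lists, is given below. It assumes the combination takes the form α·L_BCE + β·L_DICE with the weights α = 0.8 and β = 0.2 that Section 5.2 reports as best. The clipping constant `eps` and the smoothing term `smooth` are numerical safeguards added here, and the function names are hypothetical.

```python
import math


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over pixels (Equation 2)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)        # keep log() finite
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def dice_loss(y_true, y_pred, smooth=1e-7):
    """One minus the soft Dice overlap (Equation 3)."""
    inter = sum(y * p for y, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * inter + smooth) / (sum(y_true) + sum(y_pred) + smooth)


def combined_loss(y_true, y_pred, alpha=0.8, beta=0.2):
    """Weighted sum of the two terms; the additive form is an assumption."""
    return alpha * bce_loss(y_true, y_pred) + beta * dice_loss(y_true, y_pred)


labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.7, 0.6]
print(combined_loss(labels, probs))
```

The Dice term depends only on overlap, so it is not dominated by the large number of background pixels, whereas the cross-entropy term supplies a per-pixel gradient everywhere.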

4 EXPERIMENTS

We evaluated GAM-UNet on two segmentation tasks: top salt layers in seismic images and retinal vessels in medical images. The model was implemented in PyTorch and optimized with the Adam optimizer (learning rate 0.0001); training converged in approximately 80 epochs. Three metrics (Dice Coefficient, Precision, and Recall) were used to assess segmentation accuracy and structural consistency across the datasets.
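For reference, the three metrics can be computed from binary masks as in the sketch below, which uses flattened lists of 0/1 labels. The function name `segmentation_metrics` and the conventions for empty masks are assumptions made for this example.

```python
def segmentation_metrics(y_true, y_pred):
    """Return (dice, precision, recall) for two flattened binary masks."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Dice is the harmonic mean of precision and recall.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return dice, precision, recall


truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(segmentation_metrics(truth, pred))   # (0.75, 0.75, 0.75)
```

Because Dice combines the other two metrics, a model that predicts thicker boundaries can lose Precision while still gaining Dice, provided Recall rises enough.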

4.1 Datasets

For both tasks, each dataset was split into 80% for training, 10% for validation, and 10% for testing. No data augmentation was applied, in order to maintain the original image characteristics.

• Seismic Image Datasets. Four seismic datasets were used to assess the model’s robustness. Datasets 1 to 3 are proprietary, containing 2D slices extracted from 3D seismic cubes, with sizes ranging from 5,000 to 25,000 images. The fourth dataset is the publicly available Dutch North Sea dataset (DNS) (SEG SEAM Project, 2024), widely used in seismic research. All datasets were prepared by extracting 256 × 256 image slices along inline and crossline sections, preserving geological details and curvilinear boundaries.

Figure 5: Seismic image slice and its label.

• Medical Image Datasets. To address the limited availability of retinal vessel images online, six datasets (ARIA, CHASE, DR-Hagis, DRIVE, HRF, and STARE) (Sarhan et al., 2021) were combined into a single dataset comprising approximately 400 images. This provided diverse vessel structures with varying thickness, resolution, and background complexity.

Figure 6: Retinal vessel datasets.
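The 80/10/10 split described at the start of this section can be sketched as below. The random shuffling, the fixed seed, and the function name `split_indices` are assumptions; the paper does not say how samples were assigned to the three subsets.

```python
import random


def split_indices(n_items, train=0.8, val=0.1, seed=0):
    """Shuffle item indices and cut them into train, validation, and test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # seeded for reproducibility
    n_train = int(round(n_items * train))
    n_val = int(round(n_items * val))
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(400)
print(len(train_idx), len(val_idx), len(test_idx))   # 320 40 40
```

For slices cut from the same seismic cube, neighbouring slices are highly correlated, so a split by contiguous blocks rather than by random index may give a stricter test set.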

4.2 Comparative Approaches

This section summarizes the key configurations of the models used for comparative analysis:

• Fully Convolutional Network (FCN) (Long et al., 2015). A model that replaces fully connected layers with convolutions, enabling pixel-wise predictions through upsampling while preserving spatial information.

• UNet (Ronneberger et al., 2015). A symmetric encoder-decoder architecture with skip connections, combining high-resolution encoder features with decoder layers to enhance segmentation.

• UNet++ (Zhou et al., 2018). An extended UNet with nested skip connections for refined feature extraction and improved handling of complex structures.

• Attention UNet (Att-UNet) (AlSalmi and Elsheikh, 2024). Enhances segmentation by incorporating attention gates that focus on relevant regions while suppressing irrelevant areas.

• Learnable Gabor Kernels for Seismic Interpretation (LGK-UNet) (Wang and Alkhalifah, 2023). Employs learnable Gabor filters to adapt to different scales and orientations, improving feature extraction for seismic image segmentation.


Table 1: Performance comparison for seismic and medical datasets across Dice, Precision (Prec.), and Recall (Rec.) metrics. D1 to D3 are the seismic Datasets 1 to 3, DNS is the seismic Dutch North Sea dataset, and RV is the medical Retinal Vessels dataset.

| Models | D1 Dice | D1 Prec. | D1 Rec. | D2 Dice | D2 Prec. | D2 Rec. | D3 Dice | D3 Prec. | D3 Rec. | DNS Dice | DNS Prec. | DNS Rec. | RV Dice | RV Prec. | RV Rec. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FCN (Long et al., 2015) | 0.64 | 0.94 | 0.61 | 0.82 | 0.90 | 0.80 | 0.35 | 0.83 | 0.25 | 0.63 | 0.84 | 0.57 | 0.51 | 0.70 | 0.43 |
| UNet (Ronneberger et al., 2015) | 0.77 | 0.93 | 0.73 | 0.80 | 0.89 | 0.78 | 0.34 | 0.87 | 0.25 | 0.68 | 0.91 | 0.61 | 0.66 | 0.91 | 0.54 |
| UNet++ (Zhou et al., 2018) | 0.78 | 0.95 | 0.74 | 0.82 | 0.88 | 0.79 | 0.33 | 0.90 | 0.23 | 0.67 | 0.85 | 0.60 | 0.62 | 0.70 | 0.58 |
| Att-UNet (AlSalmi and Elsheikh, 2024) | 0.79 | 0.96 | 0.75 | 0.83 | 0.89 | 0.82 | 0.31 | 0.91 | 0.21 | 0.71 | 0.87 | 0.66 | 0.66 | 0.58 | 0.77 |
| LGK-UNet (Wang and Alkhalifah, 2023) | 0.80 | 0.95 | 0.76 | 0.83 | 0.89 | 0.81 | 0.34 | 0.90 | 0.24 | 0.72 | 0.88 | 0.67 | 0.65 | 0.64 | 0.69 |
| Ours | 0.89 | 0.88 | 0.91 | 0.84 | 0.89 | 0.82 | 0.65 | 0.78 | 0.59 | 0.85 | 0.86 | 0.85 | 0.70 | 0.61 | 0.83 |

5 RESULTS

This section compares our model’s performance with other segmentation models using Dice, Precision, and Recall metrics on seismic and medical datasets. Ablation studies evaluate architectural components, and the Gabor filters’ role in feature extraction is analyzed.

5.1 Models Performance Comparison

We assessed the performance of GAM-UNet on two distinct dataset types: seismic datasets (Datasets 1 to 3 and DNS) and a combined medical dataset focused on retinal vessel segmentation. These datasets present challenges such as noise, complex structures, and high-resolution demands. For evaluation, GAM-UNet was compared against the models listed in Section 4.2. As shown in Table 1, our model achieves the highest Dice score on every dataset and the highest Recall on all datasets except Dataset 2, where it ties with Att-UNet (0.82). The Dice score reflects accurate boundary and structure capture, balancing precision and recall. High Recall demonstrates the model’s ability to identify target structures comprehensively, even in noisy or complex data. Precision is lower than that of the baselines on several datasets (for example, 0.78 versus 0.83 to 0.91 on Dataset 3) because the Gabor filters enhance edges but produce thicker boundaries; the strong Dice and Recall scores nonetheless underscore the model’s robustness and effectiveness in segmentation tasks.

In seismic datasets, these improvements enable accurate segmentation of top salt boundaries, ensuring continuity and better detection in noisy geological environments. As shown in Figure 7, GAM-UNet provides more detailed and gap-free results compared to other models, particularly in complex formations where traditional approaches struggle.

For medical datasets, GAM-UNet excels in segmenting retinal vessels, effectively capturing finer branching structures critical for diagnosis. Its ability to detect smaller vessels missed by other models demonstrates its strength in handling complex vascular structures, making it suitable for precision-demanding clinical applications.

5.2 Ablation Study

To evaluate the impact of different architectural components on performance and parameter efficiency, we conducted ablation studies on the DNS dataset. The analysis examined the contributions of the Gabor Feature Extraction Module (GFEM), the SCSE mechanism, and multi-scale feature fusion with both the Cascade and the Skip Connection with Concatenation strategies.

Table 2: Ablation Study on Key Components.

| Model Configuration | Dice | Precision | Recall | Params (M) |
|---|---|---|---|---|
| Baseline (UNet) | 0.68 | 0.91 | 0.61 | 6.66 |
| + GFEM (Gabor Only) | 0.80 | 0.89 | 0.76 | 6.99 |
| + GFEM (Gabor + SCSE) | 0.83 | 0.90 | 0.79 | 7.02 |
| + GFEM + SCSE + Cascade | 0.85 | 0.86 | 0.85 | 7.04 |
| + GFEM + SCSE + Residual | 0.84 | 0.86 | 0.84 | 7.04 |

Each added component improved the model’s Dice score, as shown in Table 2. Introducing the Gabor Feature Extraction Module (GFEM) increased the Dice and Recall scores by 0.12 and 0.15, respectively, with a modest parameter rise from 6.66M to 6.99M. Adding the SCSE mechanism further improved the feature maps, achieving a Dice score of 0.83 with minimal parameter growth (7.02M), highlighting the benefits of spatial and channel attention. Incorporating multi-scale feature fusion with cascade connections yielded the best results, reaching a Dice score and a Recall of 0.85 with 7.04M parameters. While the residual variant performed comparably (Dice = 0.84), it fell slightly short of the cascade configuration. These findings justify the small increase in parameters (from 6.66M to 7.04M), given the substantial gains in segmentation performance, particularly for complex structures.

We further explored the influence of the loss function through an ablation study on the weighting of the binary cross-entropy loss ($L_{BCE}$) and the Dice loss ($L_{DICE}$). As seen in Figure 8, the weighting between $L_{BCE}$ and $L_{DICE}$ plays a crucial role in the model’s performance. The best results were obtained with α = 0.8 for $L_{BCE}$ and β = 0.2 for $L_{DICE}$. This indicates that optimizing structural similarity through $L_{DICE}$ is essential for achieving high-quality segmentation, ensuring accurate boundaries and overlap between the predicted and ground truth segments.

Figure 7: Segmentation results on the DNS and retinal vessel datasets: (a) Original image, (b) GT, (c) FCN, (d) UNet, (e) Att-UNet, (f) UNet++, (g) LGK-UNet, (h) Ours.

Figure 8: Ablation Study on Loss Function Weights.

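Table 3: Ablation Study on Kernel Sizes, Orientations, and Scales.

| Kernel Size | Orientations | Scales | Dice | Precision | Recall |
|---|---|---|---|---|---|
| 3 × 3 | 4 | 2 | 0.8321 | 0.8580 | 0.8105 |
| 3 × 3 | 8 | 4 | 0.8465 | 0.8649 | 0.8352 |
| 3 × 3 | 6 | 3 | 0.8560 | 0.8669 | 0.8521 |
| 3 × 3 | 12 | 6 | 0.8432 | 0.8601 | 0.8350 |
| 5 × 5 | 4 | 2 | 0.7742 | 0.8540 | 0.7200 |
| 5 × 5 | 8 | 4 | 0.7854 | 0.8673 | 0.7421 |
| 5 × 5 | 6 | 3 | 0.7920 | 0.8716 | 0.7502 |
| 5 × 5 | 12 | 6 | 0.7800 | 0.8600 | 0.7330 |
| 7 × 7 | 4 | 2 | 0.7595 | 0.8402 | 0.7101 |
| 7 × 7 | 8 | 4 | 0.7681 | 0.8455 | 0.7250 |
| 7 × 7 | 6 | 3 | 0.7750 | 0.8601 | 0.7203 |
| 7 × 7 | 12 | 6 | 0.7620 | 0.8500 | 0.7150 |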

Our analysis evaluated Gabor filter configurations by varying kernel sizes, orientations, and scales. As shown in Table 3, the best configuration (6 orientations, 3 scales, and a kernel size of 3 × 3) achieved the highest Dice score (0.8560) and Recall (0.8521), with a Precision of 0.8669. Two 5 × 5 configurations reach a marginally higher Precision (up to 0.8716), but at a much lower Recall. The 3 × 3 kernels excelled in capturing fine details and directional textures, which are crucial for segmenting thin, curvilinear structures like retinal vessels and top salt layers. Larger kernels (5 × 5, 7 × 7) caused smoothing, leading to a loss of critical edge details and occasionally to blank, featureless images, which reduced segmentation accuracy.

The number of orientations, set to twice the number of scales, balanced feature coverage and redundancy. Testing confirmed that the 3 × 3 setup offered the best trade-off between computational efficiency and segmentation accuracy, while larger configurations provided diminishing returns.

5.3 Analysis of Gabor Filters for Seismic and Medical Images

The Gabor filters adapt their parameters to the unique characteristics of seismic and medical images. For seismic datasets, the filters focus on mid-range orientations (0.200π to 0.664π, or about 36° to 120°), enabling the detection of geological features like salt boundaries and faults. Lambda values (0.850 to 1.983) capture features across scales, while an elliptical Gamma (0.547) emphasizes elongated structures. A Sigma of 1.0 balances fine and coarse feature detection, and a Psi of 0.5π ensures sharp edge transitions, collectively enhancing seismic feature representation (Figure 9).

Figure 9: Sample Gabor filters and their feature maps for (right) DNS dataset and (left) Retinal vessels dataset.

For medical images, broader orientations (0.003π to 0.648π, or about 0.5° to 117°) address the multi-directional nature of blood vessels and tissue boundaries. Lambda values (0.955 to 1.995) capture both thin vessels and larger anatomical features. Unlike the seismic filters, the medical filters have a nearly circular Gamma (0.902), which suits rounded structures. The consistent Sigma (1.0) and Psi (0.5π) ensure uniformity in phase transitions. These domain-specific adaptations reflect the seismic filters’ focus on elongated, stratified patterns, while the medical filters capture finer, multi-directional details like branching vessels. This dynamic parameterization enables the effective segmentation of intricate patterns across both datasets.


6 CONCLUSION

This paper presented GAM-UNet, a semantic segmentation model integrating Gabor Feature Extraction Modules (GFEM), SCSE mechanisms, and multi-scale feature fusion within the UNet framework. This combination enhances the model’s ability to capture fine edges and textures in complex tasks involving seismic and medical images. The learnable Gabor filters dynamically adapt to the orientation and scale of features, enabling the detection of fine, elongated structures in seismic images and intricate, multi-directional features in medical images, such as blood vessels. Our experiments highlighted the superior performance of the 3 × 3 Gabor filter configuration, which preserved segmentation continuity and reduced noise. Analysis across datasets demonstrated the model’s adaptability, capturing broad, linear features in seismic data and intricate textures in medical images, and underscored its effectiveness in segmenting fine, curvilinear structures within the tested datasets.

ACKNOWLEDGMENT

We thank Shell Information Technology International Inc. for providing the dataset, which was instrumental in this research.

REFERENCES

Alekseev, A. and Bobe, A. (2019). GaborNet: Gabor filters with learnable parameters in deep convolutional neural network. In 2019 International Conference on Engineering and Telecommunication (EnT). IEEE.

AlSalmi, H. and Elsheikh, A. H. (2024). Automated seismic semantic segmentation using attention U-Net. Geophysics, 89(1).

Cai, Y. and Wang, Y. (2020). MA-Unet: An improved version of Unet based on multi-scale and attention mechanism for medical image segmentation. arXiv.

Chen, B., Zhang, Z., Liu, N., Tan, Y., Liu, X., and Chen, T. (2020). Spatiotemporal convolutional neural network with convolutional block attention module for micro-expression recognition. Information, 11:380.

Jiang, S., Chen, X., and Yi, C. (2024). SSA-UNet: Whole brain segmentation by U-Net with squeeze-and-excitation block and self-attention block from the 2.5D slice image. IET Image Processing.

Li, H., Qiu, K., Chen, L., Mei, X., Hong, L., and Tao, C. (2020). SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett., 18:905–909.

Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.

Luan, S., Chen, C., Zhang, B., Han, J., and Liu, J. (2018a). Gabor convolutional networks. IEEE Transactions on Image Processing, 27(9):4357–4366.

Luan, S., Zhang, B., Zhou, S., Chen, C., Han, J., Yang, W., and Liu, J. (2018b). Gabor convolutional networks. IEEE Trans. Image Process., 27(9):4357–4366.

Ozbulak, G. and Ekenel, H. K. (2018). Gabor initialized convolutional neural networks for transfer learning. In Signal Processing and Communications Application Conference.

Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., and Jagersand, M. (2020). U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition, 106:107404.

Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), volume 9351, pages 234–241.

Roy, A. G., Navab, N., and Wachinger, C. (2018). Recalibrating fully convolutional networks with spatial and channel ‘squeeze and excitation’ blocks. IEEE Trans. Med. Imaging, 38:540–549.

Rundo, L., Militello, C., Vitabile, S., Gilardi, M. C., and Leonardi, R. (2019). USE-Net: Incorporating squeeze-and-excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets. Neurocomputing, 365:31–43.

Sarhan, A., Rokne, J., Alhajj, R., and Crichton, A. (2021). Transfer learning through weighted loss function and group normalization for vessel segmentation from retinal images. In 2020 25th ICPR, pages 9211–9218. IEEE.

Sarwar, S. S., Panda, P., and Roy, K. (2017). Gabor filter assisted energy efficient fast learning convolutional neural networks. In IEEE/ACM International Symposium on Low Power Electronics and Design, pages 1–6.

SEG SEAM Project (2024). SEG SEAM Artificial Intelligence Dataset. https://seg.org/seam/artificial-intelligence/.

Shen, X., Xu, J., Jia, H., Fan, P., Dong, F., Yu, B., and Ren, S. (2022). Self-attentional microvessel segmentation via squeeze-excitation transformer Unet. Computerized Medical Imaging and Graphics, 97:102055.

Wang, F. and Alkhalifah, T. (2023). Learnable Gabor kernels in convolutional neural networks for seismic interpretation tasks. arXiv.

Yuan, Y., Wang, L.-N., Zhong, G., Gao, W., Jiao, W., Dong, J., Shen, B., Xia, D., and Xiang, W. (2022). Adaptive Gabor convolutional networks. Pattern Recognition, 124:108495.

Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., and Liang, J. (2018). UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, DLMIA ML-CDS 2018, volume 11045, pages 3–11. Springer, Cham.
