
part-based and prototype models tested here struggled to extract robust prototypes from the support set when it carried little or incomplete information about the object class. A new training paradigm, referred to as BAM with attention, has been proposed: the BAM model is re-trained in conjunction with an attention module and evaluated on heavily augmented support images. Although it still faces challenges in extracting robust features from the support set, it exhibits less confusion and a greater capacity for generalization than the other models. We believe that our findings can inform future investigations into bias and semantic ambiguity problems.
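To illustrate why an impoverished support set hurts prototype-based models, the sketch below shows the standard masked average pooling step commonly used to extract a class prototype from support features. This is a generic formulation, not the exact BAM implementation; the array shapes, the toy feature map, and the epsilon constant are illustrative assumptions.

```python
import numpy as np

def masked_average_pooling(features, mask, eps=1e-6):
    """Extract a class prototype from support features.

    features: (C, H, W) feature map from the backbone.
    mask:     (H, W) binary foreground mask of the support image.
    Returns a (C,) prototype vector. When the mask covers only a
    few pixels, the average is taken over little evidence and the
    prototype becomes unreliable.
    """
    weighted = features * mask[None, :, :]  # zero out background pixels
    prototype = weighted.sum(axis=(1, 2)) / (mask.sum() + eps)
    return prototype

# Toy example: a 3-channel feature map and a 2-pixel foreground mask.
feats = np.arange(48, dtype=float).reshape(3, 4, 4)
mask = np.zeros((4, 4))
mask[1, 1] = mask[2, 2] = 1.0
proto = masked_average_pooling(feats, mask)
print(proto.shape)  # (3,)
```

With only two foreground pixels, the prototype is the mean of just two feature vectors, which is exactly the failure mode observed under heavy occlusion or aggressive augmentation of the support image.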
REFERENCES
Catalano, N. and Matteucci, M. (2024). Few shot seman-
tic segmentation: a review of methodologies, bench-
marks, and open challenges.
Chen, L.-C., Barron, J. T., Papandreou, G., Murphy, K., and
Yuille, A. L. (2016). Semantic image segmentation
with task-specific edge detection using cnns and a dis-
criminatively trained domain transform. In Proceed-
ings of the IEEE conference on computer vision and
pattern recognition, pages 4545–4554.
Cheng, J., Wang, P.-s., Li, G., Hu, Q.-h., and Lu, H.-
q. (2018). Recent advances in efficient computa-
tion of deep convolutional neural networks. Frontiers
of Information Technology & Electronic Engineering,
19:64–77.
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., and
Wierstra, D. (2015). DRAW: A recurrent neural net-
work for image generation. arXiv preprint arXiv:1502.04623.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B.,
Liu, T., Wang, X., Wang, G., Cai, J., et al. (2018). Re-
cent advances in convolutional neural networks. Pat-
tern recognition, 77:354–377.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
Residual Learning for Image Recognition. In Pro-
ceedings of 2016 IEEE Conference on Computer Vi-
sion and Pattern Recognition, CVPR ’16, pages 770–
778. IEEE.
Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-
excitation networks. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 7132–7141.
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y.,
and Liu, W. (2019). Ccnet: Criss-cross attention
for semantic segmentation. In Proceedings of the
IEEE/CVF international conference on computer vi-
sion, pages 603–612.
Janocha, K. and Czarnecki, W. M. (2017). On loss func-
tions for deep neural networks in classification. arXiv
preprint arXiv:1702.05659.
Lang, C., Cheng, G., Tu, B., and Han, J. (2022). Learning
what not to segment: A new perspective on few-shot
segmentation. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 8057–8067.
Li, G., Jampani, V., Sevilla-Lara, L., Sun, D., Kim, J., and
Kim, J. (2021). Adaptive prototype learning and al-
location for few-shot segmentation. In Proceedings
of the IEEE/CVF conference on computer vision and
pattern recognition, pages 8334–8343.
Lin, G., Milan, A., Shen, C., and Reid, I. (2017). Refinenet:
Multi-path refinement networks for high-resolution
semantic segmentation. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 1925–1934.
Liu, Y., Zhang, X., Zhang, S., and He, X. (2020). Part-
aware prototype network for few-shot semantic seg-
mentation. In European Conference on Computer Vi-
sion, pages 142–158. Springer.
Lu, Z., He, S., Zhu, X., Zhang, L., Song, Y.-Z., and Xiang,
T. (2021). Simpler is better: Few-shot semantic seg-
mentation with classifier weight transformer. In ICCV.
Moradi, R., Berangi, R., and Minaei, B. (2020). A survey
of regularization strategies for deep models. Artificial
Intelligence Review, 53(6):3947–3986.
Rakelly, K., Shelhamer, E., Darrell, T., Efros, A. A.,
and Levine, S. (2018). Few-shot segmentation
propagation with guided networks. arXiv preprint
arXiv:1806.07373.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Medical image computing and
computer-assisted intervention–MICCAI 2015: 18th
international conference, Munich, Germany, October
5-9, 2015, proceedings, part III 18, pages 234–241.
Springer.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2020). Grad-CAM: Visual
explanations from deep networks via gradient-based
localization. International journal of computer vision,
128:336–359.
Srivastava, R. K., Greff, K., and Schmidhuber, J. (2015).
Training very deep networks. Advances in neural in-
formation processing systems, 28.
Wang, K., Liew, J. H., Zou, Y., Zhou, D., and Feng, J.
(2019a). Panet: Few-shot image semantic segmen-
tation with prototype alignment. In The IEEE Inter-
national Conference on Computer Vision (ICCV).
Wang, X., Zhang, X., Cao, Y., Wang, W., Shen, C., and
Huang, T. (2023a). Seggpt: Segmenting everything in
context.
Wang, Y., Luo, N., and Zhang, T. (2023b). Focus on query:
Adversarial mining transformer for few-shot segmen-
tation. Advances in Neural Information Processing
Systems, 36:31524–31542.
Wang, Y.-X., Ramanan, D., and Hebert, M. (2019b). Meta-
learning to detect rare objects. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 9925–9934.
Woo, S., Park, J., Lee, J., and Kweon, I. S. (2018).
CBAM: Convolutional block attention module. arXiv
preprint arXiv:1807.06521.
Beyond Data Augmentations: Generalization Abilities of Few-Shot Segmentation Models