
on Intelligent Transportation Systems, 24(9):10118–10137.
Li, H. and Wu, X.-J. (2018). Densefuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 28(5):2614–2623.
Li, J., Zhu, J., Li, C., Chen, X., and Yang, B. (2022). Cgtf: Convolution-guided transformer for infrared and visible image fusion. IEEE Transactions on Instrumentation and Measurement, 71:1–14.
Li, Y., Wang, J., Miao, Z., and Wang, J. (2020). Unsupervised dense attention network for infrared and visible image fusion. Multimedia Tools and Applications, 79(45):34685–34696.
Liang, L., Shen, X., and Gao, Z. (2024). Ifici: Infrared and visible image fusion based on interactive compensation illumination. Infrared Physics &amp; Technology, 136:105078.
Liu, J., Liu, Z., Wu, G., Ma, L., Liu, R., Zhong, W., Luo, Z., and Fan, X. (2023). Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8115–8124.
Liu, K., Li, M., Chen, C., Rao, C., Zuo, E., Wang, Y., Yan, Z., Wang, B., Chen, C., and Lv, X. (2024a). Dsfusion: Infrared and visible image fusion method combining detail and scene information. Pattern Recognition, page 110633.
Liu, X., Huo, H., Li, J., Pang, S., and Zheng, B. (2024b). A semantic-driven coupled network for infrared and visible image fusion. Information Fusion, 108(C).
Liu, X., Xu, X., Xie, J., Li, P., Wei, J., and Sang, Y. (2024c). Fdenet: Fusion depth semantics and edge-attention information for multispectral pedestrian detection. IEEE Robotics and Automation Letters, 9(6):5441–5448.
Ma, J., Liang, P., Yu, W., Chen, C., Guo, X., Wu, J., and Jiang, J. (2020). Infrared and visible image fusion via detail preserving adversarial learning. Information Fusion, 54:85–98.
Ma, J., Tang, L., Fan, F., Huang, J., Mei, X., and Ma, Y. (2022). Swinfusion: Cross-domain long-range learning for general image fusion via swin transformer. IEEE/CAA Journal of Automatica Sinica, 9(7):1200–1217.
Ma, J., Yu, W., Liang, P., Li, C., and Jiang, J. (2019). Fusiongan: A generative adversarial network for infrared and visible image fusion. Information Fusion, 48:11–26.
Mustafa, H. T., Yang, J., Mustafa, H., and Zareapoor, M. (2020). Infrared and visible image fusion based on dilated residual attention network. Optik, 224:165409.
Peng, C., Tian, T., Chen, C., Guo, X., and Ma, J. (2021). Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation. Neural Networks, 137:188–199.
Sun, Y., Cao, B., Zhu, P., and Hu, Q. (2022). Detfusion: A detection-driven infrared and visible image fusion network. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4003–4011.
Sun, Y., Fu, Z., Sun, C., Hu, Y., and Zhang, S. (2021). Deep multimodal fusion network for semantic segmentation using remote sensing image and lidar data. IEEE Transactions on Geoscience and Remote Sensing, 60:1–18.
Tang, L., Deng, Y., Ma, Y., Huang, J., and Ma, J. (2022a). Superfusion: A versatile image registration and fusion network with semantic awareness. IEEE/CAA Journal of Automatica Sinica, 9(12):2121–2137.
Tang, L., Yuan, J., and Ma, J. (2022b). Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Information Fusion, 82:28–42.
Tang, L., Yuan, J., Zhang, H., Jiang, X., and Ma, J. (2022c). Piafusion: A progressive infrared and visible image fusion network based on illumination aware. Information Fusion, 83:79–92.
Tang, L., Zhang, H., Xu, H., and Ma, J. (2023). Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity. Information Fusion, 99:101870.
Velesaca, H., Bastidas, G., Rouhani, M., and Sappa, A. (2024). Multimodal image registration techniques: A comprehensive survey. Multimedia Tools and Applications, 83:1–29.
Wang, C., Yang, G., Sun, D., Zuo, J., Wang, E., and Wang, L. (2021). Frequency domain fusion algorithm of infrared and visible image based on compressed sensing for video surveillance forensics. In 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 832–839. IEEE.
Wang, Y., Li, G., and Liu, Z. (2023). Sgfnet: Semantic-guided fusion network for rgb-thermal semantic segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 33(12):7737–7748.
Wang, Z., Wu, Y., Wang, J., Xu, J., and Shao, W. (2022). Res2fusion: Infrared and visible image fusion based on dense res2net and double nonlocal attention models. IEEE Transactions on Instrumentation and Measurement, 71:1–12.
Xiao, Y., Meng, F., Wu, Q., Xu, L., He, M., and Li, H. (2024). Gm-detr: Generalized multispectral detection transformer with efficient fusion encoder for visible-infrared detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5541–5549.
Xu, D., Zhang, N., Zhang, Y., Li, Z., Zhao, Z., and Wang, Y. (2022). Multi-scale unsupervised network for infrared and visible image fusion based on joint attention mechanism. Infrared Physics &amp; Technology, 125:104242.
Yan, H., Zhang, C., and Wu, M. (2022). Lawin transformer: Improving semantic segmentation transformer with multi-scale representations via large window attention. arXiv preprint arXiv:2201.01615.
Yang, B., Hu, Y., Liu, X., and Li, J. (2024). Cefusion: An infrared and visible image fusion network based on cross-modal multi-granularity information interaction
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications