VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications