Infrared and visible image fusion based on Transformer background modeling and CAM detail enhancement
CSTR:
Author:
Affiliation:

1 College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China; 2 College of Electronic Engineering and Optoelectronic Technology, Nanjing University of Science and Technology, Nanjing 210094, China; 3 College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

Author biography:

Corresponding author:

CLC number:

TP391

Fund Project:

Supported by the Key Laboratory Fund of Yunnan Province, High-Quality Generation Network from Visible Images to Infrared and Low-Light Images (2025-LLDIVN-GD-01-06).

    Abstract:

    To address insufficient background structure modeling and inadequate representation of detail textures in infrared and visible image fusion, a dual-branch image fusion network, Swin Transformer Background Modeling and Residual Channel Attention Detail Enhancement for Image Fusion (ST-RCAFuse), was proposed. In the encoding stage, a background branch and a detail branch were designed. The background branch was built around the Swin Transformer: Window-based Self-Attention (WSA) was used to model global and local background structures efficiently, and Coordinate Attention (CoordAtt) was introduced to enhance the spatial directionality of the features. The detail branch employed Residual Channel Attention Blocks (RCAB) to extract texture details and high-frequency information. In the decoding stage, the background and detail features were progressively fused and reconstructed to generate high-quality fused images. The network was trained on the FLIR dataset and evaluated under a multi-dimensional test framework covering in-domain, conventional out-of-domain, and extreme scenarios. The FLIR test set served as the in-domain group to verify baseline performance under the same data distribution; the TNO and RoadScene datasets served as conventional out-of-domain groups to assess cross-scene generalization; and two extreme-scenario test sets, one with nighttime strong-light interference and one captured in ultra-low-light field conditions, were used to evaluate robustness under complex and adverse conditions.
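    The window step behind the WSA mechanism can be illustrated with a minimal sketch. It assumes the usual Swin Transformer convention of splitting a feature map into non-overlapping square windows and computing attention inside each one; the abstract gives neither the window size nor the tensor layout, so the function names `window_partition` and `window_reverse`, the nested-list representation, and the window size used below are all hypothetical. Only the partition and its inverse are shown, not the attention computation itself.

```python
from typing import List

Grid = List[List[float]]


def window_partition(feature: Grid, window: int) -> List[Grid]:
    """Split an H x W map into non-overlapping window x window tiles.

    Tiles are returned in row-major order. Assumption: H and W are
    multiples of the window size (real implementations usually pad).
    """
    height, width = len(feature), len(feature[0])
    if height % window or width % window:
        raise ValueError("height and width must be multiples of the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([row[left:left + window]
                          for row in feature[top:top + window]])
    return tiles


def window_reverse(tiles: List[Grid], height: int, width: int) -> Grid:
    """Inverse of window_partition: stitch the tiles back into an H x W map."""
    window = len(tiles[0])
    tiles_per_row = width // window
    out = [[0.0] * width for _ in range(height)]
    for index, tile in enumerate(tiles):
        top = (index // tiles_per_row) * window
        left = (index % tiles_per_row) * window
        for r in range(window):
            out[top + r][left:left + window] = tile[r]
    return out


if __name__ == "__main__":
    # Hypothetical 4 x 4 single-channel feature map, window size 2.
    feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    tiles = window_partition(feature, 2)
    print(len(tiles), tiles[0])
```

    Because attention is restricted to each tile, its cost grows with the number of windows rather than with the square of the full image size, which is the usual motivation for window-based attention.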
    The experimental results show that the proposed ST-RCAFuse achieves the best visual fusion quality on the conventional public datasets and leads in key metrics such as information entropy, spatial frequency, standard deviation, and average gradient. Under extreme conditions such as nighttime strong-light interference and ultra-low-light field environments, it still suppresses interference, preserves fine details, and enhances salient targets, and its fusion performance is significantly better than that of the compared methods. These results support its generalization capability, robustness, and practical value across diverse scenes and extreme conditions.
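    The four no-reference metrics named in the results can be sketched as follows. This is an illustrative sketch using commonly cited definitions, not the authors' evaluation code: the abstract does not state the exact formulas, and normalization conventions for spatial frequency and average gradient vary between papers. The function names and the assumption of an 8-bit grayscale image stored as nested lists are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # assumed: grayscale, values in [0, 255]


def entropy(img: Image, levels: int = 256) -> float:
    """Shannon entropy (bits) of the gray-level histogram."""
    counts = [0] * levels
    total = 0
    for row in img:
        for v in row:
            counts[min(levels - 1, max(0, int(v)))] += 1
            total += 1
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def standard_deviation(img: Image) -> float:
    """Population standard deviation of pixel intensities."""
    values = [v for row in img for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def spatial_frequency(img: Image) -> float:
    """sqrt(RF^2 + CF^2), with row/column differences averaged over H*W."""
    h, w = len(img), len(img[0])
    rf2 = sum((img[r][c] - img[r][c - 1]) ** 2
              for r in range(h) for c in range(1, w)) / (h * w)
    cf2 = sum((img[r][c] - img[r - 1][c]) ** 2
              for r in range(1, h) for c in range(w)) / (h * w)
    return math.sqrt(rf2 + cf2)


def average_gradient(img: Image) -> float:
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences.

    Requires an image of at least 2 x 2 pixels.
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(h - 1):
        for c in range(w - 1):
            dx = img[r][c + 1] - img[r][c]
            dy = img[r + 1][c] - img[r][c]
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((h - 1) * (w - 1))


if __name__ == "__main__":
    # Hypothetical 2 x 2 checkerboard standing in for a fused image.
    fused = [[0.0, 255.0], [255.0, 0.0]]
    for metric in (entropy, standard_deviation,
                   spatial_frequency, average_gradient):
        print(metric.__name__, metric(fused))
```

    All four metrics score a single fused image without a reference: higher entropy and standard deviation indicate more information and contrast, while higher spatial frequency and average gradient indicate richer edges and texture.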

History
  • Received: 2026-03-24
  • Last revised: 2026-04-24
  • Accepted: 2026-04-28
  • Published online: 2026-04-28
  • Publication date: