Infrared and visible image fusion based on transformer background modeling and CAM detail enhancement
Affiliation:

1. College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2. College of Electronic Engineering and Optoelectronic Technology, Nanjing University of Science and Technology, Nanjing 210094, China
3. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

CLC Number:

TP391

Fund Project:

Supported by the Key Laboratory Fund of Yunnan Province, High-Quality Generation Network from Visible Images to Infrared and Low-Light Images (2025-LLDIVN-GD-01-06).

    Abstract:

    To address insufficient background structure modeling and inadequate representation of detail textures in infrared and visible image fusion, a dual-branch image fusion network based on Swin Transformer background modeling and residual channel attention detail enhancement (ST-RCAFuse) was proposed. In the encoding stage, two branches were designed: a background branch and a detail branch. The background branch adopted the Swin Transformer as its core component, using the Window-based Self-Attention (WSA) mechanism to model both global and local background structures efficiently, and introduced Coordinate Attention (CoordAtt) to enhance the spatial directionality of features. The detail branch employed the Residual Channel Attention Block (RCAB) to extract texture details and high-frequency information. In the decoding stage, the background and detail features were progressively fused and reconstructed to generate the fused image. The network was trained on the FLIR dataset, and a multi-dimensional evaluation framework covering in-domain, conventional out-of-domain, and extreme scenarios was established. Specifically, the FLIR test set served as the in-domain test group to evaluate baseline performance under the same data distribution, and the TNO and RoadScene datasets served as conventional out-of-domain test groups to assess cross-scene generalization. In addition, two extreme-scenario test sets, a nighttime strong illumination interference dataset and an ultra-low-light field dataset, were constructed to evaluate the robustness of the model under complex and adverse conditions.
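    The channel attention idea behind the detail branch can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it follows the commonly used RCAB formulation (global average pooling, a bottleneck with ReLU, a sigmoid gate, and a residual connection), omits the convolution layers that normally precede the gate, and uses the hypothetical names `channel_attention`, `residual_channel_attention`, `w_down`, and `w_up`. A real network would use a deep learning framework with learned weights.

```python
import math


def channel_attention(features, w_down, w_up):
    """Rescale each channel by a gate in (0, 1) computed from global statistics.

    features: list of channels, each a 2D list of floats (C x H x W).
    w_down:   hypothetical bottleneck weights, shape (C_reduced x C).
    w_up:     hypothetical expansion weights, shape (C x C_reduced).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excite: bottleneck with ReLU, then expansion with a sigmoid gate.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_up
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]


def residual_channel_attention(features, w_down, w_up):
    """Add the attention-weighted features back onto the input (residual path)."""
    attended = channel_attention(features, w_down, w_up)
    return [
        [[a + v for a, v in zip(row_a, row_v)] for row_a, row_v in zip(ch_a, ch_v)]
        for ch_a, ch_v in zip(attended, features)
    ]
```

    With all-zero weights every gate equals 0.5, so each output value is 1.5 times its input; this is a convenient sanity check on the residual path.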
    The experimental results demonstrate that ST-RCAFuse achieves superior visual quality on the standard public datasets, with leading performance in key metrics such as entropy, spatial frequency, standard deviation, and average gradient. Under extreme conditions such as nighttime strong illumination interference and ultra-low-light field environments, the method effectively suppresses interference, preserves fine details, and enhances salient targets. Its fusion performance exceeds that of the compared methods, which supports its generalization capability, robustness, and practical value across diverse scenarios and challenging conditions.
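    The four metrics named above can be sketched in a few lines, assuming their usual definitions in the image fusion literature; the abstract does not state the exact formulas or normalization the authors used. In this sketch a grayscale image is a list of rows of integer gray levels, the function names are illustrative, and spatial frequency is normalized by the total pixel count (some implementations divide by the number of differences instead). Practical evaluation code would typically use an array library.

```python
import math
from collections import Counter


def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def standard_deviation(img):
    """Population standard deviation of pixel intensities."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2) from horizontal and vertical first differences."""
    rows, cols = len(img), len(img[0])
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(rows) for j in range(1, cols))
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, rows) for j in range(cols))
    # Assumption: both sums are normalized by the pixel count rows * cols.
    return math.sqrt((rf + cf) / (rows * cols))


def average_gradient(img):
    """Mean of sqrt((dx^2 + dy^2) / 2) over forward differences."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = img[i][j + 1] - img[i][j]
            dy = img[i + 1][j] - img[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((rows - 1) * (cols - 1))
```

    Under these definitions a constant image scores zero on all four metrics, and higher values indicate more information content, contrast, or edge activity in the fused result.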

History
  • Received: March 24, 2026
  • Revised: April 24, 2026
  • Adopted: April 28, 2026
  • Online: April 28, 2026