Abstract: Infrared and visible image fusion aims to integrate complementary information captured across spectra by thermal radiation and reflected-light imaging, simultaneously highlighting salient targets and preserving texture details in complex scenes, thereby providing more comprehensive inputs for both human perception and machine vision. To further improve fused image quality and performance on downstream tasks, this paper proposes a segmentation- and detection-driven infrared and visible image fusion network. The unified framework consists of a fusion network and two task-driven branches, a target discriminator and a segmentation branch, which guide the fusion network to retain richer high-level semantics through their respective loss functions. To enhance feature representation, we design a dense connection and gradient residual module (DCGRM) based on dense blocks for deep feature extraction. Furthermore, a large kernel attention (LKA) module is introduced in the decoding stage to focus on key regions and reduce information loss, further improving the quality of fused images. Experiments on three public datasets demonstrate that the proposed method effectively integrates the complementary strengths of both modalities, highlighting salient targets while preserving rich details. It outperforms the compared methods on multiple fusion metrics and achieves real-time inference speed. Moreover, benefiting from its task-driven design, the proposed method also exhibits performance advantages on downstream vision tasks such as segmentation and detection.
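The abstract names an LKA module in the decoder but does not specify its form. As a point of reference, the sketch below shows the standard large kernel attention decomposition from Guo et al.'s Visual Attention Network (depth-wise, dilated depth-wise, and point-wise convolutions); whether the paper's LKA variant matches this exactly is an assumption, and the class name and channel parameter are illustrative.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Minimal sketch of large kernel attention (after Guo et al., VAN).
    Assumed variant; the paper's exact LKA design may differ."""

    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depth-wise conv captures local spatial context
        self.dw_conv = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # 7x7 depth-wise dilated conv (dilation 3) enlarges the receptive
        # field to an effective 19x19 kernel at low cost
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        # 1x1 conv mixes information across channels
        self.pw_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        # Reweight decoder features by the learned attention map,
        # emphasizing key regions during reconstruction
        return x * attn
```

Decomposing one large kernel into depth-wise, dilated, and point-wise stages is the usual way such modules keep long-range attention affordable, which is consistent with the real-time inference speed the abstract reports.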