Abstract: The result of infrared and visible image fusion should highlight the salient targets of the infrared image while preserving the texture details of the visible image. To satisfy these requirements, this paper proposes an autoencoder-based infrared and visible image fusion method. The encoder comprises a base encoder and a detail encoder, each constructed according to its optimization objective: the base encoder extracts low-frequency information from the image, while the detail encoder captures high-frequency information. Because this decomposition may miss some information, we introduce a compensation encoder to supplement it. We also apply multi-scale decomposition in the encoders to extract image features more comprehensively. The features obtained by the encoders are fed into the decoder, which first adds the low-frequency, high-frequency, and compensatory information to obtain multi-scale features. An attention map is derived from these multi-scale features and multiplied with the fused features at the corresponding scale, and a fusion module is introduced into the multi-scale fusion process to achieve image reconstruction. The proposed network demonstrates its effectiveness on the TNO, RoadScene, and LLVIP datasets. Experiments show that it better perceives changes in illumination, more effectively extracts image detail, and produces fused images that are more consistent with human visual perception.
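For concreteness, the sketch below illustrates the described pipeline in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the module names (Encoder, Decoder, encode), the number of scales, the channel widths, the elementwise-addition fusion of infrared and visible features, and the sigmoid attention map are all assumptions made here for demonstration.

```python
# Minimal sketch of a three-encoder (base / detail / compensation)
# multi-scale fusion network. All architectural details are assumptions.
import torch
import torch.nn as nn

NUM_SCALES, C = 3, 32  # assumed scale count and channel width


def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True))


class Encoder(nn.Module):
    """One of the three encoders; returns one feature map per scale
    of the multi-scale decomposition."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(
            conv_block(1 if s == 0 else C, C) for s in range(NUM_SCALES))
        self.down = nn.AvgPool2d(2)

    def forward(self, x):
        feats = []
        for s, block in enumerate(self.blocks):
            x = block(x)
            feats.append(x)                        # feature at scale s
            if s < NUM_SCALES - 1:
                x = self.down(x)                   # halve resolution
        return feats


class Decoder(nn.Module):
    """Per scale: add base + detail + compensation features, derive an
    attention map from the sum, reweight the fused features with it, and
    reconstruct from the coarsest scale upward."""
    def __init__(self):
        super().__init__()
        self.attn = nn.ModuleList(nn.Conv2d(C, C, 1) for _ in range(NUM_SCALES))
        self.fuse = nn.ModuleList(conv_block(C, C) for _ in range(NUM_SCALES))
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.out = nn.Conv2d(C, 1, 3, padding=1)

    def forward(self, base, detail, comp):
        x = None
        for s in reversed(range(NUM_SCALES)):      # coarsest scale first
            feat = base[s] + detail[s] + comp[s]   # add the three streams
            a = torch.sigmoid(self.attn[s](feat))  # attention map
            fused = self.fuse[s](feat * a)         # attention-weighted fusion
            x = fused if x is None else fused + self.up(x)
        return torch.sigmoid(self.out(x))


def encode(enc, ir, vis):
    # Assumed cross-modal fusion by elementwise addition of per-scale
    # features; the paper's fusion module is more elaborate.
    return [fi + fv for fi, fv in zip(enc(ir), enc(vis))]


# Usage with random stand-in images.
base_enc, detail_enc, comp_enc, dec = Encoder(), Encoder(), Encoder(), Decoder()
ir, vis = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
fused = dec(encode(base_enc, ir, vis),
            encode(detail_enc, ir, vis),
            encode(comp_enc, ir, vis))
print(fused.shape)  # torch.Size([1, 1, 128, 128])
```

The coarsest-first reconstruction mirrors the abstract's multi-scale fusion: each finer scale refines an upsampled version of the coarser result before the final single-channel fused image is produced.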