Abstract: Existing deep learning methods have achieved strong results in infrared small target detection, but their high computational cost makes them unsuitable for resource-constrained scenarios. Knowledge distillation methods that balance lightweight design with high accuracy are therefore urgently needed to improve the operational efficiency of infrared small target detection networks. However, owing to the extreme characteristics of infrared small targets, conventional distillation methods for infrared detection networks suffer from the loss and diffusion of small-target knowledge during transfer and from mismatched hierarchical feature representations between teacher and student networks. This impairs the student network's ability to learn small-target features and hinders further improvement in detection capability. To address these issues, this paper proposes a feature distillation method guided by a multi-scale spatial attention mechanism. First, a multi-scale spatial attention (MSA) mechanism is designed to capture and fuse multi-scale information from target features, thereby effectively localizing the target region. Second, an L2 feature normalization strategy is designed to bridge the differences in feature distribution between the teacher and student networks. Finally, an adaptive weighted mean square error (AWMSE) loss function is proposed to guide the student network to strengthen its learning of key target regions. Experimental results on two widely used datasets (NUDT-SIRST, NUAA-SIRST) demonstrate that the proposed distillation method achieves superior detection performance, with the student network even matching the detection performance of the teacher network. Furthermore, the distilled lightweight model achieves more than 2x inference acceleration when deployed on HUAWEI and NVIDIA edge devices.