Abstract: Infrared small targets are difficult to recognize because they occupy few pixels, have weak texture, and carry no color information. To address this, an infrared small target detection algorithm for detecting unmanned aerial vehicles (UAVs) in the air is proposed. Working on multi-frame infrared sequences, the algorithm uses a ResNet-50 network to extract deep features frame by frame and combines a multi-scale deformable attention mechanism with cubic interpolation to enhance the detail representation of small targets. A frame decoder and a target decoder are designed to generate spatiotemporal masks of target instances from video-level query vectors. During training, the Hungarian algorithm matches predictions to ground-truth targets, and the classification, mask, and similarity losses are optimized over the matched pairs. During inference, the masks of high-confidence queries are fused, and the inter-frame difference method is used to suppress static noise. On the DSAT dataset, the proposed algorithm reaches an accuracy of 0.6356 and an F-score of 0.6475, a significant performance improvement. Through multi-scale feature fusion and temporal modeling, the algorithm effectively reduces missed detections and false alarms in infrared small target detection and provides a high-precision solution for UAV detection.
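The inference step summarized above (fusing the masks of high-confidence queries, then suppressing static noise with inter-frame differencing) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the array shapes, the 0.5 score and mask thresholds, the pixel-wise maximum used for fusion, the intensity threshold `diff_thresh`, and the function names `fuse_query_masks`, `frame_difference_mask`, and `detect` are all hypothetical choices.

```python
import numpy as np


def fuse_query_masks(scores, mask_logits, score_thresh=0.5, mask_thresh=0.5):
    """Fuse per-query spatiotemporal masks (hypothetical fusion rule).

    scores:      (Q,) confidence of each video-level query
    mask_logits: (Q, T, H, W) mask logits predicted for each query
    Returns a (T, H, W) boolean mask.
    """
    keep = scores >= score_thresh  # retain only high-confidence queries
    if not np.any(keep):
        return np.zeros(mask_logits.shape[1:], dtype=bool)
    probs = 1.0 / (1.0 + np.exp(-mask_logits[keep]))  # sigmoid per pixel
    # Assumption: pixel-wise maximum over the retained queries.
    return probs.max(axis=0) > mask_thresh


def frame_difference_mask(frames, diff_thresh=10.0):
    """Mark pixels whose intensity changes between consecutive frames.

    frames: (T, H, W) grayscale infrared frames.
    Returns a (T, H, W) boolean motion mask. The first frame reuses the
    difference between frames 0 and 1, since it has no predecessor.
    """
    f = frames.astype(np.float32)
    diff = np.zeros_like(f)
    if f.shape[0] > 1:
        diff[1:] = np.abs(f[1:] - f[:-1])  # |I_t - I_{t-1}|
        diff[0] = diff[1]
    return diff > diff_thresh


def detect(frames, scores, mask_logits):
    """Keep fused-mask pixels that also change over time (static noise removed)."""
    fused = fuse_query_masks(scores, mask_logits)
    motion = frame_difference_mask(frames)
    return fused & motion
```

The logical AND keeps only pixels that the decoder flags and that change between frames, so a persistent hot pixel that a query wrongly covers is discarded. This assumes a static or already registered background; with a moving airborne camera, frames would presumably need to be aligned before differencing.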