An infrared small target detection algorithm for aerial drones based on multi-frame set prediction
Submitted: 2025-04-17 Revised: 2025-06-01
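Chinese abstract: Infrared small target detection is an important research topic in computer vision. With the development of autonomous drones, detecting aerial drones has become increasingly important. Many deep-learning-based detection algorithms have emerged in recent years. These methods usually identify targets by appearance features, but targets with weak texture and no color challenge existing algorithms. To address this, this paper proposes an infrared small target detection algorithm for aerial drones. Specifically, for an input multi-frame infrared image sequence, a backbone network (ResNet-50) first extracts deep features frame by frame. To address the low resolution of small targets, a pixel decoder combines a multi-scale deformable attention mechanism (operating on the C5/C4/C3 feature levels) with bicubic interpolation to recover high-resolution per-pixel embedding features and strengthen the detail representation of small targets. Next, a frame decoder and a target decoder are introduced to build temporal relations: the frame decoder parses single-frame features through learnable query vectors, and the target decoder uses a Vision Transformer to model spatiotemporal associations across frames, generating video-level query vectors that capture the spatiotemporal masks of target instances. During training, the model uses the Hungarian algorithm to optimally match predictions to ground-truth labels and, based on the matching, jointly optimizes the classification loss, mask loss, and similarity loss, which enables end-to-end training. During online inference, the video is divided into clips for processing, the masks of high-confidence queries are fused to output the final detection result, and frame differencing is applied to suppress static background noise. Experiments on the DSAT dataset validate the proposed algorithm: precision reaches 0.6356 and the F-score reaches 0.6475, significantly outperforming existing methods and showing that the proposed algorithm can accurately localize infrared small targets against complex backgrounds.
Chinese keywords: object detection; signal processing; instance segmentation; aerial small targets; UAV detection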
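
The matching step in the training stage can be illustrated with a toy example. The abstract states only that the Hungarian algorithm matches predictions to ground-truth labels. The sketch below is not the authors' implementation: it finds the same minimum-cost one-to-one assignment by brute force, and the function name `match_predictions` and the cost values are hypothetical.

```python
from itertools import permutations


def match_predictions(cost):
    """Return the minimum-cost one-to-one assignment of predictions to targets.

    cost[i][j] is the matching cost between prediction i and ground-truth
    target j. Assumes at least as many predictions as targets; predictions
    left unmatched would be treated as "no object" in set prediction.
    Brute force over permutations: fine for a toy case, whereas the
    Hungarian algorithm solves the same problem in polynomial time.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    best_total, best_pairs = float("inf"), []
    # Try every ordered choice of one distinct prediction per target.
    for pred_ids in permutations(range(n_pred), n_gt):
        total = sum(cost[p][g] for g, p in enumerate(pred_ids))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return sorted(best_pairs), best_total


# Hypothetical costs: 3 predicted queries (rows), 2 annotated targets (columns).
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.7, 0.6],
]
pairs, total = match_predictions(cost)
print(pairs, round(total, 3))
```

In this toy case prediction 0 is paired with target 1 and prediction 1 with target 0, leaving prediction 2 unmatched. The losses named in the abstract would then be computed on the matched pairs only.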
|
An infrared small target detection algorithm for aerial drones based on multi-frame set prediction |
|
|
Abstract: Infrared small target detection is an important research topic in computer vision, and detecting aerial drones has grown in importance as autonomous drone technology advances. Many deep-learning-based detection algorithms have appeared in recent years, most of which identify targets by appearance features. These methods struggle with targets that have weak texture and no color information. To address this, this paper proposes an infrared small target detection algorithm for aerial drones based on multi-frame set prediction. The architecture has three main components. First, a ResNet-50 backbone extracts deep features frame by frame from the input infrared sequence. Second, a pixel decoder combines multi-scale deformable attention (over the C5/C4/C3 feature levels) with bicubic interpolation to recover high-resolution per-pixel embeddings and strengthen the representation of small-target detail. Third, a frame decoder, which parses single-frame features with learnable query vectors, and a target decoder, which uses a Vision Transformer to model spatiotemporal correlations across frames, together generate video-level query vectors that capture the spatiotemporal masks of target instances. During training, the Hungarian algorithm finds the optimal bipartite matching between predictions and ground-truth annotations, and the classification, mask, and similarity losses are jointly optimized on the matched pairs, which enables end-to-end training. During online inference, the video is divided into clips, the masks of high-confidence queries are fused into the final detection result, and frame differencing suppresses static background noise. Experiments on the DSAT dataset yield a precision of 0.6356 and an F-score of 0.6475, significantly outperforming existing methods and indicating that the proposed algorithm can accurately localize small infrared targets against complex backgrounds.
Keywords: object detection, signal processing, instance segmentation, small aerial objects, UAV detection
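
The frame-differencing step mentioned for inference can be sketched as follows. The abstract does not say how the difference image is combined with the predicted masks, so gating the detection mask with a logical AND is an assumption, as are the function names, the threshold, and the pixel values.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Mark pixels whose intensity changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def suppress_static(detection_mask, motion_mask):
    """Keep detections only where motion was observed (assumed AND gating)."""
    return [
        [d and m for d, m in zip(det_row, mot_row)]
        for det_row, mot_row in zip(detection_mask, motion_mask)
    ]


# Hypothetical 3x3 infrared frames: pixel (1, 1) is a static hot spot,
# pixel (0, 2) brightens between frames (a moving target).
prev_frame = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
curr_frame = [[10, 10, 90], [10, 200, 10], [10, 10, 10]]

# Hypothetical detector output that fires on both bright pixels.
detection = [
    [False, False, True],
    [False, True, False],
    [False, False, False],
]

motion = frame_difference_mask(prev_frame, curr_frame, threshold=30)
filtered = suppress_static(detection, motion)
print(filtered)
```

The static hot spot is removed because its intensity does not change between frames, while the moving pixel survives. A real pipeline would also need to compensate for camera motion, which this sketch ignores.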