Multimodal Remote Sensing Image Fusion Based on Self-supervised Pre-training and Cross-scale Contrastive Learning
Affiliation:

Department of Electronic Engineering, School of Information Science and Technology, Fudan University

Fund Project:

National Key Research and Development Program of China under Grant 2022YFB3903404 (Topic 4)

    Abstract:

    Self-supervised pre-training methods offer strong feature extraction and model transfer capabilities. However, current pre-training methods for multimodal remote sensing image (RSI) fusion merely apply simple operations such as concatenation to the extracted multimodal features, without dedicated modules for integrating multimodal information, so complementary information across modalities is fused insufficiently. Moreover, these methods neither consider nor exploit the cross-scale consistency prior within RSIs, which limits the extraction and integration of multimodal remote sensing information and leaves the performance of downstream tasks short of its potential. To address these issues, a multimodal RSI fusion method based on self-supervised pre-training and cross-scale contrastive learning is proposed, consisting of three main parts: 1) a cross-attention fusion mechanism preliminarily integrates the features extracted from the different modalities, and encoder modules then extract further features, explicitly aggregating the complementary information of each modality; 2) a cross-modal fusion mechanism allows each modality to draw useful supplementary information from the features of all modalities, after which separate decoders reconstruct each modality's input; 3) based on the cross-scale consistency constraint of RSIs, cross-scale contrastive learning is introduced to strengthen single-modality feature extraction and achieve more robust pre-training. Experimental results on multiple public multimodal RSI fusion datasets show that, compared with existing methods, the proposed algorithm delivers significant performance gains on a variety of downstream tasks: on the Globe230k dataset it achieves a mean intersection over union (mIoU) of 79.01%, an overall accuracy (OA) of 92.56%, and a mean F1 score (mF1) of 88.05%. The method also scales well, and its hyperparameters are easy to set.
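    To make the two fusion steps above concrete, the following is a minimal PyTorch sketch, assuming token-shaped features of size (batch, tokens, dim) for two modalities (e.g., optical and SAR): part 1) as cross-attention between the modalities followed by a shared encoder, and part 2) as separate decoders that reconstruct each modality's input from the jointly encoded features. All module names, dimensions, and wiring here are illustrative assumptions based only on the abstract, not the authors' actual implementation.

    import torch
    import torch.nn as nn

    class CrossAttentionFusion(nn.Module):
        """Fuse one modality's tokens with another modality's via cross-attention."""
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, query_tokens, context_tokens):
            # Queries come from one modality; keys/values from the other.
            fused, _ = self.attn(query_tokens, context_tokens, context_tokens)
            return self.norm(query_tokens + fused)  # residual connection + norm

    class TwoModalityPretrainer(nn.Module):
        """Hypothetical two-modality layout; encoders/decoders are stand-ins."""
        def __init__(self, dim=256):
            super().__init__()
            self.fuse_a = CrossAttentionFusion(dim)  # modality A queries B
            self.fuse_b = CrossAttentionFusion(dim)  # modality B queries A
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            self.shared_encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.dec_a = nn.Linear(dim, dim)  # per-modality reconstruction heads
            self.dec_b = nn.Linear(dim, dim)

        def forward(self, tok_a, tok_b):
            # Part 1: preliminary cross-attention fusion between the modalities.
            fa = self.fuse_a(tok_a, tok_b)
            fb = self.fuse_b(tok_b, tok_a)
            # Part 2: joint encoding over all modalities, then separate decoding
            # so that each branch reconstructs its own modality's input.
            joint = self.shared_encoder(torch.cat([fa, fb], dim=1))
            n_a = tok_a.shape[1]
            return self.dec_a(joint[:, :n_a]), self.dec_b(joint[:, n_a:])

    Under this sketch, a reconstruction loss (e.g., mean squared error between each decoder output and the corresponding input tokens) would drive the self-supervised pre-training.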
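    Part 3) can likewise be made concrete. A common way to impose a cross-scale consistency constraint is an InfoNCE-style contrastive loss between features of the same scene at two scales; the sketch below assumes pooled token features from a full-resolution view and a downsampled view of the same image, with the other images in the batch serving as negatives. The function name, pooling, and symmetric form are assumptions, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def cross_scale_infonce(feat_full, feat_down, temperature=0.07):
        """InfoNCE over (full-scale, downsampled) views of the same scenes:
        matching pairs are positives; all other batch pairs are negatives."""
        z_full = F.normalize(feat_full.mean(dim=1), dim=-1)  # (B, N, D) -> (B, D)
        z_down = F.normalize(feat_down.mean(dim=1), dim=-1)
        logits = z_full @ z_down.t() / temperature           # (B, B) similarities
        labels = torch.arange(z_full.shape[0], device=z_full.device)
        # Symmetric form: match each full-scale view to its downsampled view,
        # and vice versa.
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.t(), labels))

    Added to the reconstruction objective, such a term encourages scale-consistent single-modality features, which is the stated purpose of the cross-scale constraint.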

History
  • Received: 2024-11-03
  • Revised: 2024-12-06
  • Accepted: 2024-12-12