Lightweight Remote Sensing Multimodal Large Language Model Based on Knowledge Distillation
Affiliation:

Fudan University

Fund Project: National Key Research and Development Program of China

Abstract:

Remote sensing multimodal large language models (MLLMs), which integrate rich visual and linguistic information, have shown great potential in remote sensing image analysis and interpretation. However, existing knowledge distillation methods focus primarily on compressing unimodal large language models and neglect the alignment of features across modalities, which limits the distilled model's performance on cross-modal tasks. To address this issue, a knowledge distillation method for lightweighting remote sensing MLLMs is proposed. The method aligns the teacher's and student's outputs across modalities at the feature level, achieving effective transfer of multimodal information. Introducing the reverse Kullback-Leibler divergence as the distillation loss, combined with optimization strategies such as teacher mixed sampling and single-step decomposition, further improves the student model's generalization and stability. Experiments on four downstream remote sensing tasks (scene classification, visual question answering, visual grounding, and image captioning) show that the proposed method achieves higher accuracy and efficiency while significantly reducing model parameters and computational cost, offering a new solution for the efficient application of MLLMs in the remote sensing domain.
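
This page does not include implementation details, so the following is a minimal sketch, assuming a PyTorch setup, of the two training signals named in the abstract: a feature-level cross-modal alignment term and the reverse Kullback-Leibler divergence between student and teacher token distributions. All names, tensor shapes, the linear projection, and the loss weights are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

def feature_alignment_loss(student_feats, teacher_feats, proj):
    # Map the student's hidden states into the teacher's feature space
    # and penalize their distance, aligning visual-linguistic features.
    # proj: a learned nn.Linear(d_student, d_teacher) (an assumption).
    return F.mse_loss(proj(student_feats), teacher_feats)

def reverse_kl_loss(student_logits, teacher_logits):
    # Reverse KL, KL(student || teacher), over the vocabulary dimension.
    # Mode-seeking: the student is penalized for placing probability mass
    # where the teacher assigns little, which helps stabilize distillation
    # of generative language models.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # KL(p_s || p_t) = sum_v p_s(v) * (log p_s(v) - log p_t(v))
    return (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()

def distillation_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats, proj,
                      alpha=1.0, beta=0.5):
    # Combined objective; the weights alpha and beta are placeholders.
    kd = reverse_kl_loss(student_logits, teacher_logits)
    feat = feature_alignment_loss(student_feats, teacher_feats, proj)
    return alpha * kd + beta * feat

# Toy usage: batch=2, seq=8, vocab=32000, d_student=512, d_teacher=1024.
proj = nn.Linear(512, 1024)
loss = distillation_loss(torch.randn(2, 8, 32000), torch.randn(2, 8, 32000),
                         torch.randn(2, 8, 512), torch.randn(2, 8, 1024), proj)
loss.backward()

The teacher mixed sampling and single-step decomposition strategies mentioned in the abstract concern how training sequences are sampled and how credit is assigned per decoding step; they are omitted here because the abstract does not specify them.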

History
  • Received: November 14, 2024
  • Revised: December 19, 2024
  • Accepted: December 31, 2024