Collaborative land-cover classification method using CNN combined with Transformer for hyperspectral images and LiDAR data

Affiliation:

Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China

CLC number: TH761

Fund project: Supported by the Heilongjiang Provincial Key R&D Program (JD2023SJ19)

Abstract:

In the field of collaborative classification of hyperspectral images and LiDAR data, CNNs and Transformers have separately demonstrated strong capabilities for capturing local features and global dependencies, yet their collaborative mechanism has not been fully explored and the potential of cross-modal feature complementarity has not been effectively exploited. This paper therefore proposes a multimodal remote-sensing land-cover classification method that combines a CNN with a Transformer for hyperspectral images and LiDAR data. First, the model reduces the dimensionality of the hyperspectral image by principal component analysis to remove redundant spectral information; a CNN then captures local texture features layer by layer, while the Transformer self-attention mechanism builds a global spectral-spatial representation. Next, a bidirectional feature-interaction mechanism injects the global context produced by the Transformer into the CNN feature channels and feeds the local details extracted by the CNN back into the Transformer branch; a feature-coupling unit then aligns features across scales, strengthening the model's joint extraction of the global structure and local details of hyperspectral images. For the LiDAR data, a dynamic-convolution cascade module captures elevation information and contextual relationships. Finally, a cross-modal feature-fusion module enables deep interaction and fusion of the features of the two data sources, and the complementary dual-modal semantics improve the classification accuracy of complex land covers. Experiments on three public datasets (Houston 2013, Trento, and Augsburg) show that the proposed method achieves overall accuracies of 99.85%, 99.68%, and 97.34% and average accuracies of 99.87%, 99.34%, and 90.60%, respectively, outperforming mainstream methods such as GLT and HCT and demonstrating the advantages and effectiveness of the proposed approach for multimodal collaborative classification.
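The front end of the pipeline described above (PCA spectral reduction, followed by self-attention that mixes information across all pixels of a patch) can be sketched as below. This is a minimal illustration under assumed shapes, with random placeholder projection weights, not the authors' implementation; the CNN branch, the bidirectional interaction, the dynamic-convolution LiDAR module, and the fusion module are omitted.

```python
import numpy as np

def pca_reduce(hsi_cube, n_components=8):
    """Reduce the spectral dimension of an HSI cube (H, W, B) with PCA."""
    h, w, b = hsi_cube.shape
    x = hsi_cube.reshape(-1, b).astype(np.float64)  # (H*W, B) pixel spectra
    x -= x.mean(axis=0)                             # centre each band
    cov = np.cov(x, rowvar=False)                   # (B, B) spectral covariance
    _, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]        # leading principal axes
    return (x @ top).reshape(h, w, n_components)

def self_attention(tokens, seed=0):
    """Single-head scaled dot-product self-attention over (N, D) tokens.

    The Q/K/V projections here are random placeholders; in the described
    model they would be learned inside the Transformer branch.
    """
    n, d = tokens.shape
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(d)                   # (N, N) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over all pixels
    return attn @ v                                 # globally mixed features

# Toy example: a 16x16 patch with 64 bands -> 8 PCA components -> attention
cube = np.random.default_rng(1).random((16, 16, 64))
reduced = pca_reduce(cube)                          # shape (16, 16, 8)
mixed = self_attention(reduced.reshape(-1, 8))      # shape (256, 8)
print(reduced.shape, mixed.shape)
```

Treating every pixel of the reduced patch as a token lets each output feature attend to the whole patch, which is the "global spectral-spatial representation" role the abstract assigns to the Transformer branch.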

Cite this article:

WU Haibin, ZUO Yunyi, WANG Aili, LYU Haoran, WANG Minhui. Collaborative land-cover classification method using CNN combined with Transformer for hyperspectral images and LiDAR data[J]. Chinese Journal of Scientific Instrument, 2025, 46(8): 286-301.

History
  • Online publication date: 2025-11-07