Multi-modal fusion object detection based on gradient operator and attention
Affiliation:

School of Intelligent Science and Engineering, Harbin Engineering University, Harbin 150000, China

CLC number:

TH741; TP391.41

Fund Project:

Supported by the Dream Fund project of the Jianghuai Frontier Technology Collaborative Innovation Center (2023ZM01Z025)

    Abstract:

    Infrared and visible images are strongly complementary, and fusing the two modalities can meet the accuracy and robustness demands of object detection in fields such as autonomous driving. Existing multimodal object detection algorithms often rely on large models with long inference times and cannot be deployed on edge devices, while simple approaches such as direct fusion fail to exploit the strengths of each modality. We therefore propose a fusion object detection algorithm based on a gradient operator and an attention mechanism. A gradient operator is used to design a customized convolution that captures image texture; coordinate attention is introduced in the infrared branch to exploit its advantage in target localization; and a weight generation network adaptively weights and fuses the features of the two modalities. The algorithm is modular and lightweight, which makes it suitable for deployment on edge devices. In experiments on the dataset, mAP@0.50 and mAP@0.5:0.95 are 6.3% and 7.2% higher than single-modality detection on visible images, and 11.3% and 9.8% higher than single-modality detection on infrared images. The inference frame rate reaches 22.7, which meets the real-time requirement.
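As a rough illustration of two ideas named in the abstract (a gradient operator that responds to texture, and an adaptive weighted fusion of two modalities), the following dependency-free sketch may help. It is not the authors' implementation: the abstract does not specify the operator or the network, so the Sobel kernels, the softmax weighting, the mean-gradient scoring rule, and all function names here are assumptions chosen for illustration. In particular, `fusion_weights` is a hand-written stand-in for the learned weight generation network.

```python
import math

# Standard 3x3 Sobel kernels (assumed here; the paper's operator is not given).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate3x3(image, kernel):
    """Valid (no padding) 3x3 cross-correlation over a 2-D list of numbers."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


def gradient_magnitude(image):
    """Per-pixel Sobel gradient magnitude, a simple edge/texture response."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def mean(grid):
    """Mean of all values in a 2-D list."""
    return sum(sum(row) for row in grid) / (len(grid) * len(grid[0]))


def fusion_weights(score_vis, score_ir):
    """Softmax over two scalar scores (hypothetical stand-in for a learned
    weight generation network). The two weights sum to 1."""
    m = max(score_vis, score_ir)  # subtract the max for numerical stability
    e_vis = math.exp(score_vis - m)
    e_ir = math.exp(score_ir - m)
    total = e_vis + e_ir
    return e_vis / total, e_ir / total


def fuse(feat_vis, feat_ir, w_vis, w_ir):
    """Element-wise weighted sum of two same-shaped feature maps."""
    return [[w_vis * v + w_ir * t for v, t in zip(rv, rt)]
            for rv, rt in zip(feat_vis, feat_ir)]


# Toy 4x4 inputs: the visible image has a vertical edge, the infrared is flat.
visible = [[0, 0, 10, 10]] * 4
infrared = [[5, 5, 5, 5]] * 4

g_vis = gradient_magnitude(visible)
g_ir = gradient_magnitude(infrared)

# Assumed scoring rule: the modality with more texture gets the larger weight.
w_vis, w_ir = fusion_weights(mean(g_vis), mean(g_ir))
fused = fuse(g_vis, g_ir, w_vis, w_ir)
```

In this toy case the visible image carries all of the edge energy, so nearly all of the fusion weight goes to the visible branch. A learned weight network would instead predict the weights from the features themselves, typically per channel or per location rather than as a single scalar pair.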

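Cite this article

Li Xuezhao, Wang Wei, Xue Bing. Multi-modal fusion object detection based on gradient operator and attention[J]. Chinese Journal of Scientific Instrument (仪器仪表学报), 2024, 45(11): 224-232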

History
  • Published online: 2025-01-26