A 3D object detection algorithm combining YOLOv8 and multimodal feature fusion
Affiliation:

Guizhou University


CLC number:

TN958.98

Funding:

Science and Technology Foundation of Guizhou Province (Qiankehe Jichu [2016]1054)


    Abstract:

    To address the difficulty of aligning multimodal features caused by the different spatial dimensions of point clouds and images, we propose a multimodal 3D object detection network that incorporates YOLOv8. First, a YOLOv8-based data enhancement module maps the image into 3D space to generate a pseudo-cloud aligned with the point cloud, and a YOLOv8 model with frozen weights is used to enhance both the point cloud and the pseudo-cloud. Next, a dual-stream encoder extracts the multimodal features in parallel. Finally, an attention-based region of interest (RoI) feature fusion module and a gating-based RoI feature fusion module aggregate the multimodal RoI features. On the KITTI validation set, the proposed algorithm achieves 3D average precision of 79.28%, 58.70%, and 76.04% for cars, pedestrians, and cyclists at the hard difficulty level, improvements of 0.62%, 3.07%, and 7.54% over the original algorithm, which verifies the effectiveness of the proposed algorithm.
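    The abstract does not specify the internals of the gating-based RoI feature fusion module. As a minimal sketch only, assuming a common formulation in which a sigmoid gate computed from the concatenated features weights the two modalities element-wise, the idea could look like the following. The function name `gated_fusion` and the parameters `weights` and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(point_feat, pseudo_feat, weights, bias):
    """Fuse two RoI feature vectors of equal length C with a per-channel gate.

    Assumed formulation (not confirmed by the paper):
        gate  = sigmoid(W @ [point_feat; pseudo_feat] + b)
        fused = gate * point_feat + (1 - gate) * pseudo_feat

    weights: C rows of length 2*C (hypothetical learned gate weights)
    bias:    C floats (hypothetical learned gate bias)
    """
    if len(point_feat) != len(pseudo_feat):
        raise ValueError("both RoI feature vectors must have the same length")
    concat = list(point_feat) + list(pseudo_feat)
    fused = []
    for row, b, p, q in zip(weights, bias, point_feat, pseudo_feat):
        # One gate value per channel, computed from both modalities.
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        fused.append(gate * p + (1.0 - gate) * q)
    return fused


if __name__ == "__main__":
    point_roi = [1.0, 2.0]    # toy point-cloud RoI feature
    pseudo_roi = [3.0, 4.0]   # toy pseudo-cloud RoI feature
    zero_w = [[0.0] * 4 for _ in range(2)]
    # With zero weights and bias the gate is 0.5, so the result is the mean.
    print(gated_fusion(point_roi, pseudo_roi, zero_w, [0.0, 0.0]))
```

    In this sketch, a gate near 1 keeps the point-cloud feature for that channel and a gate near 0 keeps the pseudo-cloud feature, so the network can choose per channel which modality to trust.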

History
  • Received: 2024-12-19
  • Last revised: 2025-01-03
  • Accepted: 2025-01-07
Foreign Electronic Measurement Technology (《国外电子测量技术》)