An improved camera-LiDAR fusion perception algorithm in the bird's-eye view perspective

Authors: Xia Ruoyan, Xu Xiaosu

Affiliations:

1. Key Laboratory of Microinertial Instrument and Advanced Navigation Technology, Ministry of Education, Nanjing 210096, China; 2. School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China

CLC number: TH701

Abstract:

In autonomous driving perception tasks, fusing camera and LiDAR features in a bird's-eye view (BEV) has become a mainstream research paradigm, as it projects information from different modalities into a unified spatial representation. Although representative frameworks such as BEVFusion achieve high 3D object detection accuracy, they rely on depth prediction during the perspective transformation from 2D image features to BEV space. This depth module is complex and parameter-heavy, suffers from low inference efficiency and high memory consumption, and therefore places heavy demands on hardware, limiting deployment on edge devices and other resource-constrained platforms. To address these issues, this work builds on the BEVFusion framework, targets the accuracy and efficiency bottlenecks of the view transformation, and proposes a BEV visual feature optimization algorithm that fuses camera and LiDAR information. The algorithm replaces image depth prediction with depth measured by the LiDAR: by embedding this depth into the image feature representation, it structurally simplifies the original view-transformation path, and it further streamlines the BEV space construction and pooling modules, effectively reducing computational complexity. Experimental results show that, with 3D object detection accuracy unchanged, the optimized scheme cuts the inference time of the key modules to 16% of the original, speeds up end-to-end inference by 83%, and lowers peak GPU memory usage by 27%. It also markedly relaxes the constraint on input image resolution, improving the model's adaptability to available compute resources and its feasibility for practical deployment.
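
The core idea described in the abstract, replacing a learned per-pixel depth distribution with depth measured by the LiDAR during camera-to-BEV lifting, can be illustrated with a short sketch. This is a minimal illustration under assumed interfaces, not the paper's implementation: the function names (project_lidar_depth, lift_with_lidar_depth), the calibration matrices (lidar2img, img2lidar), and the grid parameters (bev_bounds, bev_res) are all hypothetical.

import torch

def project_lidar_depth(points, lidar2img, H, W):
    """Splat LiDAR points into a sparse per-pixel depth map of shape (H, W)."""
    # points: (N, 3) xyz in the LiDAR frame; lidar2img: (4, 4) projection matrix.
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)   # (N, 4)
    cam = pts_h @ lidar2img.T                                            # (N, 4)
    depth = cam[:, 2]
    u = torch.floor(cam[:, 0] / depth.clamp(min=1e-5)).long()
    v = torch.floor(cam[:, 1] / depth.clamp(min=1e-5)).long()
    keep = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth_map = points.new_zeros(H, W)
    depth_map[v[keep], u[keep]] = depth[keep]    # on collisions the last write wins
    return depth_map

def lift_with_lidar_depth(feat, depth_map, img2lidar, bev_bounds, bev_res):
    """Lift image features (C, H, W) to a BEV grid using measured depth only:
    each pixel with valid LiDAR depth is placed at a single 3D point and its
    feature is scatter-summed into the corresponding BEV cell."""
    C, H, W = feat.shape
    v, u = torch.nonzero(depth_map, as_tuple=True)    # pixels with valid depth
    d = depth_map[v, u]
    # Back-project pixel (u, v) at depth d: homogeneous image coords scaled by depth.
    img_pts = torch.stack([u * d, v * d, d, torch.ones_like(d)], dim=1)  # (M, 4)
    xyz = (img_pts @ img2lidar.T)[:, :3]              # (M, 3) in the LiDAR/ego frame
    (x_min, x_max), (y_min, y_max) = bev_bounds
    nx, ny = int((x_max - x_min) / bev_res), int((y_max - y_min) / bev_res)
    gx = torch.floor((xyz[:, 0] - x_min) / bev_res).long()
    gy = torch.floor((xyz[:, 1] - y_min) / bev_res).long()
    keep = (gx >= 0) & (gx < nx) & (gy >= 0) & (gy < ny)
    bev = feat.new_zeros(C, nx * ny)
    idx = (gx[keep] * ny + gy[keep]).unsqueeze(0).expand(C, -1)
    bev.scatter_add_(1, idx, feat[:, v[keep], u[keep]])   # sum features per cell
    return bev.view(C, nx, ny)

In LSS-style pipelines, the outer product of image features with a predicted depth distribution and the subsequent BEV pooling dominate the view-transformation cost; collapsing each pixel to the single LiDAR-measured depth removes both steps, which is consistent with the kind of module-level speedup the abstract reports.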

Cite this article:

Xia Ruoyan, Xu Xiaosu. An improved camera-LiDAR fusion perception algorithm in the bird's-eye view perspective[J]. 仪器仪表学报 (Chinese Journal of Scientific Instrument), 2025, 46(5): 170-182.

History:
  • Online publication date: 2025-08-12