Voxel-based fully sparse 3D object detector
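Affiliations:

1. School of Automation, Southeast University, Nanjing 210096, China; 2. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China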


CLC number:

TP391.4; TH865

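Funding:

Supported by the Jiangsu Provincial Frontier Leading Technology Basic Research Project (BK20192004C) and the Jiangsu Provincial Major Science and Technology Project (BG2024003)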


VoxelFSD: voxel-based fully sparse detector with sparse convolution for 3D object detection
Affiliation:

1.School of Automation, Southeast University, Nanjing 210096, China; 2.Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China


    Abstract:

    Existing voxel-based 3D object detectors rely heavily on dense 2D backbone networks, which limits their real-time performance on large-range LiDAR point clouds. This paper proposes VoxelFSD, a voxel-based fully sparse 3D object detector that improves real-time detection performance on such point clouds. The model has three key components. First, a parallel convolutional branch (PCB) module enlarges the receptive field, extracts object features more fully, and mitigates the effect of missing object-center features on the results. Second, a sparse region proposal network (SRPN) head predicts bounding boxes sparsely; for point clouds, this avoids the redundant computation of dense prediction and improves efficiency on large-range scenes. Third, an ROI head with an attention fusion module (AFM-ROI) uses cross-attention in the second stage to fuse the extracted 3D backbone features with compressed bird's-eye-view (BEV) features, further refining object features for better detection results. Removing the dense 2D backbone from an existing voxel-based detection framework and adding the PCB module and the SRPN head yields VoxelFSD-S, a fully sparse, single-stage, lightweight detector. VoxelFSD-S achieves a better balance between speed and accuracy than existing lightweight voxel-based models and meets real-time requirements in large-range point cloud scenes. Adding AFM-ROI to VoxelFSD-S yields the two-stage detector VoxelFSD-T, which trades some inference speed for a significant gain in accuracy. On the KITTI test set, VoxelFSD-S and VoxelFSD-T reach accuracies of 77.67% and 81.50%, respectively.
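
The efficiency argument behind sparse prediction can be illustrated with a small sketch. It is not the authors' SRPN implementation: the function names `occupied_bev_cells` and `sparse_predict`, the 0.4 m cell size, the detection ranges, and the synthetic uniform point cloud are all assumptions chosen for illustration, and a shared linear head stands in for a sparse convolutional head. The sketch counts how many BEV cells a dense 2D network would have to process versus how many cells actually contain points.

```python
import math
import random


def occupied_bev_cells(points, x_range, y_range, cell):
    """Map (x, y, z) points to the set of BEV grid cells holding at least one point.

    Returns (occupied_cells, total_cells); total_cells is the size of the dense
    grid that a dense 2D backbone would process regardless of occupancy.
    """
    nx = round((x_range[1] - x_range[0]) / cell)
    ny = round((y_range[1] - y_range[0]) / cell)
    occupied = set()
    for x, y, _z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # point lies outside the detection range
        # min() guards against floating-point round-up at the upper edge.
        ix = min(math.floor((x - x_range[0]) / cell), nx - 1)
        iy = min(math.floor((y - y_range[0]) / cell), ny - 1)
        occupied.add((ix, iy))
    return occupied, nx * ny


def sparse_predict(cell_features, weights, bias):
    """Score only occupied cells with a shared linear head.

    Hypothetical stand-in for a sparse prediction head: the cost is
    proportional to the number of occupied cells, not to the grid area.
    """
    return {
        cell: sum(w * f for w, f in zip(weights, feat)) + bias
        for cell, feat in cell_features.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: a fixed budget of 20000 points spread over a growing range.
    for half_range in (40.0, 80.0, 160.0):
        span = (-half_range, half_range)
        points = [
            (rng.uniform(*span), rng.uniform(*span), 0.0) for _ in range(20000)
        ]
        cells, total = occupied_bev_cells(points, span, span, 0.4)
        print(
            f"half-range {half_range:6.1f} m: dense cells = {total:7d}, "
            f"occupied cells = {len(cells):6d}"
        )
```

In this toy setting the dense cell count grows with the square of the range, while the occupied cell count can never exceed the number of points, which is the intuition behind dropping the dense 2D backbone for large-range scenes.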

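Cite this article

ZHOU Weidian, HONG Ru, GAI Shaoyan, DA Feipeng. Voxel-based fully sparse 3D object detector[J]. Chinese Journal of Scientific Instrument, 2025, 46(5): 242-250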

History
  • Published online: 2025-08-12