
Autonomous Driving | VAD: Vectorized Scene Representation for Efficient Autonomous Driving

  • This paper, published by Horizon Robotics, abandons rasterized scene representations in favor of vectorized ones and proposes an end-to-end model that achieves SOTA results on nuScenes
  • code:https://github.com/hustvl/VAD

1. Introduction

  • Comparison of the two ways of representing the scene (rasterized vs. vectorized)
  • Some end-to-end models learn directly from raw sensor data and output planning results without any explicit scene representation, which hurts interpretability;
  • Rasterized representations are simple, but they lose instance-level information and are computationally expensive;
  • VAD achieves SOTA performance, lowers the collision rate, and runs faster;
  • Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In ECCV, 2020

Perception

  • Models built on BEV representations:
    • LSS: Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In ECCV, 2020.
    • BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers. arXiv preprint arXiv:2203.17270, 2022
    • MapTR: Structured modeling and learning for online vectorized HD map construction. arXiv preprint arXiv:2208.14437, 2022

Motion Prediction

  • ViP3D: End-to-end visual trajectory prediction via 3D agent queries. arXiv preprint arXiv:2208.01582, 2022

  • FIERY: Future instance prediction in bird's-eye view from surround monocular cameras. In ICCV, 2021

  • Perceive, interact, predict: Learning dynamic and static clues for end-to-end motion prediction. arXiv preprint arXiv:2212.02181, 2022.

  • Unified perception and prediction in birds-eye-view for vision-centric autonomous driving. arXiv preprint arXiv:2205.09743, 2022.

Planning

  • Skip perception and motion prediction and directly predict the planned trajectory and control signals: direct and simple, but lacking interpretability and hard to optimize
    • Exploring the limitations of behavior cloning for autonomous driving. In ICCV, 2019
    • Multimodal fusion transformer for end-to-end autonomous driving. In CVPR, 2021
  • Reinforcement learning methods
    • Gri: General reinforced imitation and its application to vision-based autonomous driving. arXiv preprint arXiv:2111.08575, 2021.
    • Learning to drive from a world on rails. In ICCV, 2021
    • End-to-end model-free reinforcement learning for urban driving using implicit affordances. In CVPR, 2020.
  • Dense cost map methods
    • Mp3: A unified model to map, perceive, predict and plan. In CVPR, 2021.
    • Lookout: Diverse multi-future prediction and planning for self-driving. In ICCV, 2021
  • Plant: Explainable planning transformers via object-level representations. arXiv preprint arXiv:2210.14222, 2022.

3. Method

  • VAD obtains the vectorized representation of each scene element by querying BEV features

3.1. Vectorized Scene Learning

Vectorized Map

  • VAD uses map queries to extract map information from the BEV features and predicts map vectors; the output contains a fixed number of map vectors, each consisting of a fixed number of points (see the sketch after this list);
  • Only three types of map elements are annotated:
    • lane divider
    • road boundary
    • pedestrian crossing
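
As a reading aid, here is a minimal sketch (in PyTorch, not the authors' released code) of how map queries might decode vectorized map elements from BEV features; the query count `N_m = 100`, points per vector `N_p = 20`, the BEV size, and the class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class VectorizedMapHead(nn.Module):
    """Toy map decoder: map queries cross-attend to BEV features and are
    decoded into polylines (N_p points each) plus a class score over the
    three map element types (lane divider / road boundary / pedestrian crossing)."""

    def __init__(self, embed_dim=256, num_map_queries=100, num_points=20, num_classes=3):
        super().__init__()
        self.map_queries = nn.Embedding(num_map_queries, embed_dim)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.point_head = nn.Linear(embed_dim, num_points * 2)  # (x, y) per polyline point
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.num_points = num_points

    def forward(self, bev_features):
        # bev_features: (B, H*W, C) flattened BEV feature map
        B = bev_features.shape[0]
        q = self.map_queries.weight.unsqueeze(0).expand(B, -1, -1)
        q, _ = self.cross_attn(q, bev_features, bev_features)
        map_vectors = self.point_head(q).view(B, -1, self.num_points, 2)  # (B, N_m, N_p, 2)
        map_scores = self.cls_head(q)                                     # (B, N_m, 3)
        return map_vectors, map_scores

bev = torch.randn(1, 100 * 100, 256)        # a 100x100 BEV grid with 256-dim features
vectors, scores = VectorizedMapHead()(bev)
print(vectors.shape, scores.shape)          # (1, 100, 20, 2), (1, 100, 3)
```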

Vectorized Agent Motion

  • A group of agent queries extracts agent-level features from the BEV features and predicts each agent's attributes together with multi-modal future trajectories (sketched below)

  • Overall architecture of VAD
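
Following the same assumptions, a hypothetical sketch of the agent motion branch: each agent query is decoded into a current position plus K multi-modal future trajectories with mode scores (all shapes illustrative):

```python
import torch
import torch.nn as nn

class VectorizedMotionHead(nn.Module):
    """Toy agent branch: agent queries cross-attend to BEV features and are
    decoded into a detected (x, y) position plus K candidate future
    trajectories (T_f waypoints each) with per-mode scores."""

    def __init__(self, embed_dim=256, num_agent_queries=300, num_modes=6, future_steps=6):
        super().__init__()
        self.agent_queries = nn.Embedding(num_agent_queries, embed_dim)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.pos_head = nn.Linear(embed_dim, 2)
        self.traj_head = nn.Linear(embed_dim, num_modes * future_steps * 2)
        self.score_head = nn.Linear(embed_dim, num_modes)
        self.num_modes, self.future_steps = num_modes, future_steps

    def forward(self, bev_features):
        B = bev_features.shape[0]
        q = self.agent_queries.weight.unsqueeze(0).expand(B, -1, -1)
        q, _ = self.cross_attn(q, bev_features, bev_features)
        positions = self.pos_head(q)                                                 # (B, N_a, 2)
        trajs = self.traj_head(q).view(B, -1, self.num_modes, self.future_steps, 2)  # (B, N_a, K, T_f, 2)
        mode_scores = self.score_head(q).softmax(dim=-1)                             # (B, N_a, K)
        return positions, trajs, mode_scores, q   # updated agent queries are reused for planning

bev = torch.randn(1, 100 * 100, 256)
positions, trajs, mode_scores, agent_queries = VectorizedMotionHead()(bev)
print(trajs.shape)   # (1, 300, 6, 6, 2)
```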

3.2. Planning via Interaction

Ego-Agent Interaction

  • A randomly initialized ego query interacts with the agent queries in a transformer decoder (see the sketch after the list below)
  • Notation:
    • ego position
    • agent positions
    • a single layer MLP
    • query position embedding
    • key position embedding
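
A sketch of this ego-agent interaction using the notation above: the ego/agent positions go through a single-layer MLP to become query/key position embeddings, which are added before a standard cross-attention layer (the ego-map interaction in the next block follows the same pattern with map queries as keys). All names and sizes here are assumptions for illustration:

```python
import torch
import torch.nn as nn

embed_dim, num_agents = 256, 300

ego_query = torch.randn(1, 1, embed_dim)            # randomly initialized ego query
agent_queries = torch.randn(1, num_agents, embed_dim)

ego_pos = torch.zeros(1, 1, 2)                      # ego position (origin of the ego frame)
agent_pos = torch.randn(1, num_agents, 2)           # agent positions in BEV

pos_mlp = nn.Linear(2, embed_dim)                   # single-layer MLP for position embeddings
q_pos_embed = pos_mlp(ego_pos)                      # query position embedding
k_pos_embed = pos_mlp(agent_pos)                    # key position embedding

# one transformer-decoder cross-attention layer: the ego query attends to
# the agent queries, with positional information injected additively
attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
updated_ego_query, _ = attn(
    query=ego_query + q_pos_embed,
    key=agent_queries + k_pos_embed,
    value=agent_queries,
)
print(updated_ego_query.shape)   # (1, 1, 256)
```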

Ego-Map Interaction

  • The updated ego query then interacts with the map queries in a transformer decoder, analogous to the ego-agent interaction

Planning Head

  • driving commands

    • turn left
    • turn right
    • go straight
  • planning trajectory

  • the current status of the ego vehicle (optional) as ego features
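
A hypothetical sketch of the planning head consistent with the bullets above: the updated ego query, an embedding of the driving command (turn left / turn right / go straight), and optionally ego status features are concatenated and regressed by an MLP into future waypoints; the layer sizes and the status features are assumptions:

```python
import torch
import torch.nn as nn

class PlanningHead(nn.Module):
    """Toy planning head: (ego query, driving command, optional ego status) -> T_f waypoints."""

    COMMANDS = {"turn_left": 0, "turn_right": 1, "go_straight": 2}

    def __init__(self, embed_dim=256, status_dim=4, future_steps=6):
        super().__init__()
        self.cmd_embed = nn.Embedding(len(self.COMMANDS), embed_dim)
        self.status_proj = nn.Linear(status_dim, embed_dim)   # e.g. speed, yaw rate, ... (assumed)
        self.mlp = nn.Sequential(
            nn.Linear(3 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, future_steps * 2),
        )
        self.future_steps = future_steps

    def forward(self, ego_query, command, ego_status):
        cmd = self.cmd_embed(command)                           # (B, embed_dim)
        status = self.status_proj(ego_status)                   # (B, embed_dim)
        feat = torch.cat([ego_query, cmd, status], dim=-1)
        return self.mlp(feat).view(-1, self.future_steps, 2)    # planned (x, y) waypoints

head = PlanningHead()
plan = head(
    ego_query=torch.randn(1, 256),
    command=torch.tensor([PlanningHead.COMMANDS["go_straight"]]),
    ego_status=torch.randn(1, 4),
)
print(plan.shape)   # (1, 6, 2)
```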

  • Illustration of Vectorized Planning Constraints

Ego-Agent Collision Constraint

  • Specifically, we first filter out low-confidence agent predictions by a threshold

  • The trajectory with the highest probability among the multi-modal predictions is used as the agent's final predicted trajectory

For each future timestep, and for both the lateral and the longitudinal direction, the distance from the planned ego waypoint to the nearest agent is compared against a distance threshold, and the collision loss penalizes distances that fall below the threshold.
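
A minimal sketch of such a collision term (not the paper's exact formula): keep only high-confidence agents, take each agent's highest-probability mode, and for every future timestep penalize lateral/longitudinal distances to the nearest agent that fall below a threshold. The thresholds and the ego-frame axis decomposition are illustrative:

```python
import torch

def collision_loss(ego_plan, agent_trajs, agent_scores, conf_thresh=0.5, d_thresh=(0.5, 3.0)):
    """Toy ego-agent collision loss.
    ego_plan:     (T, 2) planned ego waypoints, x = lateral / y = longitudinal in ego frame
    agent_trajs:  (N, K, T, 2) multi-modal agent trajectory predictions
    agent_scores: (N, K) mode probabilities (their max also serves as agent confidence)
    """
    mode_probs, best_mode = agent_scores.max(dim=1)        # highest-probability mode per agent
    keep = mode_probs > conf_thresh                         # filter out low-confidence agents
    if keep.sum() == 0:
        return ego_plan.new_zeros(())
    idx = keep.nonzero(as_tuple=True)[0]
    trajs = agent_trajs[idx, best_mode[idx]]                # (N_kept, T, 2)

    loss = ego_plan.new_zeros(())
    for axis, thr in enumerate(d_thresh):                   # 0: lateral, 1: longitudinal
        dist = (trajs[..., axis] - ego_plan[None, :, axis]).abs()    # (N_kept, T)
        nearest = dist.min(dim=0).values                    # distance to the nearest agent per step
        loss = loss + torch.clamp(thr - nearest, min=0).mean()       # penalize only when too close
    return loss

ego_plan = torch.randn(6, 2)
agent_trajs = torch.randn(10, 6, 6, 2)
agent_scores = torch.rand(10, 6).softmax(dim=-1)
print(collision_loss(ego_plan, agent_trajs, agent_scores, conf_thresh=0.1))
```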

Ego-Boundary Overstepping Constraint

  • This loss keeps the ego vehicle within the drivable area

The loss compares the distance from the t-th planned waypoint to the nearest road boundary against a map boundary threshold and penalizes waypoints that come too close.
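
A sketch of this boundary term under the same assumptions: the distance from each planned waypoint to the nearest point sampled from the predicted boundary vectors is compared against a threshold, and getting too close is penalized:

```python
import torch

def boundary_loss(ego_plan, boundary_points, bd_thresh=1.0):
    """Toy ego-boundary overstepping loss.
    ego_plan:        (T, 2) planned ego waypoints
    boundary_points: (M, 2) points sampled from the predicted road-boundary vectors
    bd_thresh:       map boundary threshold in metres (illustrative value)
    """
    dist = torch.cdist(ego_plan, boundary_points)   # pairwise waypoint-to-boundary distances: (T, M)
    nearest = dist.min(dim=1).values                # distance of the t-th waypoint to the closest boundary
    return torch.clamp(bd_thresh - nearest, min=0).mean()

ego_plan = torch.randn(6, 2)
boundary_points = torch.randn(200, 2) * 5
print(boundary_loss(ego_plan, boundary_points))
```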

Ego-Lane Directional Constraint

  • First, low-confidence map predictions are filtered out by a confidence threshold

  • Then we find the closest lane divider vector

  • Finally, the loss for this constraint is the angular difference between the matched lane vector and the planned ego motion vector, averaged over time
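
A sketch of this directional term: filter low-confidence map predictions, match each planned motion step to the closest lane-divider segment, and average the angle between the matched lane vector and the planned ego vector. The atan2-based angle and the midpoint matching are one possible implementation, not necessarily the paper's:

```python
import torch

def direction_loss(ego_plan, lane_vectors, lane_scores, conf_thresh=0.5):
    """Toy ego-lane directional loss.
    ego_plan:     (T, 2) planned ego waypoints
    lane_vectors: (L, 2, 2) lane-divider segments given as (start, end) points
    lane_scores:  (L,) lane-divider confidences
    """
    lanes = lane_vectors[lane_scores > conf_thresh]       # filter low-confidence map predictions
    if lanes.numel() == 0:
        return ego_plan.new_zeros(())

    ego_vecs = ego_plan[1:] - ego_plan[:-1]               # planned ego motion vector per step: (T-1, 2)
    lane_dirs = lanes[:, 1] - lanes[:, 0]                  # lane direction vectors: (L', 2)
    lane_mid = lanes.mean(dim=1)                           # segment midpoints, used for matching

    closest = torch.cdist(ego_plan[:-1], lane_mid).argmin(dim=1)   # closest lane segment per step
    matched = lane_dirs[closest]                                    # (T-1, 2)

    def angle(v):
        return torch.atan2(v[:, 1], v[:, 0])

    diff = angle(matched) - angle(ego_vecs)
    diff = torch.atan2(torch.sin(diff), torch.cos(diff)).abs()      # wrap angle difference to [0, pi]
    return diff.mean()                                               # averaged over time

ego_plan = torch.cumsum(torch.rand(6, 2), dim=0)
lane_vectors = torch.randn(20, 2, 2)
lane_scores = torch.rand(20)
print(direction_loss(ego_plan, lane_vectors, lane_scores))
```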

3.4. End-to-End Learning

Vectorized Scene Learning Loss

  • The map point regression loss is the Manhattan distance between the predicted map points and the ground-truth map points

  • A focal loss is used for map element classification; the total map loss combines the regression and classification terms

  • For motion prediction, an l1 loss is used for agent attribute regression and a focal loss for agent classification. For the predicted multi-modal trajectories, the mode with the minimum final displacement error (minFDE) with respect to the ground truth is taken as the representative prediction; an l1 loss then measures the regression error between this trajectory and the GT trajectory, and another focal loss supervises multi-modal motion classification. Together these terms form the motion prediction loss
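
A small sketch of the minFDE mode selection plus l1 regression described above (the focal-loss classification terms are omitted):

```python
import torch
import torch.nn.functional as F

def motion_regression_loss(pred_trajs, gt_traj):
    """pred_trajs: (K, T, 2) multi-modal predictions for one agent; gt_traj: (T, 2)."""
    fde = (pred_trajs[:, -1] - gt_traj[-1]).norm(dim=-1)   # final displacement error of each mode
    best = fde.argmin()                                     # the minFDE mode is the representative one
    return F.l1_loss(pred_trajs[best], gt_traj)             # l1 regression against the GT trajectory

pred_trajs = torch.randn(6, 6, 2)
gt_traj = torch.randn(6, 2)
print(motion_regression_loss(pred_trajs, gt_traj))
```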

Vectorized Constraint Loss

This loss combines the three constraints introduced in Section 3.2: ego-agent collision (distance to other vehicles), ego-boundary overstepping (staying inside road boundaries), and ego-lane direction (driving direction).

Imitation Learning Loss

The imitation loss is computed by comparing the planned trajectory with the ground-truth ego trajectory. The overall loss for end-to-end training combines the vectorized scene learning losses (map and motion), the vectorized constraint loss, and the imitation loss.
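
Putting the pieces together, a sketch of how the overall training objective could be assembled from the terms above; the imitation term is shown here as an l1 comparison, and the loss weights are placeholders rather than the paper's values:

```python
import torch
import torch.nn.functional as F

plan, gt_plan = torch.randn(6, 2), torch.randn(6, 2)
loss_imi = F.l1_loss(plan, gt_plan)                  # imitation: planned vs. GT ego trajectory

# stand-ins for the terms computed by the sketches above
loss_map, loss_motion = torch.rand(()), torch.rand(())                         # vectorized scene learning
loss_col, loss_bd, loss_dir = torch.rand(()), torch.rand(()), torch.rand(())   # vectorized constraints

weights = dict(map=1.0, motion=1.0, col=1.0, bd=1.0, dir=1.0, imi=1.0)          # placeholder weights
total_loss = (weights["map"] * loss_map + weights["motion"] * loss_motion
              + weights["col"] * loss_col + weights["bd"] * loss_bd
              + weights["dir"] * loss_dir + weights["imi"] * loss_imi)
print(total_loss)
```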

4. Experiments

nuScenes contains 1000 scenes, each about 20 s long, with 1.4M annotated 3D bounding boxes over 23 object classes; the model takes the past 2 s as input and predicts/plans the future 3 s.

  • Open-loop planning performance
  • Ablation studies
  • Closed-loop evaluation and runtime
  • Qualitative results of VAD

Summary

  • An interesting end-to-end framework that is simple and direct; it claims better results than UniAD. For implementation details, refer to the code.