Autonomous Driving | Query-Centric Trajectory Prediction

  • QCNet's scene encoder adopts a query-centric paradigm that allows representation features computed in the past to be reused; sharing scene features that are invariant across all target agents further allows multi-agent trajectories to be decoded in parallel.
  • Second, even given rich scene encodings, existing decoding strategies struggle to capture the inherent multimodality in agents' future behavior, especially when the prediction horizon is long.
  • To tackle this challenge, we first employ anchor-free queries to generate trajectory proposals in a recurrent fashion, which allows the model to utilize different scene contexts when decoding waypoints at different horizons.
  • Our approach ranks 1st on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming all methods on all main metrics by a large margin. Meanwhile, our model can achieve streaming scene encoding and parallel multi-agent decoding thanks to the query-centric design ethos.
  • code: https://github.com/ZikangZhou/QCNet

1. Introduction

  • Problem 1: scene encoding and decoding for prediction is inefficient; factorized attention-based Transformers improve both accuracy and efficiency:

    • Wayformer: Motion Forecasting via Simple & Efficient Attention Networks. arXiv preprint arXiv:2207.05844, 2022.
    • Scene Transformer: A Unified Architecture for Predicting Multiple Agent Trajectories. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
    • HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  • Problem 2: prediction uncertainty is large, and long-horizon prediction is difficult.

  • A large number of scene elements are repeatedly encoded at adjacent frames, so a query-centric formulation is used to eliminate this redundancy and improve processing efficiency.

    Figure: Illustration of the query-centric reference frame
  • Every element in the scene has its own local spacetime reference frame, independent of the global coordinates; this allows element encodings to be reused when processing different agents, and even enables decoding in parallel (see the sketch after this list).

  • An iterative (recurrent) scheme is proposed for decoding predictions.

  • Combines the flexibility of anchor-free and anchor-based methods: anchor-free queries first generate anchors (trajectory proposals), and an anchor-based module then refines the trajectories.
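
To make the reuse concrete, here is a minimal sketch (my own PyTorch reconstruction, not the authors' code; function and variable names are assumptions) of a query-centric relative descriptor: only quantities relative to each element's local frame enter it, so it is invariant to any global translation and rotation of the scene.

```python
import torch

def relative_spacetime_descriptor(pos, heading, t):
    """Pairwise relative descriptor between scene elements.

    pos:     [N, 2] element positions (global coordinates)
    heading: [N]    element orientations in radians
    t:       [N]    element timestamps

    Returns [N, N, 4]: (distance, direction of j in i's local frame,
    relative heading, time gap) -- invariant to any global rotation
    and translation of the scene.
    """
    rel = pos[None, :, :] - pos[:, None, :]          # [N, N, 2], element j relative to i
    dist = rel.norm(dim=-1)                          # Euclidean distance
    bearing = torch.atan2(rel[..., 1], rel[..., 0])  # global bearing from i to j
    direction = bearing - heading[:, None]           # bearing expressed in i's frame
    rel_heading = heading[None, :] - heading[:, None]
    time_gap = t[None, :] - t[:, None]
    return torch.stack([dist, direction, rel_heading, time_gap], dim=-1)
```

Because the descriptor depends only on pairs of local frames, per-element embeddings keyed by these frames can be cached: at the next observation step only the descriptors involving newly arrived elements need recomputation, which is what enables streaming encoding.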

2. Related Work

  • Scene context fusion

  • Multimodal future distribution

3. Approach

3.1. Input and Output Formulation

3.2. Query-Centric Scene Context Encoding

Figure: Overview of the encoder in an online mode
Figure: Overview of the decoding pipeline
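
The two captions above are what survived of the overview figures. As a rough stand-in, below is a minimal PyTorch sketch of the factorized attention pattern the encoder is built around (my own reconstruction; the class name and layer stack are assumptions, and the paper's encoder additionally injects the relative spacetime embeddings from this section, plus map self-attention, residual connections, and normalization, all omitted here). The key point: attention is applied along one axis at a time instead of jointly over all agent-time tokens.

```python
import torch
import torch.nn as nn

class FactorizedSceneEncoder(nn.Module):
    """Sketch of factorized attention over agent tokens x: [A, T, D]."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.map_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.social_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, map_tokens):
        # x: [A, T, D] agent tokens; map_tokens: [M, D] map tokens.
        A, T, D = x.shape
        # 1) Temporal attention: each agent attends over its own history.
        x, _ = self.temporal_attn(x, x, x)                 # [A, T, D]
        # 2) Agent-map cross-attention: agent states attend to map tokens.
        m = map_tokens.unsqueeze(0).expand(A, -1, -1)      # [A, M, D]
        x, _ = self.map_attn(x, m, m)                      # [A, T, D]
        # 3) Social attention: at each timestep, agents attend to each other.
        x = x.transpose(0, 1)                              # [T, A, D]
        x, _ = self.social_attn(x, x, x)
        return x.transpose(0, 1)                           # [A, T, D]
```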

3.3. Query-Based Trajectory Decoding

Mode2Scene and Mode2Mode Attention
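
A minimal sketch of the two attention types named above, under my own naming (`ModeAttentionBlock`). In the paper, the mode queries also carry their own reference frames (next subsection), and relative embeddings between those frames and the scene tokens enter the attention; that part is omitted here.

```python
import torch
import torch.nn as nn

class ModeAttentionBlock(nn.Module):
    """Sketch of one decoder block: Mode2Scene then Mode2Mode attention.

    queries: [A, K, D] -- K mode queries per agent
    scene:   [A, S, D] -- per-agent scene tokens (history / map / neighbors)
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.mode2scene = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mode2mode = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, scene):
        # Mode2Scene: every mode query gathers context from the scene tokens.
        q, _ = self.mode2scene(queries, scene, scene)   # [A, K, D]
        # Mode2Mode: the K modes of one agent attend to each other,
        # letting the model spread them over distinct futures.
        q, _ = self.mode2mode(q, q, q)                  # [A, K, D]
        return q
```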

Reference Frames of Mode Queries

Anchor-Free Trajectory Proposal
  • Follows the idea of DETR: End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

    code: https://github.com/facebookresearch/detr
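
A rough sketch of the recurrent anchor-free proposal stage, in the spirit of DETR's learned queries; it reuses the `ModeAttentionBlock` from the sketch above, and the chunked linear output head and all hyperparameter names are my assumptions.

```python
import torch
import torch.nn as nn

class RecurrentProposalDecoder(nn.Module):
    """Sketch of anchor-free recurrent proposal decoding: learned mode
    queries are unrolled for several steps, each step re-attending to
    the scene before emitting the next chunk of waypoints."""

    def __init__(self, dim: int, num_modes: int, steps: int, waypoints_per_step: int):
        super().__init__()
        self.mode_queries = nn.Parameter(torch.randn(num_modes, dim))
        self.block = ModeAttentionBlock(dim)              # from the sketch above
        self.to_waypoints = nn.Linear(dim, waypoints_per_step * 2)
        self.steps = steps
        self.wps = waypoints_per_step

    def forward(self, scene):                             # scene: [A, S, D]
        A = scene.size(0)
        q = self.mode_queries.unsqueeze(0).expand(A, -1, -1)   # [A, K, D]
        chunks = []
        for _ in range(self.steps):                       # recurrent unrolling
            q = self.block(q, scene)                      # re-read the scene each step
            chunks.append(self.to_waypoints(q))           # [A, K, wps * 2]
        out = torch.cat(chunks, dim=-1)                   # [A, K, steps * wps * 2]
        return out.view(A, q.size(1), self.steps * self.wps, 2)
```

The recurrence is what lets waypoints at different horizons use different scene context: the queries re-attend to the scene before every chunk, instead of committing to the whole trajectory in one shot.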

Anchor-Based Trajectory Refinement
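
And a sketch of the refinement stage under the same assumptions: the anchor-free proposals act as anchors, a second decoder embeds them into anchor-conditioned queries, and only offsets around the anchors are predicted. Detaching the anchors from the gradient is a common two-stage design choice, not something these notes confirm for QCNet.

```python
import torch
import torch.nn as nn

class TrajectoryRefiner(nn.Module):
    """Sketch of anchor-based refinement around anchor-free proposals."""

    def __init__(self, dim: int, horizon: int):
        super().__init__()
        self.embed_anchor = nn.Linear(horizon * 2, dim)   # encode each proposal
        self.block = ModeAttentionBlock(dim)              # from the sketch above
        self.to_offsets = nn.Linear(dim, horizon * 2)

    def forward(self, proposals, scene):
        # proposals: [A, K, T', 2]; scene: [A, S, D]
        A, K, T, _ = proposals.shape
        q = self.embed_anchor(proposals.view(A, K, T * 2))  # anchor-conditioned queries
        q = self.block(q, scene)
        offsets = self.to_offsets(q).view(A, K, T, 2)
        # Predict small offsets around (gradient-stopped) anchors
        # rather than trajectories from scratch.
        return proposals.detach() + offsets
```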

3.4. Training Objectives

  • Following HiVT's design, the future trajectory of agent $i$ is parameterized as a mixture of Laplace distributions: $\sum_{k=1}^{K}\pi_{i,k}\prod_{t=1}^{T'}\mathrm{Laplace}\big(p_i^t \mid \mu_{i,k}^t,\, b_{i,k}^t\big)$, where $\pi_{i,k}$ denotes the mixing coefficients, and $\mu_{i,k}^t$ and $b_{i,k}^t$ denote the location and scale parameters of the Laplace distribution.

  • Loss function, combining three terms (see the sketch after this list):

    • Classification loss on the Laplace mixture, a negative log-likelihood loss;
    • Trajectory refinement loss;
    • Trajectory proposal loss;
  • Comparison of algorithms on the Argoverse 2 benchmark
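
A sketch of the loss ingredients listed above, assuming the mixture-of-Laplace parameterization from Section 3.4. The classification term is the mixture negative log-likelihood named in the notes; treating the proposal and refinement losses as winner-take-all regression on the best mode, selected by summed displacement, is my assumption.

```python
import torch

def laplace_mixture_nll(pi, loc, scale, target, eps=1e-6):
    """Classification loss: negative log-likelihood of the Laplace mixture.

    pi:     [A, K]        mixing coefficients (softmax-normalized)
    loc:    [A, K, T, 2]  location parameters mu
    scale:  [A, K, T, 2]  scale parameters b (positive)
    target: [A, T, 2]     ground-truth future positions
    """
    tgt = target.unsqueeze(1)                            # [A, 1, T, 2]
    # log Laplace(x | mu, b) = -log(2b) - |x - mu| / b, summed over T and xy
    log_prob = (-torch.log(2 * scale) - (tgt - loc).abs() / scale).sum(dim=(-1, -2))
    return -torch.logsumexp(torch.log(pi + eps) + log_prob, dim=-1).mean()

def winner_take_all_reg(loc, scale, target):
    """Regression loss for proposals/refinement: only the mode closest
    to the ground truth is optimized (winner-take-all)."""
    with torch.no_grad():
        err = (loc - target.unsqueeze(1)).norm(dim=-1).sum(dim=-1)  # [A, K]
        best = err.argmin(dim=-1)                                   # [A]
    idx = torch.arange(loc.size(0))
    mu, b = loc[idx, best], scale[idx, best]                        # [A, T, 2]
    nll = torch.log(2 * b) + (target - mu).abs() / b
    return nll.mean()
```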

Summary

The overall idea of QCNet is not complicated, but the paper is not written very clearly and many formula details are not given; one needs to go through the code to understand it further.