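- QCNet designs a query-centric paradigm for the scene encoder that allows previously computed representations to be reused; sharing the invariant scene features across all target agents further allows multi-agent trajectories to be decoded in parallel.
- Second, even given rich scene encodings, existing decoding strategies struggle to capture the multimodality inherent in agents' future behavior, especially when the prediction horizon is long.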
- To tackle this challenge, we first employ anchor-free queries to generate trajectory proposals in a recurrent fashion, which allows the model to utilize different scene contexts when decoding waypoints at different horizons.
- Our approach ranks 1st on Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming all methods on all main metrics by a large margin. Meanwhile, our model can achieve streaming scene encoding and parallel multi-agent decoding thanks to the query-centric design ethos.
- Code: https://github.com/ZikangZhou/QCNet
1. Introduction
Problem 1: Scene encoding and decoding for prediction are inefficient; factorized attention-based Transformers can improve both accuracy and efficiency.
- Wayformer: Motion forecasting via simple & efficient attention networks. arXiv preprint arXiv:2207.05844, 2022.
- Scene transformer: A unified architecture for predicting multiple agent trajectories. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
- HiVT: Hierarchical vector transformer for multi-agent motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
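Problem 2: Prediction uncertainty is large, which makes long-horizon forecasting difficult.
A large number of scene elements are re-encoded in adjacent frames, so a query-centric scheme is used to improve processing efficiency.
Every scene element is encoded in its own local reference frame, independent of any global coordinate system, so its encoding can be reused when processing different agents, and trajectories can even be decoded in parallel.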
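To make the reuse argument concrete, here is a minimal sketch of a relative descriptor between two scene elements, assuming each element carries a 2D position, a heading, and a time step. The function names (`wrap_angle`, `relative_descriptor`, `transform`) are hypothetical and are not taken from the QCNet repository. The point of the example is that the descriptor does not change when the whole scene is rotated and shifted, which is what makes cached encodings reusable.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def relative_descriptor(pos_i, heading_i, t_i, pos_j, heading_j, t_j):
    """Describe element j as seen from element i's local frame (assumed form)."""
    dx, dy = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    distance = math.hypot(dx, dy)
    # Direction of the displacement, measured relative to i's heading.
    bearing = wrap_angle(math.atan2(dy, dx) - heading_i)
    rel_heading = wrap_angle(heading_j - heading_i)
    return (distance, bearing, rel_heading, t_j - t_i)


def transform(pos, heading, shift, rot):
    """Apply a global rigid motion: rotate by `rot`, then translate by `shift`."""
    c, s = math.cos(rot), math.sin(rot)
    x, y = pos
    return (c * x - s * y + shift[0], s * x + c * y + shift[1]), heading + rot


# Two elements in some global frame.
pos_i, h_i, t_i = (0.0, 0.0), 0.0, 0
pos_j, h_j, t_j = (3.0, 4.0), math.pi / 2, 1
before = relative_descriptor(pos_i, h_i, t_i, pos_j, h_j, t_j)

# Move the whole scene; the relative descriptor should be unchanged.
shift, rot = (10.0, -5.0), 1.0
pos_i2, h_i2 = transform(pos_i, h_i, shift, rot)
pos_j2, h_j2 = transform(pos_j, h_j, shift, rot)
after = relative_descriptor(pos_i2, h_i2, t_i, pos_j2, h_j2, t_j)
```

Because only relative quantities enter the encoder in this sketch, the result for a pair of elements does not depend on which agent is currently being predicted.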
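Predictions are produced iteratively, in a recurrent fashion.
The decoder combines flexible anchor-free and anchor-based approaches: an anchor-free stage first generates trajectory proposals that serve as anchors, and an anchor-based stage then refines them.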
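The propose-then-refine flow can be sketched as plain control flow. This is only an illustration under strong assumptions: `toy_step` and `toy_offsets` are hypothetical stand-ins for the learned attention-based modules, and the chunk size and step count are arbitrary.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def propose_recurrent(num_steps: int, chunk: int,
                      step_fn: Callable[[Trajectory, int], Trajectory]) -> Trajectory:
    """Anchor-free stage: decode the trajectory a few waypoints at a time."""
    traj: Trajectory = []
    for start in range(0, num_steps, chunk):
        n = min(chunk, num_steps - start)
        # Each recurrent step sees the waypoints decoded so far.
        traj.extend(step_fn(traj, n)[:n])
    return traj


def refine(anchor: Trajectory,
           offset_fn: Callable[[Trajectory], Trajectory]) -> Trajectory:
    """Anchor-based stage: add predicted per-waypoint offsets to the proposal."""
    offsets = offset_fn(anchor)
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(anchor, offsets)]


def toy_step(prefix: Trajectory, n: int) -> Trajectory:
    """Hypothetical stand-in for a learned decoder: move 1 unit in x per step."""
    x0, y0 = prefix[-1] if prefix else (0.0, 0.0)
    return [(x0 + float(k + 1), y0) for k in range(n)]


def toy_offsets(anchor: Trajectory) -> Trajectory:
    """Hypothetical stand-in for a learned refiner: drift that grows with horizon."""
    return [(0.0, 0.1 * i) for i in range(len(anchor))]


proposal = propose_recurrent(num_steps=6, chunk=2, step_fn=toy_step)
refined = refine(proposal, toy_offsets)
```

In the real model one such proposal is produced per mode; the sketch shows a single mode to keep the control flow visible.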
2. Related Work
Scene context fusion
Multimodal future distribution
3. Approach
3.1. Input and Output Formulation
3.2. Query-Centric Scene Context Encoding
3.3. Query-Based Trajectory Decoding
Mode2Scene and Mode2Mode Attention
Reference Frames of Mode Queries
Anchor-Free Trajectory Proposal
This draws on the idea behind DETR: End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
Code: https://github.com/facebookresearch/detr
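The DETR-style idea is that a small set of queries gathers information from a set of encoded tokens through cross-attention. Below is a minimal sketch of scaled dot-product cross-attention in plain Python, assuming a handful of mode queries attending to scene tokens. It omits the learned projections, multiple heads, and positional encodings that a real decoder would have, and none of the names come from either repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query returns a weighted average of `values`, weighted by key similarity."""
    d = len(keys[0])
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(dim_out)])
    return outputs


# Three scene tokens (keys/values) and two mode queries, all made up.
scene_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scene_values = [[1.0], [2.0], [6.0]]
mode_queries = [[0.0, 0.0], [4.0, 0.0]]
attended = cross_attention(mode_queries, scene_keys, scene_values)
```

The all-zero query scores every token equally and returns the mean of the values, while the second query leans toward tokens whose keys point along the first axis.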
Anchor-Based Trajectory Refinement
3.4. Training Objectives
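Following HiVT's design, the future trajectory of the i-th agent is parameterized as a mixture of Laplace distributions, described by mixing coefficients together with a location parameter and a scale parameter for each Laplace component. The loss function has the following terms:
- Classification loss: the negative log-likelihood of the Laplace mixture.
- Trajectory refinement loss.
- Trajectory proposal loss.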
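Since the formulas are not reproduced in these notes, the following is a hedged sketch of how such losses are commonly computed for a Laplace mixture. The Laplace negative log-likelihood is standard; the winner-take-all selection by summed displacement error and all function names are assumptions for illustration, not the exact definitions from the paper or the repository.

```python
import math


def laplace_nll(x, loc, scale):
    """Negative log-likelihood of x under a Laplace(loc, scale) distribution."""
    return math.log(2.0 * scale) + abs(x - loc) / scale


def trajectory_nll(gt, locs, scales):
    """Sum the per-coordinate Laplace NLL over all waypoints of one mode."""
    return sum(
        laplace_nll(g, m, b)
        for p_gt, p_loc, p_scale in zip(gt, locs, scales)
        for g, m, b in zip(p_gt, p_loc, p_scale)
    )


def best_mode(gt, mode_locs):
    """Winner-take-all (assumed criterion): mode with the smallest summed displacement."""
    def err(locs):
        return sum(math.dist(g, p) for g, p in zip(gt, locs))
    return min(range(len(mode_locs)), key=lambda k: err(mode_locs[k]))


def log_softmax(logits):
    """Log of the softmax, computed stably."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def mixture_nll(gt, mode_locs, mode_scales, logits):
    """Classification loss: NLL of the ground truth under the Laplace mixture."""
    log_pi = log_softmax(logits)
    terms = [lp - trajectory_nll(gt, l, s)
             for lp, l, s in zip(log_pi, mode_locs, mode_scales)]
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


# Toy data: two modes, two waypoints, unit scales everywhere.
gt = [(1.0, 0.0), (2.0, 0.0)]
mode_locs = [[(1.0, 0.5), (2.0, 1.0)], [(1.0, 0.0), (2.0, 0.1)]]
mode_scales = [[(1.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
best = best_mode(gt, mode_locs)
regression_loss = trajectory_nll(gt, mode_locs[best], mode_scales[best])
classification_loss = mixture_nll(gt, mode_locs, mode_scales, [0.0, 0.0])
```

In this sketch the regression term is evaluated only on the best mode, and the same `trajectory_nll` could be applied once to the proposal and once to the refined trajectory; how the terms are weighted against each other is left out because the notes do not state it.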
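Summary
QCNet's overall idea is not complicated, but the paper's exposition is not very clear and many formula details are omitted, so it needs to be read alongside the code to be understood fully.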