Autonomous Driving | Query-Centric Trajectory Prediction

  • QCNet's scene encoder adopts a query-centric paradigm that allows representation features computed in the past to be reused; sharing scene features that are invariant across all target agents further allows multi-agent trajectories to be decoded in parallel.
  • Second, even given rich scene encodings, existing decoding strategies struggle to capture the inherent multimodality in agents' future behavior, especially when the prediction horizon is long.
  • To tackle this challenge, we first employ anchor-free queries to generate trajectory proposals in a recurrent fashion, which allows the model to utilize different scene contexts when decoding waypoints at different horizons.
  • Our approach ranks 1st on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks, outperforming all methods on all main metrics by a large margin. Meanwhile, our model can achieve streaming scene encoding and parallel multi-agent decoding thanks to the query-centric design ethos.
  • code: https://github.com/ZikangZhou/QCNet

1. Introduction

  • Problem 1: scene encoding and decoding for prediction is inefficient; factorized attention-based Transformers improve both accuracy and efficiency:

    • Wayformer: Motion Forecasting via Simple & Efficient Attention Networks. arXiv preprint arXiv:2207.05844, 2022.
    • Scene Transformer: A Unified Architecture for Predicting Multiple Agent Trajectories. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
    • HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  • Problem 2: prediction uncertainty is large, and long-horizon prediction is difficult.

  • A large number of scene elements are repeatedly encoded at adjacent frames, so a query-centric formulation is used to eliminate this redundancy and improve processing efficiency.

    Figure: Illustration of the query-centric reference frame
  • Every element in the scene has its own local spacetime reference frame, independent of the global coordinates; this allows element encodings to be reused when processing different agents, and even enables decoding in parallel (see the sketch after this list).

  • An iterative (recurrent) scheme is proposed for decoding predictions.

  • Combines the flexibility of anchor-free and anchor-based methods: anchor-free queries first generate anchors (trajectory proposals), and an anchor-based module then refines the trajectories.
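
To make the reuse concrete, here is a minimal sketch (my own PyTorch reconstruction, not the authors' code; function and variable names are assumptions) of a query-centric relative descriptor: only quantities relative to each element's local frame enter it, so it is invariant to any global translation and rotation of the scene.

```python
import torch

def relative_spacetime_descriptor(pos, heading, t):
    """Pairwise relative descriptor between scene elements.

    pos:     [N, 2] element positions (global coordinates)
    heading: [N]    element orientations in radians
    t:       [N]    element timestamps

    Returns [N, N, 4]: (distance, direction of j in i's local frame,
    relative heading, time gap) -- invariant to any global rotation
    and translation of the scene.
    """
    rel = pos[None, :, :] - pos[:, None, :]          # [N, N, 2], element j relative to i
    dist = rel.norm(dim=-1)                          # Euclidean distance
    bearing = torch.atan2(rel[..., 1], rel[..., 0])  # global bearing from i to j
    direction = bearing - heading[:, None]           # bearing expressed in i's frame
    rel_heading = heading[None, :] - heading[:, None]
    time_gap = t[None, :] - t[:, None]
    return torch.stack([dist, direction, rel_heading, time_gap], dim=-1)
```

Because the descriptor depends only on pairs of local frames, per-element embeddings keyed by these frames can be cached: at the next observation step only the descriptors involving newly arrived elements need recomputation, which is what enables streaming encoding.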

2. Related Work

  • Scene context fusion

  • Multimodal future distribution

3. Approach

3.1. Input and Output Formulation

3.2. Query-Centric Scene Context Encoding

Figure: Overview of the encoder in an online mode
Figure: Overview of the decoding pipeline
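
The two captions above are what survived of the overview figures. As a rough stand-in, below is a minimal PyTorch sketch of the factorized attention pattern the encoder is built around (my own reconstruction; the class name and layer stack are assumptions, and the paper's encoder additionally injects the relative spacetime embeddings from this section, plus map self-attention, residual connections, and normalization, all omitted here). The key point: attention is applied along one axis at a time instead of jointly over all agent-time tokens.

```python
import torch
import torch.nn as nn

class FactorizedSceneEncoder(nn.Module):
    """Sketch of factorized attention over agent tokens x: [A, T, D]."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.map_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.social_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, map_tokens):
        # x: [A, T, D] agent tokens; map_tokens: [M, D] map tokens.
        A, T, D = x.shape
        # 1) Temporal attention: each agent attends over its own history.
        x, _ = self.temporal_attn(x, x, x)                 # [A, T, D]
        # 2) Agent-map cross-attention: agent states attend to map tokens.
        m = map_tokens.unsqueeze(0).expand(A, -1, -1)      # [A, M, D]
        x, _ = self.map_attn(x, m, m)                      # [A, T, D]
        # 3) Social attention: at each timestep, agents attend to each other.
        x = x.transpose(0, 1)                              # [T, A, D]
        x, _ = self.social_attn(x, x, x)
        return x.transpose(0, 1)                           # [A, T, D]
```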

3.3. Query-Based Trajectory Decoding

Mode2Scene and Mode2Mode Attention
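
A minimal sketch of the two attention types named above, under my own naming (`ModeAttentionBlock`). In the paper, the mode queries also carry their own reference frames (next subsection), and relative embeddings between those frames and the scene tokens enter the attention; that part is omitted here.

```python
import torch
import torch.nn as nn

class ModeAttentionBlock(nn.Module):
    """Sketch of one decoder block: Mode2Scene then Mode2Mode attention.

    queries: [A, K, D] -- K mode queries per agent
    scene:   [A, S, D] -- per-agent scene tokens (history / map / neighbors)
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.mode2scene = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mode2mode = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, scene):
        # Mode2Scene: every mode query gathers context from the scene tokens.
        q, _ = self.mode2scene(queries, scene, scene)   # [A, K, D]
        # Mode2Mode: the K modes of one agent attend to each other,
        # letting the model spread them over distinct futures.
        q, _ = self.mode2mode(q, q, q)                  # [A, K, D]
        return q
```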

Reference Frames of Mode Queries

Anchor-Free Trajectory Proposal
  • Follows the idea of DETR: End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

    code: https://github.com/facebookresearch/detr
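
A rough sketch of the recurrent anchor-free proposal stage, in the spirit of DETR's learned queries; it reuses the `ModeAttentionBlock` from the sketch above, and the chunked linear output head and all hyperparameter names are my assumptions.

```python
import torch
import torch.nn as nn

class RecurrentProposalDecoder(nn.Module):
    """Sketch of anchor-free recurrent proposal decoding: learned mode
    queries are unrolled for several steps, each step re-attending to
    the scene before emitting the next chunk of waypoints."""

    def __init__(self, dim: int, num_modes: int, steps: int, waypoints_per_step: int):
        super().__init__()
        self.mode_queries = nn.Parameter(torch.randn(num_modes, dim))
        self.block = ModeAttentionBlock(dim)              # from the sketch above
        self.to_waypoints = nn.Linear(dim, waypoints_per_step * 2)
        self.steps = steps
        self.wps = waypoints_per_step

    def forward(self, scene):                             # scene: [A, S, D]
        A = scene.size(0)
        q = self.mode_queries.unsqueeze(0).expand(A, -1, -1)   # [A, K, D]
        chunks = []
        for _ in range(self.steps):                       # recurrent unrolling
            q = self.block(q, scene)                      # re-read the scene each step
            chunks.append(self.to_waypoints(q))           # [A, K, wps * 2]
        out = torch.cat(chunks, dim=-1)                   # [A, K, steps * wps * 2]
        return out.view(A, q.size(1), self.steps * self.wps, 2)
```

The recurrence is what lets waypoints at different horizons use different scene context: the queries re-attend to the scene before every chunk, instead of committing to the whole trajectory in one shot.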

Anchor-Based Trajectory Refinement
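
And a sketch of the refinement stage under the same assumptions: the anchor-free proposals act as anchors, a second decoder embeds them into anchor-conditioned queries, and only offsets around the anchors are predicted. Detaching the anchors from the gradient is a common two-stage design choice, not something these notes confirm for QCNet.

```python
import torch
import torch.nn as nn

class TrajectoryRefiner(nn.Module):
    """Sketch of anchor-based refinement around anchor-free proposals."""

    def __init__(self, dim: int, horizon: int):
        super().__init__()
        self.embed_anchor = nn.Linear(horizon * 2, dim)   # encode each proposal
        self.block = ModeAttentionBlock(dim)              # from the sketch above
        self.to_offsets = nn.Linear(dim, horizon * 2)

    def forward(self, proposals, scene):
        # proposals: [A, K, T', 2]; scene: [A, S, D]
        A, K, T, _ = proposals.shape
        q = self.embed_anchor(proposals.view(A, K, T * 2))  # anchor-conditioned queries
        q = self.block(q, scene)
        offsets = self.to_offsets(q).view(A, K, T, 2)
        # Predict small offsets around (gradient-stopped) anchors
        # rather than trajectories from scratch.
        return proposals.detach() + offsets
```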

3.4. Training Objectives

  • Following HiVT's design, the future trajectory of agent $i$ is parameterized as a mixture of Laplace distributions: $\sum_{k=1}^{K}\pi_{i,k}\prod_{t=1}^{T'}\mathrm{Laplace}\big(p_i^t \mid \mu_{i,k}^t,\, b_{i,k}^t\big)$, where $\pi_{i,k}$ denotes the mixing coefficients, and $\mu_{i,k}^t$ and $b_{i,k}^t$ denote the location and scale parameters of the Laplace distribution.

  • Loss function, combining three terms (see the sketch after this list):

    • Classification loss on the Laplace mixture, a negative log-likelihood loss;
    • Trajectory refinement loss;
    • Trajectory proposal loss;
  • Comparison of algorithms on the Argoverse 2 benchmark
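
A sketch of the loss ingredients listed above, assuming the mixture-of-Laplace parameterization from Section 3.4. The classification term is the mixture negative log-likelihood named in the notes; treating the proposal and refinement losses as winner-take-all regression on the best mode, selected by summed displacement, is my assumption.

```python
import torch

def laplace_mixture_nll(pi, loc, scale, target, eps=1e-6):
    """Classification loss: negative log-likelihood of the Laplace mixture.

    pi:     [A, K]        mixing coefficients (softmax-normalized)
    loc:    [A, K, T, 2]  location parameters mu
    scale:  [A, K, T, 2]  scale parameters b (positive)
    target: [A, T, 2]     ground-truth future positions
    """
    tgt = target.unsqueeze(1)                            # [A, 1, T, 2]
    # log Laplace(x | mu, b) = -log(2b) - |x - mu| / b, summed over T and xy
    log_prob = (-torch.log(2 * scale) - (tgt - loc).abs() / scale).sum(dim=(-1, -2))
    return -torch.logsumexp(torch.log(pi + eps) + log_prob, dim=-1).mean()

def winner_take_all_reg(loc, scale, target):
    """Regression loss for proposals/refinement: only the mode closest
    to the ground truth is optimized (winner-take-all)."""
    with torch.no_grad():
        err = (loc - target.unsqueeze(1)).norm(dim=-1).sum(dim=-1)  # [A, K]
        best = err.argmin(dim=-1)                                   # [A]
    idx = torch.arange(loc.size(0))
    mu, b = loc[idx, best], scale[idx, best]                        # [A, T, 2]
    nll = torch.log(2 * b) + (target - mu).abs() / b
    return nll.mean()
```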

Summary

The overall idea of QCNet is not complicated, but the paper is not written very clearly and many formula details are not given; one needs to go through the code to understand it further.