Autonomous Driving | M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction

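This paper decouples a joint prediction problem into several marginal prediction problems. It treats interacting vehicles as pairs, then uses a marginal trajectory predictor together with a conditional predictor to obtain the joint likelihood of their trajectories.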

1 Introduction

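Joint trajectory prediction can avoid predicting future trajectories that collide, but it requires the features of all vehicles to be processed in one shared module. Two existing approaches fall short:
- Predicting goals for different vehicles jointly: the number of goal combinations grows exponentially with the number of vehicles (a single vehicle usually has several hundred candidate points);
- Removing colliding trajectories in post-processing: only a stopgap.
M2I approximates the joint distribution as the product of a marginal distribution and a conditional distribution. The scheme assumes an influencer and a reactor: the influencer acts independently and does not consider the reactor, while the reactor takes the influencer's behavior into account.
A marginal predictor predicts the influencer's trajectory, and a conditional predictor predicts the reactor's.
The influencer-reactor relation between vehicles is pre-labeled with a heuristic.
M2I achieves state-of-the-art results on the Waymo Open Motion Dataset.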
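The heuristic pre-labeling step can be pictured with a small sketch. It assumes one plausible rule: find the closest approach between the two future trajectories, and call the agent that reaches that point first the influencer. The function name `label_relation`, the 10 m threshold, and the tie-breaking rule are illustrative assumptions, not values confirmed by these notes.

```python
import math

def label_relation(traj_a, traj_b, dist_threshold=10.0):
    """Label the relation between two agents from their future trajectories.

    traj_a, traj_b: lists of (x, y) positions, one per future time step.
    dist_threshold: assumed distance (metres) beyond which agents do not interact.
    Returns "none", "a_influences_b", or "b_influences_a".
    """
    best = None  # (distance, step_a, step_b) at the closest approach
    for step_a, (xa, ya) in enumerate(traj_a):
        for step_b, (xb, yb) in enumerate(traj_b):
            d = math.hypot(xa - xb, ya - yb)
            if best is None or d < best[0]:
                best = (d, step_a, step_b)
    d, step_a, step_b = best
    if d > dist_threshold:
        return "none"
    # The agent that reaches the closest-approach point first is the influencer.
    # Ties go to agent a, which is an arbitrary choice for this sketch.
    return "a_influences_b" if step_a <= step_b else "b_influences_a"

# Agent a drives along the x axis; agent b arrives at the crossing point later.
traj_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
traj_b = [(1, -3), (1, -2), (1, -1), (1, 0)]
relation = label_relation(traj_a, traj_b)
```

Because the label is computed from recorded futures, a rule of this kind can only be used to build training labels; at inference time the relation has to be predicted.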

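2 Related work

To handle multimodal trajectory prediction, one option is a Gaussian mixture model (GMM), in which each mixture component represents one behavior mode.
Instead of parameterizing the predicted distribution, generative models (GANs, VAEs) draw trajectory samples to approximate the distribution. These models are sample-inefficient, however, and need many samples to cover diverse driving scenarios.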
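To make the GMM option concrete, the following minimal sketch evaluates the log-likelihood of a 2D position under a mixture of isotropic Gaussians, where each component stands for one behavior mode. The isotropic covariance and the function name `gmm_log_likelihood` are simplifying assumptions for illustration; real predictors usually output richer covariances.

```python
import math

def gmm_log_likelihood(point, weights, means, sigmas):
    """Log-likelihood of a 2D point under a mixture of isotropic Gaussians.

    weights: mixture weights that sum to 1 (one per behavior mode)
    means:   list of (x, y) mode centres
    sigmas:  list of per-mode standard deviations
    """
    x, y = point
    log_terms = []
    for w, (mx, my), s in zip(weights, means, sigmas):
        sq_dist = (x - mx) ** 2 + (y - my) ** 2
        # log of the 2D isotropic Gaussian density
        log_pdf = -sq_dist / (2 * s * s) - math.log(2 * math.pi * s * s)
        log_terms.append(math.log(w) + log_pdf)
    # log-sum-exp for numerical stability
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Two modes, e.g. "go straight" near (0, 0) and "turn" near (10, 0).
weights, means, sigmas = [0.5, 0.5], [(0, 0), (10, 0)], [1.0, 1.0]
ll_on_mode = gmm_log_likelihood((0, 0), weights, means, sigmas)
ll_between = gmm_log_likelihood((5, 0), weights, means, sigmas)
```

A point on one of the modes scores higher than a point halfway between them, which is the property that lets a mixture represent distinct behaviors instead of averaging them.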
Some models predict high-level intentions, for example:
- goal targets,
  - TPNet: Trajectory proposal network for motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6797–6806, 2020.
  - GOHOME: Graph-oriented heatmap output for future motion estimation. arXiv preprint arXiv:2109.01827, 2021
  - DenseTNT: End-to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15303–15312, 2021
- lane selection,
  - LaPred: Lane-aware prediction of multimodal future trajectories of dynamic agents
  - Learning to predict vehicle trajectories with model-based planning
- maneuver actions
  - A flexible and explainable vehicle motion prediction and inference framework combining semi-supervised AOG and ST-LSTM. IEEE Transactions on Intelligent Transportation Systems, 2020.
  - Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 1179–1184. IEEE, 2018.
  - HYPER: Learned hybrid trajectory prediction via factored inference and adaptive sampling. In International Conference on Robotics and Automation (ICRA), 2022.
  - Trajectory prediction with linguistic representations. In International Conference on Robotics and Automation (ICRA), 2022.

2.1 Interactive Trajectory Prediction

Hand-crafted interaction models cannot capture highly complex, nonlinear interactions:
- social forces
  - Social force model for pedestrian dynamics. Physical Review E, 51(5):4282, 1995.
- energy functions
  - Who are you with and where are you going? In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1345–1352, 2011.
Learning-based models can achieve better accuracy:
- Fei-Fei Li and colleagues designed social pooling mechanisms to capture the influence of nearby pedestrians in crowded scenes
  - Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016.
  - Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2255–2264, 2018.
- Some papers use GNNs to predict agent-to-agent interactions
  - SpAGNN: Spatially-aware graph neural networks for relational behavior forecasting from sensor data. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 9491–9497. IEEE, 2020
  - Implicit latent variable model for scene-consistent motion forecasting. In Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2020
  - Social-STGCNN: A social spatiotemporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14424–14432, 2020
- Some papers use attention and transformer mechanisms to learn multi-agent interaction behavior
  - Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Advances in Neural Information Processing Systems, 32, 2019
  - End-to-end contextual perception and prediction with interaction transformer. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5784–5791. IEEE, 2020.
  - Scene Transformer: A unified architecture for predicting multiple agent trajectories. In International Conference on Learning Representations (ICLR), 2022.

2.2 Conditional Trajectory Prediction

That is, predict the ego vehicle's trajectory under the assumption that the other vehicle's trajectory is known.

3 Approach

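Problem formulation:

The observation is $X = (M, S)$, whose two components are the map information $M$ and the agent states $S$.
The goal is to predict the agents' future trajectories $Y$ over the next $T$ time steps.
The joint distribution is approximated by a marginal prediction and a conditional prediction: $P(Y \mid X) = P(Y_I, Y_R \mid X) \approx P(Y_I \mid X)\, P(Y_R \mid X, Y_I)$, where $Y_I$ is the future trajectory of the influencer and $Y_R$ is that of the reactor.
If the two agents do not interact, the probability reduces to $P(Y \mid X) \approx P(Y_I \mid X)\, P(Y_R \mid X)$.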
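One way to read the two-agent factorization as a procedure: draw scored influencer candidates from the marginal predictor, draw reactor candidates conditioned on each of them, score every pair by the product of the two probabilities, and keep the best few. The sketch below does this with toy stand-ins; `joint_top_k`, `toy_marginal`, `toy_conditional`, and `k` are hypothetical names, and the strings stand in for real trajectories.

```python
import heapq

def joint_top_k(context, marginal_predictor, conditional_predictor, k=6):
    """Combine marginal and conditional samples into scored joint predictions.

    marginal_predictor(context) -> list of (influencer_traj, prob)
    conditional_predictor(context, influencer_traj) -> list of (reactor_traj, prob)
    Both callables are hypothetical stand-ins for trained models.
    """
    candidates = []
    for inf_traj, p_inf in marginal_predictor(context):
        for rea_traj, p_rea in conditional_predictor(context, inf_traj):
            # P(Y_I, Y_R | X) is approximated by P(Y_I | X) * P(Y_R | X, Y_I)
            candidates.append((p_inf * p_rea, inf_traj, rea_traj))
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

def toy_marginal(context):
    return [("go", 0.7), ("stop", 0.3)]

def toy_conditional(context, inf_traj):
    # The reactor mostly yields when the influencer goes, and proceeds otherwise.
    if inf_traj == "go":
        return [("yield", 0.9), ("proceed", 0.1)]
    return [("yield", 0.2), ("proceed", 0.8)]

top = joint_top_k(None, toy_marginal, toy_conditional, k=2)
```

In the toy case the two surviving pairs are the mutually consistent ones (influencer goes and reactor yields, or influencer stops and reactor proceeds), which is the kind of scene-level consistency that independent marginals cannot express.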
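Algorithm overview (figure):
If more than two agents interact, the probability follows from the chain rule: $P(Y \mid X) \approx \prod_{i=1}^{N} P(Y_i \mid X, Y_i^{\mathrm{inf}})$, where $N$ is the number of interacting agents and $Y_i^{\mathrm{inf}}$ denotes the futures of the set of influencer agents of agent $i$.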
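For more than two agents, the chain rule implies an evaluation order: each agent is scored only after all of its influencers. The sketch below assumes the influencer relations form a directed acyclic graph and uses the standard-library `graphlib` (Python 3.9+) to obtain that order; `joint_log_prob` and `toy_cond_prob` are hypothetical names, and the probabilities are made up.

```python
import math
from graphlib import TopologicalSorter

def joint_log_prob(influencers, cond_prob, futures):
    """Sum the conditional log-probabilities over agents, influencers first.

    influencers: dict agent -> set of agents that influence it (assumed acyclic)
    cond_prob(agent, future, influencer_futures) -> probability (hypothetical model)
    futures: dict agent -> chosen future for that agent
    """
    total = 0.0
    # static_order() yields every agent after all of its influencers.
    for agent in TopologicalSorter(influencers).static_order():
        given = {j: futures[j] for j in influencers.get(agent, ())}
        total += math.log(cond_prob(agent, futures[agent], given))
    return total

order = []

def toy_cond_prob(agent, future, influencer_futures):
    order.append(agent)  # record the evaluation order for inspection
    return {"a": 0.5, "b": 0.8, "c": 0.25}[agent]

influencers = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
futures = {"a": "go", "b": "yield", "c": "yield"}
log_p = joint_log_prob(influencers, toy_cond_prob, futures)
```

If the predicted relations contained a cycle, `TopologicalSorter` would raise `CycleError`, so a real system would need some rule for breaking cycles; that rule is outside the scope of this sketch.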
M2I uses several encoder-decoder structures, as shown in the figure:

M2I includes three models that share the same context encoder. The relation predictor includes a relation prediction head to predict a distribution over relation types. The marginal predictor adopts a trajectory prediction head to produce multi-modal prediction samples. The conditional trajectory predictor takes a scene context input augmented with the influencer's future trajectory.

Results

Joint metrics on the interactive validation and test sets. The best-performing metrics are in bold, and the grey cells indicate the ranking metric used by the WOMD benchmark. M2I outperforms both the Waymo baselines and the challenge winners. Compared to the current state-of-the-art model SceneTransformer, it improves the mAP metric by a large margin over vehicles and all agents, demonstrating its advantage in learning a more accurate probability distribution and producing fewer false positive predictions.
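The difference between joint and marginal metrics is easy to miss, so here is a simplified sketch of joint minADE and minFDE. Each predicted mode contains one trajectory per agent, the error is averaged over agents within a mode, and only then is the minimum taken over modes. This is an illustrative simplification with a hypothetical function name, not the official WOMD implementation, and it does not cover miss rate or mAP.

```python
import math

def joint_min_ade_fde(predictions, ground_truth):
    """Joint minADE and minFDE over scene-level modes.

    predictions:  list of modes; each mode is a list of per-agent trajectories
    ground_truth: list of per-agent trajectories
    A trajectory is a list of (x, y) positions with matching lengths.
    """
    best_ade = best_fde = math.inf
    n_agents = len(ground_truth)
    for mode in predictions:
        ade_sum = fde_sum = 0.0
        for pred, gt in zip(mode, ground_truth):
            errors = [math.dist(p, g) for p, g in zip(pred, gt)]
            ade_sum += sum(errors) / len(errors)  # average displacement error
            fde_sum += errors[-1]                 # final displacement error
        # Errors are averaged over agents within the same mode.
        best_ade = min(best_ade, ade_sum / n_agents)
        best_fde = min(best_fde, fde_sum / n_agents)
    return best_ade, best_fde

ground_truth = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
mode_a = [[(0, 0), (1, 0)], [(0, 3), (1, 3)]]          # agent 1 exact, agent 2 off by 2 m
mode_b = [[(0, 0.5), (1, 0.5)], [(0, 1.5), (1, 1.5)]]  # both agents off by 0.5 m
min_ade, min_fde = joint_min_ade_fde([mode_a, mode_b], ground_truth)
```

Mode A matches the first agent perfectly but still loses to mode B, because a joint metric does not allow picking the best trajectory for each agent from different modes.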

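Summary

M2I's overall idea is fairly novel and direct: first predict which vehicles interact, then predict the influencer's trajectory, then predict the reactor's trajectory conditioned on it. However, each reactor prediction conditions on a single predicted influencer trajectory, which discards the influencer's multimodal information. Overall performance is below Scene Transformer, with mAP the only metric on which M2I holds its own, so the model is still fairly rough.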
