M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
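This paper attempts to decouple a joint prediction problem into several marginal prediction problems. Vehicles whose trajectories interact are treated as a pair, and a marginal trajectory predictor combined with a conditional predictor yields the joint likelihood of their trajectories.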
1 Introduction
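Joint trajectory prediction avoids predicted futures in which vehicles collide, so the features of the vehicles need to be processed in a shared module. Existing options each have a drawback:
- Predicting goals for different vehicles jointly makes the number of goal combinations grow exponentially with the number of vehicles (a single vehicle usually has several hundred candidate points);
- Removing colliding trajectories in post-processing is only a stopgap;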
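The exponential growth mentioned in the first bullet can be made concrete with a small calculation. This is only an illustrative sketch: the figure of 300 candidate goals per vehicle is an assumption standing in for "several hundred", and `joint_goal_candidates` is a hypothetical helper, not something from the paper.

```python
def joint_goal_candidates(goals_per_agent: int, num_agents: int) -> int:
    """Number of joint goal combinations when every agent has the same
    number of candidate goals and goals are chosen jointly."""
    return goals_per_agent ** num_agents


if __name__ == "__main__":
    GOALS_PER_AGENT = 300  # assumed value for "several hundred" candidates
    for n in (1, 2, 3, 4):
        print(f"{n} agent(s): {joint_goal_candidates(GOALS_PER_AGENT, n):,} combinations")
```

Even with two agents the joint goal space is already tens of thousands of combinations, which is the motivation for factoring the problem instead of enumerating it.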
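M2I approximates the joint distribution by the product of a marginal distribution and a conditional distribution. The scheme assumes there is an influencer and a reactor: the influencer acts independently and does not consider the reactor, while the reactor takes the influencer's behavior into account.
A marginal predictor predicts the influencer's trajectory, and a conditional predictor predicts the reactor's trajectory.
The influencer-reactor relation between vehicles is pre-labeled based on a heuristic.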
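These notes do not spell out the labeling heuristic. One plausible version, sketched below purely as an assumption, looks at the point of closest approach between the two ground-truth future trajectories: if the agents never come close, no relation is labeled, otherwise the agent that reaches the closest point earlier is taken to be the influencer. The function name `label_relation`, the 2.0 m threshold, and the tie handling are all hypothetical.

```python
import numpy as np


def label_relation(traj_a, traj_b, dist_threshold=2.0):
    """Label the influencer of a pair from ground-truth future trajectories.

    traj_a, traj_b: arrays of shape (T, 2) with xy positions sampled at the
    same time steps. Returns "a" or "b" (the influencer), or None when the
    agents never get close or the order is ambiguous.
    NOTE: the rule and the threshold are assumptions for illustration.
    """
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    # Distance between every point of a and every point of b: shape (T_a, T_b).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    t_a, t_b = np.unravel_index(np.argmin(dists), dists.shape)
    if dists[t_a, t_b] > dist_threshold:
        return None  # paths never come close: treat as non-interacting
    if t_a < t_b:
        return "a"  # a passes the conflict point first
    if t_b < t_a:
        return "b"
    return None  # both arrive at the same step: ambiguous


if __name__ == "__main__":
    steps = np.arange(10)
    # a drives along the x-axis and is at (5, 0) at step 5.
    a = np.stack([steps, np.zeros(10)], axis=1)
    # b drives along x = 5 and is at (5, 0) at step 8.
    b = np.stack([np.full(10, 5.0), steps - 8.0], axis=1)
    print(label_relation(a, b))
```

In the example, `a` crosses the shared point three steps before `b`, so `a` is labeled as the influencer.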
The method achieves state-of-the-art results on the Waymo Open Motion Dataset.
2 Related work
- To handle multimodal trajectory prediction, GMMs can be used, with each mixture component representing one behavior mode;
- Instead of parameterizing the predicted distribution, some generative models (GANs, VAEs) draw trajectory samples to approximate the distribution, but these models are sample-inefficient and need many samples to cover diverse driving scenarios;
- Some models predict high-level intentions, for example:
  - goal targets,
    - TPNet: Trajectory proposal network for motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6797–6806, 2020.
    - GOHOME: Graph-oriented heatmap output for future motion estimation. arXiv preprint arXiv:2109.01827, 2021.
    - DenseTNT: End-to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15303–15312, 2021.
  - lane selection,
    - LaPred: Lane-aware prediction of multimodal future trajectories of dynamic agents
    - Learning to predict vehicle trajectories with model-based planning
  - maneuver actions
    - A flexible and explainable vehicle motion prediction and inference framework combining semi-supervised AOG and ST-LSTM. IEEE Transactions on Intelligent Transportation Systems, 2020.
    - Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 1179–1184. IEEE, 2018.
    - HYPER: Learned hybrid trajectory prediction via factored inference and adaptive sampling. In International Conference on Robotics and Automation (ICRA), 2022.
    - Trajectory prediction with linguistic representations. In International Conference on Robotics and Automation (ICRA), 2022.
2.1 Interactive Trajectory Prediction
Hand-crafted interaction models cannot capture highly complex, nonlinear interactions:
- social forces
  - Social force model for pedestrian dynamics. Physical Review E, 51(5):4282, 1995.
- energy functions
  - Who are you with and where are you going? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1345–1352, 2011.
Some learning-based models achieve better accuracy.
Fei-Fei Li et al. designed social pooling mechanisms to capture the influence of nearby pedestrians in crowded scenes:
- Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016.
- Social GAN: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2255–2264, 2018.
Some papers use GNNs to model agent-to-agent interactions:
- SpAGNN: Spatially-aware graph neural networks for relational behavior forecasting from sensor data. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 9491–9497. IEEE, 2020
- Implicit latent variable model for scene-consistent motion forecasting. In Proceedings of the European Conference on Computer Vision (ECCV). Springer, 2020
- Social-STGCNN: A social spatiotemporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14424–14432, 2020
Some papers use attention and transformer mechanisms to learn multi-agent interaction behavior:
- Social-BiGAT: Multimodal trajectory forecasting using Bicycle-GAN and graph attention networks. Advances in Neural Information Processing Systems, 32, 2019.
- End-to-end contextual perception and prediction with interaction transformer. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5784–5791. IEEE, 2020.
- Scene Transformer: A unified architecture for predicting multiple agent trajectories. In International Conference on Learning Representations (ICLR), 2022.
2.2 Conditional Trajectory Prediction
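That is, predicting the ego vehicle's trajectory under the assumption that another vehicle's future trajectory is known.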
3 Approach
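Problem definition:
Given the observation $X = (M, S)$, whose two components are the map information $M$ and the observed agent states $S$, the goal is to predict the agents' trajectories $Y$ over the next $T$ time steps.
Using a marginal predictor and a conditional predictor, the joint distribution of $Y$ is approximated as:
$$P(Y \mid X) = P(Y_I, Y_R \mid X) \approx P(Y_I \mid X)\, P(Y_R \mid X, Y_I)$$
where $Y_I$ is the future trajectory of the influencer and $Y_R$ is the future trajectory of the reactor. If the two agents do not interact, the probability is:
$$P(Y \mid X) \approx P(Y_I \mid X)\, P(Y_R \mid X)$$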
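The factorization into a marginal term for the influencer and a conditional term for the reactor can be turned into joint trajectory pairs roughly as follows. This is a minimal sketch under stated assumptions: the probabilities are taken as already produced by two predictors, the function `top_k_joint` is hypothetical, and keeping the top `k=6` pairs is an assumed choice rather than something stated in these notes.

```python
import numpy as np


def top_k_joint(p_influencer, p_reactor_given_influencer, k=6):
    """Rank (influencer sample, reactor sample) pairs by joint probability.

    p_influencer: shape (N,), marginal probability of each influencer sample.
    p_reactor_given_influencer: shape (N, M); row i holds the conditional
        probabilities of the M reactor samples given influencer sample i.
    Returns up to k tuples (i, j, joint_prob), highest probability first.
    """
    p_i = np.asarray(p_influencer, dtype=float)
    p_r = np.asarray(p_reactor_given_influencer, dtype=float)
    # P(Y_I, Y_R | X) ~ P(Y_I | X) * P(Y_R | X, Y_I), for every pair (i, j).
    joint = p_i[:, None] * p_r
    best = np.argsort(joint, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(best, joint.shape)
    return [(int(i), int(j), float(joint[i, j])) for i, j in zip(rows, cols)]


if __name__ == "__main__":
    p_inf = [0.7, 0.3]
    p_rea = [[0.6, 0.4],   # reactor modes given influencer sample 0
             [0.9, 0.1]]   # reactor modes given influencer sample 1
    for i, j, p in top_k_joint(p_inf, p_rea, k=3):
        print(f"influencer {i}, reactor {j}: {p:.2f}")
```

Because each row of the conditional table sums to one, the joint table also sums to one, so the ranked pairs can be read directly as joint mode probabilities.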
[Figure: block diagram of the algorithm]
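If more than two agents interact, the probability is computed with the chain rule:
$$P(Y \mid X) \approx \prod_{i=1}^{N} P(Y_i \mid X, Y_i^{\mathrm{inf}})$$
where $N$ is the number of interacting agents and $Y_i^{\mathrm{inf}}$ denotes the trajectories of the set of influencer agents of agent $i$.
M2I uses several encoder-decoder structures, as shown in the figure.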
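For more than two agents, the chain-rule idea requires evaluating each agent only after all of its influencers. The sketch below assumes the influencer relations form an acyclic graph and uses Python's standard `graphlib.TopologicalSorter` to find a valid order. The `log_prob` callable is a hypothetical stand-in for a conditional predictor, and how M2I itself orders agents or handles cycles is not covered in these notes.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+


def joint_log_likelihood(influencers, log_prob):
    """Sum log P(Y_i | X, influencers of i) over all agents.

    influencers: dict mapping each agent to the set of agents influencing it
        (assumed acyclic; graphlib raises CycleError otherwise).
    log_prob: callable (agent, given) -> log-probability of the agent's
        trajectory given the trajectories of the agents in `given`.
    Returns (evaluation order, total log-likelihood).
    """
    # static_order() yields every agent after all of its influencers.
    order = list(TopologicalSorter(influencers).static_order())
    total = 0.0
    for agent in order:
        given = tuple(sorted(influencers.get(agent, ())))
        total += log_prob(agent, given)
    return order, total


if __name__ == "__main__":
    # A influences B; A and B both influence C.
    relations = {"B": {"A"}, "C": {"A", "B"}}

    def toy_log_prob(agent, given):
        # Placeholder for a learned predictor: a constant probability of 0.5.
        return math.log(0.5)

    order, ll = joint_log_likelihood(relations, toy_log_prob)
    print(order, math.exp(ll))
```

Working in log space keeps the product of many conditional terms numerically stable as the number of agents grows.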
- Results achieved by the algorithm
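Summary
The overall idea of M2I is fairly novel and direct: first predict which vehicles interact, then predict the influencer's trajectory, and finally predict the reactor's trajectory conditioned on the influencer's trajectory. However, when predicting the reactor's trajectory, only a single predicted influencer trajectory is considered, so the multimodal information is lost. Overall performance is worse than Scene Transformer, with only the mAP metric on par, so the model is still fairly rough.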