A survey of deep RL and IL for autonomous driving policy learning
The survey first introduces five classes of autonomous driving models that combine IL and RL. First, a taxonomy of the literature studies is constructed from the system perspective, in which five modes of integrating DRL/DIL models into an AD architecture are identified.
It then introduces the concrete RL and IL tasks and formulations in autonomous driving. Second, the formulations of DRL/DIL models for conducting specified AD tasks are comprehensively reviewed, covering the various designs of the model state and action spaces and of the reinforcement learning rewards.
Finally, it introduces how RL and IL address the safety issues that arise when an AD model interacts with other participants and the environment. Finally, an in-depth review is conducted on how the critical issues of AD applications regarding driving safety, interaction with other traffic participants and uncertainty of the environment are addressed by the DRL/DIL models.
task-driven and problem-driven perspectives
- [1] C. Urmson and W. Whittaker, “Self-driving cars and the urban challenge,” IEEE Intelligent Systems, vol. 23, no. 2, pp. 66–68, 2008.
- [2] S. Thrun, “Toward robotic cars,” Communications of the ACM, vol. 53, no. 4, pp. 99–106, 2010.
- [3] A. Eskandarian, Handbook of intelligent vehicles. Springer, 2012, vol. 2.
- [4] S. M. Grigorescu, B. Trasnea, T. T. Cocias, and G. Macesanu, “A survey of deep learning techniques for autonomous driving,” J. Field Robotics, vol. 37, no. 3, pp. 362–386, 2020.
Driving policies are built on multiple levels of abstraction, for example behavior planning, motion planning and control.
- [13, 15] survey the motion planning and control methods of automated vehicles before the era of DL.
- [29–33] review general DRL/DIL methods without considering any particular applications.
- [4] addresses the deep learning techniques for AD with a focus on perception and control, while [34] addresses control only.
- [35] provides a taxonomy of AD tasks to which DRL models have been applied and highlights the key challenges.
- G. Shani, J. Pineau, and R. Kaplow, “A survey of point-based pomdp solvers,” Autonomous Agents and Multi-Agent Systems, vol. 27, no. 1, pp. 1–51, 2013.
- W. S. Lovejoy, “A survey of algorithmic methods for partially observed markov decision processes,” Annals of Operations Research, vol. 28, no. 1, pp. 47–65, 1991.
Schematic of the taxonomy of reinforcement learning (RL) and imitation learning (IL) algorithms:
learning from demonstrations (LfD)
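- A demonstration dataset $\mathcal{D} = \{\xi_i\}$: a set of trajectories, where each trajectory $\xi_i$ is a sequence of state-action pairs $(s, a)$
- The expert policy $\pi_E$
- The imitation policy to be optimized, $\pi_\theta$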
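As a rough illustration of the LfD ingredients listed above, the demonstration dataset can be represented as a list of trajectories, each holding state-action pairs. The names `Trajectory`, `flatten`, and the two-dimensional toy states and actions are assumptions made for this sketch, not notation from the survey.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]   # e.g. (lateral offset, speed); hypothetical layout
Action = Tuple[float, ...]  # e.g. (steering, acceleration); hypothetical layout


@dataclass
class Trajectory:
    """One expert demonstration: an ordered sequence of (state, action) pairs."""
    pairs: List[Tuple[State, Action]]


def flatten(dataset: List[Trajectory]) -> List[Tuple[State, Action]]:
    """Pool all state-action pairs, dropping trajectory boundaries."""
    return [pair for traj in dataset for pair in traj.pairs]


# Toy demonstration dataset with two short trajectories (made-up numbers).
demo = [
    Trajectory(pairs=[((0.0, 10.0), (0.0, 0.1)), ((0.1, 10.1), (-0.05, 0.0))]),
    Trajectory(pairs=[((-0.2, 8.0), (0.1, 0.2))]),
]
samples = flatten(demo)
```

Flattening is what a supervised method such as behavior cloning consumes; methods that reason about whole trajectories would keep the `Trajectory` structure instead.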
Behavior Cloning (BC):
BC performs well on the training set but generalizes poorly because of covariate shift [66, 67].
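A minimal sketch of behavior cloning as supervised regression, assuming a one-dimensional state (lateral offset) and a linear policy fitted by least squares. The expert rule `a = -0.5 * s` and the function name `fit_linear_policy` are hypothetical and serve only to make the example concrete.

```python
from typing import List, Tuple


def fit_linear_policy(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of a = w * s + b to expert (state, action) pairs."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in pairs)
    w = cov_sa / var_s
    b = mean_a - w * mean_s
    return w, b


# Hypothetical expert: steer back toward the lane center, a = -0.5 * s.
# Demonstrations only cover offsets in [-0.5, 0.5].
expert_pairs = [(s / 10.0, -0.5 * s / 10.0) for s in range(-5, 6)]
w, b = fit_linear_policy(expert_pairs)
```

The fit only constrains the policy on states that appear in the demonstrations. Once the learned policy drifts to offsets outside that range, nothing in the training data says what it should do, which is the covariate shift problem noted above.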
DAgger: S. Ross, G. J. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in International Conference on Artificial Intelligence and Statistics, ser. JMLR Proceedings, vol. 15, 2011, pp. 627–635.
SafeDAgger: J. Zhang and K. Cho, “Query-efficient imitation learning for end-to-end autonomous driving,” arXiv preprint arXiv:1605.06450, 2016.
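The dataset-aggregation idea behind DAgger can be sketched as follows: roll out a mixture of the expert and the current policy, ask the expert to label the visited states, add them to the dataset, and refit. Everything concrete here is an assumption for illustration: the toy dynamics `step`, the saturated expert `expert_action`, the linear policy class, and the `0.5 ** (i + 1)` mixing schedule.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[float, float]  # (state, action)


def expert_action(s: float) -> float:
    # Hypothetical expert: proportional steering, saturated at +/-0.2.
    return max(-0.2, min(0.2, -0.5 * s))


def step(s: float, a: float) -> float:
    # Hypothetical toy dynamics: the action shifts the lateral offset.
    return s + a


def fit_linear_policy(pairs: List[Pair]) -> Tuple[float, float]:
    """Least-squares fit of a = w * s + b."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in pairs)
    w = cov_sa / var_s
    return w, mean_a - w * mean_s


def rollout(policy: Callable[[float], float], s0: float, horizon: int) -> List[float]:
    """Return the states visited when running `policy` from s0."""
    states, s = [], s0
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s))
    return states


def dagger(n_iters: int = 5, horizon: int = 10, seed: int = 0):
    rng = random.Random(seed)
    # Start from states visited by the expert itself (plain behavior cloning).
    dataset = [(s, expert_action(s)) for s in rollout(expert_action, 1.0, horizon)]
    w, b = fit_linear_policy(dataset)
    for i in range(n_iters):
        beta = 0.5 ** (i + 1)  # probability of executing the expert's action

        def mixed(s: float, w: float = w, b: float = b) -> float:
            return expert_action(s) if rng.random() < beta else w * s + b

        visited = rollout(mixed, rng.uniform(-1.0, 1.0), horizon)
        # The expert labels the states the learner actually visited.
        dataset += [(s, expert_action(s)) for s in visited]
        w, b = fit_linear_policy(dataset)  # retrain on the aggregated data
    return w, b, dataset


w, b, dataset = dagger()
```

The key difference from plain behavior cloning is that the training states come from the learner's own rollouts, while the labels still come from the expert.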
Inverse Reinforcement Learning (IRL):
Guided cost learning (GCL): C. Finn, S. Levine, and P. Abbeel, “Guided cost learning: Deep inverse optimal control via policy optimization,” in International Conference on Machine Learning, 2016, pp. 49–58.
(it handles unknown dynamics in high-dimensional complex systems and learns complex neural network cost functions through an efficient sample-based approximation.)
Generative Adversarial Imitation Learning (GAIL):
Generative adversarial imitation learning (GAIL) [81] directly learns a policy from expert demonstrations while requiring neither the reward design in RL nor the expensive RL process in the inner loop of IRL.
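The GAIL objective [81] is
$$\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s,a)\right)\right] - \lambda H(\pi),$$
where $H(\pi)$ is an entropy regularization term. Following [81], the discriminator $D_w$ and the generator (the policy $\pi_\theta$) are updated with the gradients
$$\hat{\mathbb{E}}_{\tau_i}\left[\nabla_w \log D_w(s,a)\right] + \hat{\mathbb{E}}_{\tau_E}\left[\nabla_w \log\left(1 - D_w(s,a)\right)\right],$$
$$\hat{\mathbb{E}}_{\tau_i}\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q(s,a)\right] - \lambda \nabla_\theta H(\pi_\theta),$$
where $Q(\bar{s},\bar{a}) = \hat{\mathbb{E}}_{\tau_i}\left[\log D_w(s,a) \mid s_0 = \bar{s}, a_0 = \bar{a}\right]$, and $\tau_i$ and $\tau_E$ denote trajectories sampled from the current policy and from the expert, respectively.
Fu et al. [84] proposed adversarial inverse reinforcement learning (AIRL) based on an adversarial reward learning formulation, which can recover reward functions that are robust to changes in dynamics.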
- J. Fu, K. Luo, and S. Levine, “Learning robust rewards with adversarial inverse reinforcement learning,” CoRR, vol. abs/1710.11248, 2017.
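To make the adversarial part of GAIL concrete: the discriminator is trained to increase $\mathbb{E}_{\pi}[\log D(s,a)] + \mathbb{E}_{\pi_E}[\log(1 - D(s,a))]$, so that it outputs values near 1 on policy samples and near 0 on expert samples. The sketch below performs one gradient-ascent step on that quantity for a logistic discriminator. The linear features `(s, a, 1)`, the toy data, and the step size of 1.0 are assumptions for illustration. The policy update (TRPO in the original GAIL paper) is omitted.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (state, action)


def discriminator(w: Sequence[float], s: float, a: float) -> float:
    """Logistic discriminator D_w(s, a) on the features (s, a, 1)."""
    z = w[0] * s + w[1] * a + w[2]
    return 1.0 / (1.0 + math.exp(-z))


def objective(w: Sequence[float], policy_pairs: List[Pair], expert_pairs: List[Pair]) -> float:
    """Sample estimate of E_pi[log D] + E_piE[log(1 - D)]."""
    term_pi = sum(math.log(discriminator(w, s, a)) for s, a in policy_pairs) / len(policy_pairs)
    term_e = sum(math.log(1.0 - discriminator(w, s, a)) for s, a in expert_pairs) / len(expert_pairs)
    return term_pi + term_e


def gradient(w: Sequence[float], policy_pairs: List[Pair], expert_pairs: List[Pair]) -> List[float]:
    """Gradient of `objective` with respect to w."""
    grad = [0.0, 0.0, 0.0]
    for s, a in policy_pairs:
        d = discriminator(w, s, a)
        for k, x in enumerate((s, a, 1.0)):
            grad[k] += (1.0 - d) * x / len(policy_pairs)  # d/dz log(sigmoid(z)) = 1 - sigmoid(z)
    for s, a in expert_pairs:
        d = discriminator(w, s, a)
        for k, x in enumerate((s, a, 1.0)):
            grad[k] -= d * x / len(expert_pairs)  # d/dz log(1 - sigmoid(z)) = -sigmoid(z)
    return grad


# Toy data: the expert steers toward the lane center, the policy steers constantly.
states = [-1.0, -0.5, 0.5, 1.0]
expert_pairs = [(s, -0.5 * s) for s in states]
policy_pairs = [(s, 0.3) for s in states]

w0 = [0.0, 0.0, 0.0]
obj0 = objective(w0, policy_pairs, expert_pairs)
g = gradient(w0, policy_pairs, expert_pairs)
w1 = [wi + 1.0 * gi for wi, gi in zip(w0, g)]  # one gradient-ascent step
obj1 = objective(w1, policy_pairs, expert_pairs)
```

After the step, `obj1` is larger than `obj0`, meaning the discriminator separates the two sample sets slightly better. In GAIL the policy would then be updated to lower $\log D(s,a)$ on its own samples.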
The modules of an autonomous driving (AD) system: