1. Introduction
- End-to-end autonomous driving models generate the vehicle's motion plan and predictions directly from raw sensor inputs;
- Current challenges for end-to-end autonomous driving models:
- multi-modality
- interpretability
- causal confusion
- robustness
- world models
- End-to-end methods can be divided into two main categories: reinforcement learning and imitation learning
1.1 Motivation of an End-to-end system
- In the traditional pipeline, each model handles one specific task: perception does detection and optimizes mAP (mean average precision), planning generates safe and comfortable trajectories, and so on;
- Modular processing leads to information loss, and deploying multiple models adds computational overhead and tends toward suboptimal results;
- Advantages of end-to-end:
- Perception, prediction, and planning are trained jointly, which simplifies the pipeline;
- The system's intermediate representations are optimized toward the final task;
- A shared backbone improves computational efficiency;
- Data-driven optimization makes iterative improvement convenient;
- An end-to-end model does not have to be a black box that only outputs a plan or controls; it can also expose intermediate states and representations,
- “Mp3: A unified model to map, perceive, predict and plan,” in CVPR,2021.
- “Planning-oriented autonomous driving,” in CVPR, 2023.
1.2 Roadmap
- Imitation learning paradigm:
- “Urban driving with conditional imitation learning,” in ICRA, 2020.
- “Exploring the limitations of behavior cloning for autonomous driving,” in ICCV, 2019.
- “Learning to drive by imitation: An overview of deep behavior cloning methods,” TIV, 2020.
- “A survey on imitation learning techniques for end-to-end autonomous vehicles,” TITS, 2022.
- “Imitation learning: Progress, taxonomies and challenges,” TNNLS, 2022.
- Reinforcement learning paradigm:
- "Learning to drive in a day," in ICRA, 2019.
- “Cirl: Controllable imitative reinforcement learning for vision-based self-driving,” in ECCV, 2018.
- "End-to-end model-free reinforcement learning for urban driving using implicit affordances," in CVPR, 2020.
- "Gri: General reinforced imitation and its application to vision-based autonomous driving," arXiv.org, vol. 2111.08575, 2021.
- “A survey of deep RL and IL for autonomous driving policy learning,” TITS, 2021.
- “Deep reinforcement learning for autonomous driving: A survey,” TITS, 2021.
- The policy distillation paradigm proposed in LBC:“Learning by cheating,” in CoRL, 2020.
- Simulation platforms: CARLA and nuPlan
1.3 Comparison to Related Surveys
- “Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art,” arXiv.org, vol. 1704.05519, 2017.
- “A survey of end-to-end driving: Architectures and training methods,” TNNLS, 2020.
- “Motion planning for autonomous driving: The state of the art and future perspectives,” arXiv.org, vol. 2303.09824, 2023.
- “A review of end-to-end autonomous driving in urban environments,” IEEE Access, 2022.
- “Learning to drive by imitation: An overview of deep behavior cloning methods,” TIV, 2020.
- “A survey on imitation learning techniques for end-to-end autonomous vehicles,” TITS, 2022.
- “Imitation learning: Progress, taxonomies and challenges,” TNNLS, 2022.
- “A survey of deep RL and IL for autonomous driving policy learning,” TITS, 2021.
- “Deep reinforcement learning for autonomous driving: A survey,” TITS, 2021.
2 Methods
2.1 Imitation Learning
- Imitation learning learns the expert's behavior policy from expert demonstrations; the most typical algorithm is behavior cloning (BC)
- “A framework for behavioural cloning,” in Machine Intelligence 15, 1995.
- Inverse Optimal Control (IOC), also known as Inverse Reinforcement Learning (IRL), also learns from expert demonstrations, except that what it learns is the reward function;
- “Maximum entropy inverse reinforcement learning,” in AAAI, 2008.
2.1.1 Behavior Cloning
- The training objective is to match the expert's behavior policy by minimizing a defined loss within a supervised learning framework;
- The model's loss is defined as $\operatorname*{arg\,min}_{\theta}\ \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[\ell(\pi_\theta(s), a)\big]$, where $\ell$ measures the discrepancy between the learned policy and the expert policy
- The advantage of BC is that it is simple and efficient and requires no hand-crafted reward design (a minimal training-loop sketch is given after this subsection's list);
- Problems faced:
- Training assumes that samples are independent and identically distributed (i.i.d.), which leads to covariate shift
- Remedies:
- DAgger: “A reduction of imitation learning and structured prediction to no-regret online learning,” in AISTATS, 2011.
- “Active imitation learning: Formal and practical reductions to iid learning,” JMLR, 2014.
- “Efficient reductions for imitation learning,” in AISTATS, 2010.
- “Reinforcement and imitation learning via interactive no-regret learning,” arXiv.org, vol. 1406.5979, 2014.
- The model mis-attributes correlations between states and features, causing causal confusion:
- Remedies:
- “Fighting copycat agents in behavioral cloning from observation histories,” in NeurIPS, 2020.
- “Keyframe-focused visual imitation learning,” in ICML, 2021.
- “Object-aware regularization for addressing causal confusion in imitation learning,” in NeurIPS, 2021.
- “Fighting fire with fire: avoiding dnn shortcuts through priming,” in ICML, 2022.
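As a minimal sketch of the BC objective above, assuming a generic PyTorch policy network and a dataset yielding (observation, expert action) tensor pairs (all names here are illustrative, not from the survey):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_bc(policy_net: nn.Module, dataset, epochs: int = 10, lr: float = 1e-4):
    """Minimal behavior-cloning loop: regress expert actions under a supervised loss."""
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    optim = torch.optim.Adam(policy_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # ell(pi_theta(s), a); could also be an NLL for stochastic policies
    for _ in range(epochs):
        for obs, expert_action in loader:
            pred_action = policy_net(obs)        # pi_theta(s)
            loss = loss_fn(pred_action, expert_action)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return policy_net
```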
2.1.2 Inverse Optimal Control
Traditional IOC algorithms learn an unknown reward function from expert demonstrations of an MDP, and the expert's reward can be expressed as a linear combination of features
- “Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations,” in ICML, 2019.
- “Guided cost learning: Deep inverse optimal control via policy optimization,” in ICML, 2016.
- “Sqil: Imitation learning via reinforcement learning with sparse rewards,” arXiv.org, vol. 1905.11108, 2019.
- “Self-imitation learning by planning,” in ICRA, 2021.
However, in autonomous driving scenarios the reward is implicit and not convenient to optimize directly.
Generative Adversarial Imitation Learning (GAIL) distinguishes the expert from the learned policy in an adversarial manner, conceptually similar to a GAN (a rough discriminator sketch follows);
Another line of work uses a non-learned algorithm to sample trajectories and then minimizes a cost over them; the problem thus splits into two steps: how to design the cost, and how to sample trajectories and optimize them end-to-end (illustrated by a figure in the original survey)
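A rough sketch of the adversarial idea behind GAIL (hypothetical module names; the discriminator is trained to tell expert state-action pairs from policy rollouts, and its output is then turned into a surrogate reward for the policy):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores how 'expert-like' a state-action pair looks."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # logits

def discriminator_step(disc, optim, expert_batch, policy_batch):
    """One adversarial update: expert pairs labeled 1, policy pairs labeled 0."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc(*expert_batch)     # expert_batch = (obs, act) tensors
    pol_logits = disc(*policy_batch)     # policy_batch = (obs, act) tensors
    loss = bce(exp_logits, torch.ones_like(exp_logits)) + \
           bce(pol_logits, torch.zeros_like(pol_logits))
    optim.zero_grad(); loss.backward(); optim.step()
    # Surrogate reward for the policy: encourage it to fool the discriminator.
    with torch.no_grad():
        reward = -torch.log(1 - torch.sigmoid(pol_logits) + 1e-8)
    return loss.item(), reward
```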
2.2 Reinforcement Learning
- RL methods allow potentially unsafe behaviors to appear during exploration and require more data than supervised learning; as a result, most RL work remains in simulation
- There are no reports of training an end-to-end driving model with RL alone, possibly because the gradient signal obtained is insufficient
- RL models that achieve state-of-the-art results on CARLA:
- “End-to-end model-free reinforcement learning for urban driving using implicit affordances,” in CVPR, 2020.
- “Gri: General reinforced imitation and its application to vision-based autonomous driving,” arXiv.org, vol. 2111.08575, 2021.
- The difficulty of RL lies in transferring from simulation to real systems:
- Models need a dense reward signal that provides feedback at every step
- Current reward functions are rather simple, e.g., keep moving forward and avoid collisions, plus linear combinations of such terms
- These overly simple rewards can encourage dangerous behavior and have therefore been criticized (a toy reward combination is sketched after this list)
- “Reward (mis)design for autonomous driving,” arXiv.org, vol. 2104.13906, 2021.
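For illustration, a toy dense reward of the kind described above, i.e., a hand-tuned linear combination of simple terms (the terms and weights here are made up, not taken from any cited paper):

```python
def step_reward(progress_m: float, collided: bool, lane_deviation_m: float,
                speed_mps: float, speed_limit_mps: float = 8.0) -> float:
    """Toy per-step reward: forward progress minus simple penalties."""
    r = 1.0 * progress_m                                # reward distance traveled this step
    r -= 0.5 * lane_deviation_m                         # penalize drifting off the lane center
    r -= 0.1 * max(0.0, speed_mps - speed_limit_mps)    # penalize speeding
    if collided:
        r -= 100.0                                      # large penalty for collisions
    return r
```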
- RL combines fairly naturally with world models:
- “Dream to control: Learning behaviors by latent imagination,” in ICLR, 2020.
- “Mastering atari with discrete world models,” in ICLR, 2021.
- “Recurrent world models facilitate policy evolution,” in NeurIPS, 2018.
3 Benchmarking
- Simulation environments, metrics, and datasets are currently not aligned; two evaluation directions are needed:
- Online/closed-loop evaluation in simulation;
- Offline/open-loop evaluation on human driving data;
3.1 Online Evaluation (Closed-loop)
- Three sub-tasks of simulation-based evaluation:
- Parameter initialization
- Traffic simulation
- Sensor simulation
3.1.1 Parameter Initialization
The simulator needs many parameters to be configured, e.g., the 3D environment, weather, and lighting, as well as lower-dimensional attributes such as object poses relative to the sensors;
Current simulators handle this in two ways:
- Procedural Generation: 3D scenes used to be built by hand, which is very time-consuming and requires substantial expertise; they can now be generated automatically by algorithms according to parameter settings. Procedural generation algorithms combine rules, heuristics, and randomization to create diverse road networks, traffic patterns, lighting conditions, and object placements
- “Scenic: a language for scenario specification and scene generation,” in PLDI, 2019
- “Did we test all scenarios for automated and autonomous driving systems?,” in ITSC, 2019.
Data-Driven: learn the initialization parameters from data. The simplest approach is to take initialization parameters directly from logged data, but such logs rarely contain extreme cases; another option is to use a model to learn the latent structure and distribution of real-world data, which can then be used to generate entirely new scenarios;
3.1.2 Traffic Simulation
Traffic simulation, as the name suggests, populates the simulated environment with traffic participants and endows them with realistic behaviors;
Traffic participants typically include trucks, cars, motorcycles, bicycles, pedestrians, etc.;
There are generally two ways to generate these agents:
Rule-Based:
The simulator generates agent behaviors from predefined rules; behaviors produced this way may not be very realistic. The Intelligent Driver Model (IDM) is the typical representative: it derives the following vehicle's behavior from its own speed and acceleration, the lead vehicle's speed, and a desired safety distance; this is still insufficient for simulating the complex interactive behaviors of urban traffic (a small IDM sketch follows the citations below);
- "CARLA: An open urban driving simulator," in CoRL, 2017
- "Congested traffic states in empirical observations and microscopic simulations," Physical review E, 2000.
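The IDM acceleration rule mentioned above can be written in a few lines; this is the standard IDM formula, with typical default parameter values that are not taken from the survey:

```python
import math

def idm_acceleration(v: float, v_lead: float, gap: float,
                     v0: float = 30.0, T: float = 1.5, s0: float = 2.0,
                     a_max: float = 1.5, b: float = 2.0, delta: float = 4.0) -> float:
    """Intelligent Driver Model: acceleration of the following vehicle.
    v: own speed (m/s), v_lead: lead-vehicle speed (m/s), gap: bumper-to-bumper distance (m)."""
    dv = v - v_lead                                   # closing speed
    s_star = s0 + max(0.0, v * T + v * dv / (2 * math.sqrt(a_max * b)))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / max(gap, 1e-3)) ** 2)
```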
Data-Driven:
Real human driving behavior is highly interactive and complex, including lane changes, merges, sudden stops, etc.; data-driven traffic simulators model these behaviors using real-world driving data;
However, learning such complex behaviors requires large amounts of annotated training data;
- “Simnet: Learning reactive self-driving simulations from real-world observations,” in ICRA, 2021.
- “Trafficgen: Learning to generate diverse and realistic traffic scenarios,” in ICRA, 2023.
- “Trafficsim: Learning to simulate realistic multi-agent behaviors,” in CVPR, 2021.
- “Guided conditional diffusion for controllable traffic simulation,” in ICRA, 2023.
- “Bits: Bi-level imitation for traffic simulation,” in ICRA, 2023.
- “TrafficBots: Towards world models for autonomous driving simulation and motion prediction,” in ICRA, 2023.
3.1.3 Sensor Simulation
There are likewise two options for generating raw sensor data:
- Graphics-Based:
- Graphics-based approaches approximate the real physical world with 3D models, but they rely on heavy scene computation, are hard to parallelize and optimize, and are limited by the fidelity of the 3D assets;
- “Synthetic datasets for autonomous driving: A survey,” arXiv.org, vol. 2304.12205, 2023.
- Data-Driven:
- Data-driven approaches train models on real sensor data; in the simulated scene, the simulated sensors may move differently from the vehicle that recorded the data. A typical solution is Neural Radiance Fields (NeRF), which learns a geometric representation of the real scene and renders data from novel viewpoints;
- “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
- Compared with graphics-based methods, this can produce more realistic data, but it additionally requires a separate training procedure and longer rendering times;
- Another option is domain adaptation: minimize the distribution shift between real data and graphics-based data, and then use GANs or style-transfer techniques to improve scene realism;
3.1.4 Benchmarks
- Some open-source simulation environments:
3.2 Offline Evaluation (Open-loop)
Evaluation compares against pre-recorded expert behavior, so the evaluation data must include the following information:
- sensor readings
- goal locations
- corresponding future driving trajectories
Advantages of open-loop evaluation:
- No simulator is required, so it is easy to implement
- It uses sensor data from real traffic
Disadvantages:
- The metric is not measured on the real test distribution that the deployed model will face
- Comparing against a single ground-truth trajectory does not suit multimodal situations (e.g., merging into the target lane earlier or later may both be acceptable)
- The predicted trajectory may implicitly rely on future observations (e.g., stopping in front of a green light that is about to turn red)
- The trajectory may leave the lane in which the expert trajectory lies;
- A sophisticated trajectory dataset is required (nuScenes, Argoverse, Waymo, nuPlan); a minimal displacement-error sketch is given after this list
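Open-loop metrics typically reduce to displacement errors against the logged expert trajectory; a minimal sketch (the array shapes are assumptions):

```python
import numpy as np

def displacement_errors(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: (T, 2) arrays of future waypoints in meters.
    Returns average and final displacement error (ADE / FDE)."""
    per_step = np.linalg.norm(pred - gt, axis=-1)  # L2 error at each future step
    return per_step.mean(), per_step[-1]
```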
4 Challenges
4.1 Input Modality
4.1.1 Multi-sensor Fusion
Some sensor types and fusion schemes:
RGB images: rich semantic visual information
LiDARs or stereo cameras: 3D geometric / depth information
speedometers and IMUs: vehicle speed and acceleration
Different sensors have different viewpoints and data distributions, which creates a large gap during fusion
Multi-sensor fusion is mostly discussed in perception (object detection, tracking, semantic segmentation, etc.) and is generally divided into three schemes: early/middle/late fusion
- Early fusion: combine the sensor inputs before feature extraction, then feed them into a shared feature extractor;
- Late fusion: extract features from each modality separately and fuse them afterward; results are usually inferior;
- Middle fusion: encode the inputs separately and fuse them inside the network, e.g., with a transformer architecture (a toy fusion sketch follows this list)
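A toy contrast of early vs. middle fusion (module names are illustrative; a real driving stack would use, e.g., BEV projection or cross-attention for middle fusion):

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate raw inputs (e.g., RGB + depth channels), then one shared encoder."""
    def __init__(self, in_ch: int, feat: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat))

    def forward(self, rgb, depth):
        return self.encoder(torch.cat([rgb, depth], dim=1))

class MiddleFusion(nn.Module):
    """Encode each modality separately, then fuse the tokens with self-attention."""
    def __init__(self, feat: int = 256):
        super().__init__()
        self.img_enc = nn.LazyLinear(feat)
        self.lidar_enc = nn.LazyLinear(feat)
        self.fuse = nn.TransformerEncoderLayer(d_model=feat, nhead=4, batch_first=True)

    def forward(self, img_feat, lidar_feat):
        tokens = torch.stack([self.img_enc(img_feat), self.lidar_enc(lidar_feat)], dim=1)
        return self.fuse(tokens).mean(dim=1)  # fused feature vector
```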
4.1.2 Language as Input
Current language-guided navigation works have mostly been validated on robots or in simulators; large-scale benchmarks with meaningful language prompts are still lacking.
4.2 Visual Abstraction
- Visual inputs in urban driving are very different from those in video games; a pretrained visual encoder is therefore usually adopted
4.3 World Model and Model-based RL
Model-based reinforcement learning lets the agent interact with a learned world model instead of the real environment, which reduces the cost of requiring a simulator (running CARLA, for instance, is slow)
“Dream to control: Learning behaviors by latent imagination,” in ICLR, 2020.
“Iso-dream: Isolating and leveraging noncontrollable visual dynamics in world models,” in NeurIPS, 2022.
Learning a driving world model directly in raw image space is problematic: too many small but important details, such as traffic-light colors, get predicted incorrectly;
- MILE: “Model-based imitation learning for urban driving,” in NeurIPS, 2022.
- SEM2:“Enhance sample efficiency and robustness of end-to-end urban autonomous driving via semantic masked world model,” in NeurIPS Workshops, 2022.
- DeRL:“Deductive reinforcement learning for visual autonomous urban driving navigation,” TNNLS, 2021.
However, the driving environment is highly complex and dynamic, and further research is still needed to determine how best to model the world (a latent world-model sketch is given below);
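A skeleton of the Dreamer-style idea: encode the observation into a latent state, learn a transition model that predicts the next latent and the reward, then train the policy on "imagined" rollouts without touching the simulator (all module names are illustrative):

```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, z_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
        self.dynamics = nn.Sequential(nn.Linear(z_dim + act_dim, 128), nn.ReLU(),
                                      nn.Linear(128, z_dim))      # (z_t, a_t) -> z_{t+1}
        self.reward_head = nn.Linear(z_dim, 1)                    # predicted reward
        self.decoder = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))

    def imagine(self, z, policy, horizon: int = 10):
        """Roll out the learned dynamics in latent space instead of the simulator."""
        rewards = []
        for _ in range(horizon):
            a = policy(z)
            z = self.dynamics(torch.cat([z, a], dim=-1))
            rewards.append(self.reward_head(z))
        return torch.stack(rewards).sum(0)  # imagined return used to train the policy
```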
4.4 Multi-task Learning with Policy Prediction
- Multi-task learning jointly trains related tasks through multiple heads; exploiting domain knowledge in the shared layers can improve model robustness (a toy multi-head sketch follows this list)
- “Multi-task feature learning,” in NeurIPS, 2006.
- Semantic segmentation, depth estimation on perspective images, and 3D object detection (with a LiDAR encoder) help the model understand the environment and thereby support the subsequent planning process;
- Aligning and annotating large-scale datasets with multi-modal inputs is also a major challenge;
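A toy multi-task layout under the assumptions above: a shared backbone feature feeds a planning head plus auxiliary heads, and the losses are combined with hand-picked weights (all names and weights are illustrative):

```python
import torch.nn as nn

class MultiTaskDrivingModel(nn.Module):
    def __init__(self, feat_dim: int = 256, num_classes: int = 19, num_waypoints: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())   # shared layers
        self.seg_head = nn.Linear(feat_dim, num_classes)          # auxiliary: semantic segmentation logits
        self.depth_head = nn.Linear(feat_dim, 1)                  # auxiliary: depth estimation
        self.plan_head = nn.Linear(feat_dim, num_waypoints * 2)   # main task: future (x, y) waypoints

    def forward(self, x):
        f = self.backbone(x)
        return self.seg_head(f), self.depth_head(f), self.plan_head(f)

def total_loss(losses: dict):
    # Weighted sum; in practice the weights need careful tuning (or uncertainty weighting).
    w = {"plan": 1.0, "seg": 0.2, "depth": 0.2}
    return sum(w[k] * v for k, v in losses.items())
```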
4.5 Policy Distillation
- Uses a “Teacher-Student” paradigm: first train a teacher network, then distill it into a student network;
- The teacher is trained with expert/privileged data; the student must learn not only to extract perceptual features but also the driving policy, so it carries a heavier task burden but gains better generalization;
- Other models distill knowledge at the feature level, involving multiple distillation targets (a small sketch follows this list):
- action distribution prediction
- value estimation
- latent features
- TCP:“Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in NeurIPS, 2022.
- The distillation process itself can cause causal confusion: for example, the teacher can access the underlying traffic-light state, while the student only observes a few pixel-level changes in the image;
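A minimal sketch combining the three feature-level distillation targets listed above (action distribution, value estimate, latent features); this is an illustrative combination, not TCP's exact formulation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out: dict, teacher_out: dict) -> torch.Tensor:
    """student_out / teacher_out contain 'action_logits', 'value', and 'feat' tensors."""
    # 1) Match the teacher's action distribution (KL between softened distributions).
    l_action = F.kl_div(F.log_softmax(student_out["action_logits"], dim=-1),
                        F.softmax(teacher_out["action_logits"], dim=-1),
                        reduction="batchmean")
    # 2) Match the teacher's value estimate.
    l_value = F.mse_loss(student_out["value"], teacher_out["value"].detach())
    # 3) Match intermediate latent features.
    l_feat = F.mse_loss(student_out["feat"], teacher_out["feat"].detach())
    return l_action + 0.5 * l_value + 0.5 * l_feat
```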
4.6 Interpretability
End-to-end models are usually regarded as black boxes, so achieving interpretability is a challenge;
Some ways of providing interpretability:
- Attention mechanisms
- Interpretable auxiliary tasks
- Cost/loss learning
- freespace:“Safe local motion planning with self-supervised freespace forecasting,” in CVPR, 2021.
- Natural language
- BDD-X dataset:“Textual explanations for self-driving vehicles,” in ECCV, 2018.
- ADAPT:“Adapt: Action-aware driving caption transformer,” in ICRA, 2023
- Uncertainty modeling (an aleatoric-loss sketch follows the citations below)
- aleatoric uncertainty: uncertainty inherent in the task
- epistemic uncertainty: uncertainty stemming from the data and the model
- “Visual-based autonomous driving deployment from a stochastic and uncertainty-aware perspective,” in IROS, 2019.
- “Probabilistic end-to-end vehicle navigation in complex dynamic environments with multimodal sensor fusion,” RA-L, 2020.
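Aleatoric uncertainty is commonly modeled by having the network predict a variance alongside its output and training with a heteroscedastic (Gaussian-NLL-style) loss; a standard formulation, sketched below and not tied to any specific paper above:

```python
import torch

def heteroscedastic_loss(pred: torch.Tensor, log_var: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """The network outputs both a prediction and its log variance (aleatoric uncertainty).
    Large predicted variance down-weights the squared error but is itself penalized."""
    precision = torch.exp(-log_var)
    return (0.5 * precision * (pred - target) ** 2 + 0.5 * log_var).mean()
```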
4.7 Causal Confusion
The causal confusion problem: with multi-frame image inputs, the agent can latch onto shortcuts, e.g., simply copying observed recent motion instead of reacting to the true causes of driving decisions, which is incorrect learning; some papers use single-frame inputs to avoid this
- “Offroad obstacle avoidance through end-to-end learning,” in NeurIPS, 2005.
Imitation learning with single-frame inputs:
“Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in NeurIPS, 2022.
“Transfuser: Imitation with transformer-based sensor fusion for autonomous driving,” PAMI, 2022.
“Safety-enhanced autonomous driving using interpretable sensor fusion transformer,” in CoRL, 2022.
- The model cannot tell whether the reason for braking is low speed or a red light;
Some remedies:
- ChauffeurNet uses intermediate visual abstractions in BEV space: “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,” in RSS, 2019
- “Fighting copycat agents in behavioral cloning from observation histories,” in NeurIPS, 2020.
- Increase the loss weight on keyframes (the few frames around a decision change): “Keyframe-focused visual imitation learning,” in ICML, 2021.
- OREO maps images to discrete codes representing semantic objects: “Object-aware regularization for addressing causal confusion in imitation learning,” in NeurIPS, 2021.
- “Resolving copycat problems in visual imitation learning via residual action prediction,” in ECCV, 2022.
4.8 Robustness
- Robustness mainly involves three sub-problems:
- Long-tailed data distribution
- Covariate shift
- Domain adaptation
4.8.1 Long-tailed Distribution
One cause of the long-tail problem is data imbalance
Methods for handling data imbalance (a simple re-weighting sketch follows this list):
- over-sampling:
- “Relay backpropagation for effective learning of deep convolutional neural networks,” in ECCV, 2016.
- “A systematic study of the class imbalance problem in convolutional neural networks,” NN, 2018.
- “What is the effect of importance weighting in deep learning?,” in ICML, 2019.
- “Lvis: A dataset for large vocabulary instance segmentation,” in CVPR, 2019.
- “Large-scale object detection in the wild from imbalanced multi-labels,” in CVPR, 2020.
- under-sampling:
- “knn approach to unbalanced data distributions: a case study involving information extraction,” in ICML Workshops, 2003.
- “Exploratory undersampling for class-imbalance learning,” TCYB, 2008.
- “Redundancy-driven modified tomek-link based undersampling: A solution to class imbalance,” Pattern Recognition Letters, 2017.
- data augmentation:
- “Dynamic few-shot visual learning without forgetting,” in CVPR, 2018.
- “mixup: Beyond empirical risk minimization,” in ICLR, 2017.
- “Remix: rebalanced mixup,” in ECCV, 2020.
- weighting-based approaches:
- “Learning deep representation for imbalanced classification,” in CVPR, 2016.
- “Learning to model the tail,” in NeurIPS, 2017.
- “Focal loss for dense object detection,” in ICCV, 2017
- “Class-balanced loss based on effective number of samples,” in CVPR, 2019.
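As one concrete instance of the weighting-based approaches cited above, the focal loss down-weights easy (majority-class) examples; the standard binary form:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Binary focal loss: (1 - p_t)^gamma scales down well-classified examples."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```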
Most autonomous driving data is not particularly interesting; some works try to generate interesting data, e.g., LBC: “Learning by cheating,” in CoRL, 2020.
“Scalable end-to-end autonomous vehicle testing via rare-event simulation,” in NeurIPS, 2018, proposes a simulation framework that applies importance sampling to accelerate the evaluation of rare-event probabilities.
Bayesian optimization is used to generate adversarial scenarios: “Generating adversarial driving scenarios in high-fidelity simulators,” in ICRA, 2019.
A safety-critical perturbation optimization that takes gradients through a differentiable kinematic model: “King: Generating safety-critical driving scenarios for robust imitation via kinematics gradients,” in ECCV, 2022.
Addressing the long-tail problem requires making better use of real-world data, together with realistic test frameworks (environments) for evaluating end-to-end models;
4.8.2 Covariate Shift
Covariate shift arises because the distribution of the expert data differs from the state distribution induced by the agent: the agent is tested in unseen scenarios or faces other agents that react differently from training, which eventually leads to incorrect and dangerous behavior
DAgger (Dataset Aggregation): an iterative training procedure; in each iteration the currently trained policy is rolled out to collect new data, and the expert labels the visited states. A drawback of DAgger is that it requires an expert available for online queries (a minimal loop sketch follows the citations below)
- DAgger “A reduction of imitation learning and structured prediction to no-regret online learning,” in AISTATS, 2011.
- SafeDAgger “Query-efficient imitation learning for end-to-end simulated driving,” in AAAI, 2017.
LBC adopts a DAgger-style scheme and samples frames with larger loss at a higher frequency: “Learning by cheating,” in CoRL, 2020.
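The DAgger procedure described above as a schematic loop, assuming hypothetical `rollout`, `expert_label`, and `train` helpers passed in by the caller:

```python
def dagger(env, rollout, expert_label, train, init_policy, n_iters: int = 5):
    """DAgger: iteratively aggregate expert-labeled states visited by the current policy."""
    dataset = []                                          # aggregated (state, expert_action) pairs
    policy = init_policy
    for _ in range(n_iters):
        states = rollout(env, policy)                     # 1) run the current policy, record visited states
        dataset += [(s, expert_label(s)) for s in states] # 2) query the expert online for labels
        policy = train(dataset)                           # 3) retrain on the aggregated dataset
    return policy
```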
4.8.3 Domain Adaptation
Domain adaptation is a transfer-learning technique that aims to transfer a policy learned on a source task to a similar target task;
Here we consider the setting where the source domain is labeled while the target domain has no labels or only a limited number of labels
Some domain adaptation problems in autonomous driving:
- Sim-to-real: the large gap between simulators used for training and the real world used for deployment.
- Geography-to-geography: different geographic locations with varying environmental appearances.
- Weather-to-weather: changes in sensor inputs caused by weather conditions such as rain, fog, and snow.
- Day-to-night: illumination variations in the sensor input.
- Sensor-to-sensor: possible differences in sensor characteristics, e.g., resolution and relative position.
VISRI maps simulated images toward real images, and the RL agent then learns from the translated images: “Virtual to real reinforcement learning for autonomous driving,” in BMVC, 2017.
Domain-invariant feature learning maps images from both domains into a common latent space for training: “Learning to drive from simulation without real world labels,” in ICRA, 2019.
Domain randomization: randomize rendering and physics settings during training
“A versatile and efficient reinforcement learning framework for autonomous driving,” arXiv.org, vol. 2110.11573, 2021.
“Simulation-based reinforcement learning for real-world autonomous driving,” in ICRA, 2020.
Current sim-to-real transfer for autonomous driving mainly follows two routes: mapping source-domain data to the target domain, or learning domain-invariant features (a gradient-reversal sketch is given below);
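One common way to learn domain-invariant features is a gradient-reversal domain classifier (DANN-style); a minimal sketch under that assumption, not the exact method of the papers cited above:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda on the way back."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainAdversarialNet(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())
        self.driving_head = nn.Linear(feat_dim, 2)   # e.g., steering + throttle
        self.domain_head = nn.Linear(feat_dim, 2)    # sim vs. real classifier

    def forward(self, x, lamb: float = 1.0):
        f = self.features(x)
        action = self.driving_head(f)
        domain_logits = self.domain_head(GradReverse.apply(f, lamb))
        # Train domain_head to classify the domain while the reversed gradient
        # pushes the shared features to become domain-invariant.
        return action, domain_logits
```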
NeRF techniques incorporate real-world log data into simulation
- “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
- “Block-nerf: Scalable large scene neural view synthesis,” in CVPR, 2022.
5 FUTURE TRENDS
Several promising future directions:
- Zero-shot and Few-shot Learning
- Autonomous driving will always encounter corner cases beyond the training set; handling such scenarios requires end-to-end models to have some zero-shot learning capability
- Modular End-to-end Planning
- Modular end-to-end planning is an industry trend with better interpretability; Tesla and Wayve are both pushing it forward
- Data Engine
- Large amounts of high-quality data remain the most important factor; an automatic annotation pipeline is also needed, followed later by scene generation and editing
- Foundation Model
- The current wave of large foundation models is concentrated in language and vision; an ideal framework would train a video predictor that forecasts future perception, but its objective needs to be rich enough to perform well on planning tasks
- Fine-tuning
- “Flamingo: a visual language model for few-shot learning,” in NeurIPS, 2022.
- “Internimage: Exploring large-scale vision foundation models with deformable convolutions,” in CVPR, 2023
- Vehicle-to-everything (V2X)
- Handling obstacles and occlusions beyond the perception range is a key difficulty; vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-everything (V2X) systems offer a solution, supplementing blind spots with information from other sources
Summary
This survey comes from the University of Hong Kong, Shanghai AI Lab, and collaborating institutions; its quality is quite good and its scope is broad. However, it reads more like a survey of autonomous driving algorithms in general than of end-to-end driving specifically: the end-to-end methods themselves receive relatively little coverage, and further investigation of end-to-end approaches is still needed