
Imitation Learning | Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation (Stanford University)

  • A low-cost, whole-body robot teleoperation platform (~$32k) that can complete complex tasks, such as cooking a shrimp and then plating and serving it
Mobile ALOHA
  • Project page: https://mobile-aloha.github.io/

1. INTRODUCTION

  • Imitation learning has already achieved some success on robot tasks:
    • Lane-following in mobile robots
    • Simple pick-and-place manipulation skills
      • Rt-1: Robotics transformer for real-world control at scale. In arXiv preprint arXiv:2212.06817, 2022.
      • Open X-Embodiment: Robotic learning datasets and RT-X models. https://arxiv.org/abs/2310.08864, 2023.
    • More fine-grained manipulation skills, such as spreading pizza sauce or inserting a battery
      • Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023
      • Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023.
  • Current problems:
    • No plug-and-play whole-body teleoperation hardware; existing platforms are too expensive (e.g., TIAGo at $200k)
    • Prior work has not demonstrated high-performance bimanual mobile manipulation on complex tasks
  • Training robot manipulation with static bimanual manipulation datasets:
    • Lucy Xiaoyang Shi, Archit Sharma, Tony Z Zhao, and Chelsea Finn. Waypoint-based imitation learning for robotic manipulation. CoRL, 2023.
    • Tony Z Zhao, Vikash Kumar, Sergey Levine, and Chelsea Finn. Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023
  • Although the arm mounting positions and the training tasks differ across these datasets, the paper's experiments show that using this data is consistently beneficial
  • Current state-of-the-art imitation learning algorithms:
    • ACT [104]
      • Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023
    • Diffusion Policy [18].
      • Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023.
  • With only 50 human demonstrations per task, Mobile ALOHA completes over 80% of the tasks, a 34% improvement in success rate compared with training without co-training.

2. RELATED WORK

Mobile Manipulation

  • Most existing approaches rely on model-based control, e.g., systems from the DARPA Robotics Challenge
    • Eric Krotkov, Douglas Hackett, Larry Jackel, Michael Perschbacher, James Pippine, Jesse Strauss, Gill Pratt, and Christopher Orlowski. The darpa robotics challenge finals: Results and perspectives. The DARPA Robotics Challenge Finals: Humanoid Robots To The Rescue, 2018.
  • Some learning-based approaches:
    • Predefined skill primitives
      • Fully autonomous real-world reinforcement learning with applications to mobile manipulation. In Conference on Robot Learning, 2021
      • Bohan Wu, Roberto Martin-Martin, and Li FeiFei. M-ember: Tackling long-horizon mobile manipulation via factorized domain transfer. ICRA, 2023
      • Tidybot: Personalized robot assistance with large language models. IROS, 2023
    • Reinforcement learning with decomposed action spaces
      • Multi-skill mobile manipulation for object rearrangement. ICLR, 2023
      • Robot learning of mobile manipulation with reachability behavior priors. IEEE Robotics and Automation Letters, 2022.
      • Combining learning-based locomotion policy with modelbased manipulation for legged mobile manipulators. IEEE Robotics and Automation Letters, 2022.
      • Relmogen: Integrating motion generation in reinforcement learning for mobile manipulation. In 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021
      • Adaptive skill coordination for robotic mobile manipulation. arXiv preprint arXiv:2304.00410, 2023
    • Whole-body control objectives
      • Deep whole-body control: learning a unified policy for manipulation and locomotion. In Conference on Robot Learning, 2022
      • Causal policy gradient for wholebody mobile manipulation. arXiv preprint arXiv:2305.04866, 2023
      • Harmonic mobile manipulation. arXiv preprint arXiv:2312.06639, 2023
  • Unlike these prior works, which rely on action primitives, state estimation, depth images, or bounding boxes, imitation learning lets a mobile manipulator learn end-to-end by mapping raw RGB images directly to whole-body actions
  • Data collection methods in prior work:
    • Collecting expert demonstrations through a VR interface
      • Deep imitation learning for humanoid loco-manipulation through human teleoperation. Humanoids, 2023
    • Kinesthetic teaching
      • Moma-force: Visualforce imitation for real-world mobile manipulation. arXiv preprint arXiv:2308.03624, 2023
    • Training reinforcement learning policies
      • Skill transformer: A monolithic policy for mobile manipulation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
    • Using a smartphone interface
      • Error-aware imitation learning from teleoperation data for mobile manipulation. In Conference on Robot Learning, 2022.
    • Motion capture systems
      • Human to robot whole-body motion transfer. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), 2021
    • Learning directly from human demonstration videos
      • Human-to-robot imitation in the wild. arXiv preprint arXiv:2207.09450, 2022

Imitation Learning for Robotics

  • Ways to improve behavior cloning:
    • Incorporating history with various architectures
      • Rt-1: Robotics transformer for real-world control at scale. In arXiv preprint arXiv:2212.06817, 2022.
      • Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning, 2022.
      • What matters in learning from offline human demonstrations for robot manipulation. In Conference on Robot Learning, 2021.
      • Behavior transformers: Cloning k modes with one stone. ArXiv, abs/2206.11251, 2022.
    • New training objectives
      • Roboagent: Towards sample efficient robot manipulation with semantic augmentations and action chunking, 2023
      • Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023
      • Implicit behavioral cloning. ArXiv, abs/2109.00137, 2021.
      • The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021
    • Regularization
      • Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration. 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3758–3765, 2017.
    • Motion primitives
    • Data preprocessing
      • Waypoint-based imitation learning for robotic manipulation. CoRL, 2023
  • Approaches to few-shot and multi-task imitation learning:
    • Language-conditioned imitation learning
      • Rt-1: Robotics transformer for real-world control at scale. In arXiv preprint arXiv:2212.06817, 2022.
      • Bc-z: Zero-shot task generalization with robotic imitation learning. In Conference on Robot Learning, 2022.
      • Cliport: What and where pathways for robotic manipulation. ArXiv, abs/2109.12098, 2021.
      • Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022.
    • Imitation from play data
      • From play to policy: Conditional behavior generation from uncurated robot data. arXiv preprint arXiv:2210.10047, 2022
      • Learning latent plans from play. In Conference on robot learning, pages 1113–1132. PMLR, 2020.
      • Latent plans for task-agnostic offline reinforcement learning. In Conference on Robot Learning, pages 1838–1849. PMLR, 2023.
      • Mimicplay: Longhorizon imitation learning by watching human play. arXiv preprint arXiv:2302.12422, 2023.
    • Using human videos
      • Footstep planning for the honda asimo humanoid. In ICRA, 2005
      • Model-based inverse reinforcement learning from visual demonstrations. In Conference on Robot Learning, pages 1930–1942. PMLR, 2021.
      • Perceptual values from observation. arXiv preprint arXiv:1905.07861, 2019.
      • R3m: A universal visual representation for robot manipulation. arXiv preprint arXiv:2203.12601, 2022.
      • Real-world robot learning with masked visual pre-training. CoRL, 2022.
      • Waypoint-based imitation learning for robotic manipulation. CoRL, 2023
    • Using task-specific structures
      • Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613– 4619, 2021.
      • Perceiver-actor: A multi-task transformer for robotic manipulation. ArXiv, abs/2209.05451, 2022
      • Transporter networks: Rearranging the visual world for robotic manipulation. In Conference on Robot Learning, 2020.
  • Co-training on real-world datasets collected from different but similar types of robots has shown promising results for single-arm manipulation and navigation
    • Robocat: A self-improving foundation agent for robotic manipulation. arXiv preprint arXiv:2306.11706, 2023
    • Open X-Embodiment: Robotic learning datasets and RT-X models. https://arxiv.org/abs/2310.08864, 2023.
    • Rh20t: A comprehensive robotic dataset for learning diverse skills in one-shot. In Towards Generalist Robots: Learning Paradigms for Scalable Skill Acquisition@ CoRL2023, 2023
    • Octo: An open-source generalist robot policy. https://octo-models.github.io, 2023.
    • Polybot: Training one policy across robots while embracing variability. In Conference on Robot Learning, pages 2955– 2974. PMLR, 2023.
    • Gnm: A general navigation model to drive any robot. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7226– 7233. IEEE, 2023.
  • This is the first work to co-train with existing robot manipulation datasets

3. Mobile ALOHA Hardware

  • Four key design considerations:
    • Mobility: moves at roughly human walking speed, about 1.42 m/s (a hedged base-kinematics sketch follows this list)
    • Stability: remains stable when manipulating heavy objects such as pots and cabinets
    • Whole-body teleoperation: all degrees of freedom can be teleoperated simultaneously
    • Untethered: onboard power and compute
  • Hardware Details
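The base is commanded with a linear and an angular velocity. As a rough illustration of the mobility target above, here is a hedged sketch (not from the paper; it assumes a generic differential-drive base, and the wheel radius, track width, and function names are my own placeholders) that clamps a commanded twist to roughly walking speed and converts it to wheel speeds:

```python
import numpy as np

# Assumed parameters for a generic differential-drive base (not the paper's exact values).
MAX_LINEAR_VEL = 1.42   # m/s, roughly human walking speed (the design target above)
WHEEL_RADIUS = 0.08     # m, assumed
TRACK_WIDTH = 0.37      # m, distance between left and right wheels, assumed


def twist_to_wheel_speeds(v: float, w: float) -> tuple[float, float]:
    """Convert a base twist (linear v [m/s], angular w [rad/s]) into left/right
    wheel angular velocities [rad/s] for a differential-drive base."""
    v = float(np.clip(v, -MAX_LINEAR_VEL, MAX_LINEAR_VEL))   # respect the speed target
    v_left = v - w * TRACK_WIDTH / 2.0
    v_right = v + w * TRACK_WIDTH / 2.0
    return v_left / WHEEL_RADIUS, v_right / WHEEL_RADIUS


if __name__ == "__main__":
    # Full-speed straight-line command at the walking-speed design target.
    print(twist_to_wheel_speeds(1.42, 0.0))
```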

4. Co-training with Static ALOHA Data

  • Imitation learning needs real-world robot demonstration data, but collecting it is tedious; learned policies also depend on the exact task conditions (lighting, distractors, etc.), so they are not robust

  • Prior single-arm manipulation work showed that policies can be co-trained on data collected from different robots, which motivates this approach

  • Static ALOHA datasets:

    • Waypoint-based imitation learning for robotic manipulation. CoRL, 2023.
    • Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023.
  • These datasets contain 825 demonstrations across different tasks; note that all static ALOHA data were collected with the two arms mounted in fixed positions on a black tabletop

  • This setup differs from Mobile ALOHA, where the background changes as the base moves and the two arms are mounted in parallel, facing forward. The co-training uses no special data processing on the static ALOHA RGB observations or bimanual actions

  • Formulation: denote the static ALOHA dataset as $D_{\text{static}}$ and the Mobile ALOHA dataset for a task $m$ as $D^m_{\text{mobile}}$

  • The action of the two arms is their target joint positions $a_{\text{arms}} \in \mathbb{R}^{14}$, and the action of the base is its linear and angular velocity $a_{\text{base}} \in \mathbb{R}^{2}$

  • The training objective is to optimize the manipulation policy $\pi^m_\theta$ for task $m$:

    $$\min_\theta \; \mathbb{E}_{(o^i, a^i_{\text{arms}}, a^i_{\text{base}}) \sim D^m_{\text{mobile}}}\left[ L\big(a^i_{\text{arms}}, a^i_{\text{base}}, \pi^m_\theta(o^i)\big) \right] + \mathbb{E}_{(o^i, a^i_{\text{arms}}) \sim D_{\text{static}}}\left[ L\big(a^i_{\text{arms}}, [0,0], \pi^m_\theta(o^i)\big) \right]$$

    where the observation $o^i$ consists of the RGB images from the two wrist cameras and the top camera plus the arm joint positions, and $L$ is the imitation loss

  • During training, the static and mobile datasets are sampled with equal probability with a batch size of 16; the base actions of the static data are zero-padded so the action dimensions of the two datasets align

  • All data are normalized with the statistics of the Mobile ALOHA dataset (a minimal co-training data-pipeline sketch follows below)
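Combining the formulation and the two training details above, a minimal co-training data-pipeline sketch might look like this. The in-memory datasets and helper names are assumptions, not the paper's code: static samples get their base action zero-padded to [0, 0] so both datasets share the 16-dim action space (14 arm joints + 2 base velocities), each batch element is drawn from either dataset with probability 0.5, and actions are normalized with Mobile ALOHA statistics.

```python
import numpy as np

# Hypothetical in-memory datasets: each element is an (obs, action) pair.
# Mobile ALOHA actions have 16 dims (14 arm joint targets + 2 base velocities);
# static ALOHA actions only have the 14 arm dims.


def pad_static_action(arm_action: np.ndarray) -> np.ndarray:
    """Zero-pad a 14-dim static ALOHA action with [0, 0] base velocities."""
    return np.concatenate([arm_action, np.zeros(2, dtype=arm_action.dtype)])


def normalize(action: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Normalize an action using Mobile ALOHA dataset statistics."""
    return (action - mean) / (std + 1e-8)


def sample_cotraining_batch(mobile_data, static_data, act_mean, act_std,
                            batch_size: int = 16,
                            rng: np.random.Generator = np.random.default_rng()):
    """Build a batch by sampling from the two datasets with equal probability."""
    obs_batch, act_batch = [], []
    for _ in range(batch_size):
        if rng.random() < 0.5:                                  # Mobile ALOHA sample
            obs, act = mobile_data[rng.integers(len(mobile_data))]
        else:                                                   # static ALOHA sample
            obs, act = static_data[rng.integers(len(static_data))]
            act = pad_static_action(act)                        # align action dimensions
        obs_batch.append(obs)
        act_batch.append(normalize(act, act_mean, act_std))
    return np.stack(obs_batch), np.stack(act_batch)
```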

  • Baseline imitation learning methods (an ACT-style inference sketch follows this list):

    • ACT
      • Learning fine-grained bimanual manipulation with low-cost hardware. RSS, 2023
    • Diffusion Policy
      • Diffusion policy: Visuomotor policy learning via action diffusion. In Proceedings of Robotics: Science and Systems (RSS), 2023
    • VINN
      • The surprising effectiveness of representation learning for visual imitation. arXiv preprint arXiv:2112.01511, 2021
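ACT is the main policy class evaluated. As a rough illustration of how a chunked policy like ACT is typically deployed, here is a minimal sketch of inference with temporal ensembling, assuming a `policy` callable that predicts the next `chunk_size` actions from an observation and a generic `env` interface; both are placeholders, not the paper's API.

```python
import numpy as np


def rollout_with_temporal_ensemble(policy, env, episode_len: int,
                                   chunk_size: int = 100, m: float = 0.01):
    """Run a chunked policy with temporal ensembling: at every step t, query a new
    chunk of future actions, then execute a weighted average of all chunks that
    predicted an action for step t, with weights w_i = exp(-m * i), oldest first."""
    action_dim = 16  # 14 arm joint targets + 2 base velocities for Mobile ALOHA
    # all_preds[t, s] holds the action predicted for timestep s by the query made at step t.
    all_preds = np.zeros((episode_len, episode_len + chunk_size, action_dim))
    obs = env.reset()
    for t in range(episode_len):
        all_preds[t, t:t + chunk_size] = policy(obs)          # shape (chunk_size, action_dim)
        start = max(0, t - chunk_size + 1)                    # only queries whose chunk covers t
        past = all_preds[start:t + 1, t]                      # all predictions for step t
        weights = np.exp(-m * np.arange(len(past)))           # index 0 = oldest query
        action = (weights[:, None] * past).sum(axis=0) / weights.sum()
        obs = env.step(action)
    return obs
```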

5. Tasks

  • The paper designs a suite of real-world tasks, including:
    • Wipe Wine
    • Cook Shrimp
    • Wash Pan
    • Use Cabinet
    • Take Elevator
    • Push Chairs
  • Task Definitions

6. Experiments

  • The experiments mainly aim to answer two questions:
      1. Can Mobile ALOHA acquire complex mobile manipulation skills with co-training and a small amount of mobile manipulation data?
      2. Can Mobile ALOHA work with different types of imitation learning methods, including ACT [104], Diffusion Policy [18], and retrieval-based VINN [63]?
  • Co-training improves ACT performance
  • Mobile ALOHA is compatible with recent imitation learning methods
  • Experimental result figures
  • Teleoperator learning curves

Summary

Mobile ALOHA's results are quite impressive, especially given that co-training is essentially the only trick used; it reaches solid success rates across different tasks. On the hardware side, choosing a wheeled base was a wise decision: it sidesteps the hard problems of walking and self-balancing, which lets the work focus on learning manipulation. On the algorithm side, the paper offers nothing really new, and it still suffers from the usual "old problems" of robot manipulation, such as the lack of rule-based constraints, the severe consequences of failure, and rigid interaction. It still does not feel all that intelligent, and the road to improvement is long.