General-purpose robotic systems require powerful representations and abstractions. In deployment, such robots are expected to encounter diverse and complex scenarios. While recent large-scale learned models exhibit remarkable generalization, learning representations that flexibly generalize to the unanticipated situations a robot might face remains challenging, especially given the cost of collecting robot data. Thus, it is important to investigate how to best learn generalizable representations, evaluate their effectiveness, and leverage them for downstream robotics tasks.
Ideally, these representations should capture: (1) the spatial-dynamic information needed for fine-grained control, (2) the semantic information required for common-sense reasoning and scene understanding, and (3) the knowledge of conventions needed for smooth human-robot interaction. Additionally, these representations must be robust to the diversity of tasks, scenes, and operators the robot will encounter. In this workshop, we aim to explore the following questions: What makes a good robot representation? How can we learn such representations? And how can we most effectively make use of them?
Our speakers and panelists are pioneering robotics and machine learning researchers defining the state of the art on a range of topics, including end-to-end control, task and motion planning (TAMP), human-robot interaction (HRI), scene understanding / SLAM, and more. We invite submissions from the community in these areas, as well as from a wider set of perspectives; for example, submissions addressing how the following fields might guide robotics research: (1) deep representation learning in vision and language; (2) representation learning for field robotics and AI, where data is extremely scarce or noisy; or (3) bias and robustness in neural representations.
We aim to investigate the following topics and research questions:
We also give a non-exhaustive list of keywords:
We are accepting workshop submissions of the following types:
We request that submissions be in the RSS format; they should not be anonymized. You may also submit papers that are under review at other venues or have been submitted to other workshops.
Paper Submission Deadline | |
Paper Acceptance | |
Camera-ready Version Due | June 16, 2025 - 23:59 AOE |
Workshop | June 25, 2025 |
Session 1 |
8:00 AM - 8:15 AM | Opening Remarks |
8:15 AM - 9:30 AM | Invited Talks 1, 2 |
9:30 AM - 10:30 AM | Poster Session A, Coffee Break |
Session 2 |
10:30 AM - 11:15 AM | Invited Talks 3, 4 |
11:15 AM - 12:30 PM | Panel |
12:30 PM - 2:00 PM | Lunch Break |
Session 3 |
2:00 PM - 2:20 PM | Spotlight Talks |
2:20 PM - 3:00 PM | Invited Talk 5 |
3:00 PM - 4:00 PM | Poster Session B, Coffee Break |
Session 4 |
4:00 PM - 4:40 PM | Invited Talk 6 |
4:40 PM - 5:00 PM | Closing Remarks |
We shall link to all the papers very soon!
Poster Session A: 9:30 AM - 10:30 AM |
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning | Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann |
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering | Muhammad Fadhil Ginting, Dong-Ki Kim, Xiangyun Meng, Andrzej Marek Reinke, Jai Krishna Bandi, Navid Kayhani, Oriana Peltzer, David Fan, Amirreza Shaban, Sung-Kyun Kim, Mykel Kochenderfer, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei |
Learning Attentive Neural Processes for Planning with Pushing Actions | Atharv Jain, Seiji A Shaw, Nicholas Roy |
Interpretable Human-in-the-Loop In-Context Preference Learning Via Preference Boundaries | Valerie K. Chen, Julie Shah, Andreea Bobu |
Online Latent Factor Representation Learning | Alejandro Murillo-González, Lantao Liu |
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies | Tony Tao, Mohan Kumar Srirama, Jason Jingzhou Liu, Kenneth Shaw, Deepak Pathak |
GRIM: Task-Oriented Grasping with Conditioning on Generative Examples | Shailesh, Alok Raj, Nayan Kumar, Priya Shukla, Andrew Melnik, Michael Beetz, Gora Chand Nandi |
Bi-Manual Joint Camera Calibration and Scene Representation | Haozhan Tang, Tianyi Zhang, Matthew Johnson-Roberson, William Zhi |
DisDP: Robust Imitation Learning via Disentangled Diffusion Policies | Pankhuri Vanjani, Paul Mattes, Kevin Daniel Kuryshev, Xiaogang Jia, Vedant Dave, Rudolf Lioutikov |
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | Omar Alama, Avigyan Bhattacharya, Haoyang He, Seungchan Kim, Yuheng Qiu, Wenshan Wang, Cherie Ho, Nikhil Varma Keetha, Sebastian Scherer |
Learning Symbolic World Model Representations for Long-Horizon Robot Planning | Naman Shah, Jayesh Nagpal, Siddharth Srivastava |
WoMAP: World Models For Embodied Open-Vocabulary Object Localization | Tenny Yin, Zhiting Mei, Tao Sun, Lihan Zha, Ola Sho, Emily Zhou, Miyu Yamane, Jeremy Bao, Anirudha Majumdar |
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations | Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, Jesse Zhang |
Importance Weighted Retrieval for Few-Shot Imitation Learning | Amber Xie, Rahul Chand, Dorsa Sadigh, Joey Hejna |
Poster Session B: 3:00 PM - 4:00 PM |
Learning Factorized Diffusion Policies for Conditional Action Diffusion | Omkar Patil, Prabin Kumar Rath, Kartikay Milind Pangaonkar, Eric Rosen, Nakul Gopalan |
DREAM: Differentiable Real-to-Sim-to-Real Engine for Learning Robotic Manipulation | Haozhe Lou, Mingtong Zhang, Haoran Geng, Hanyang Zhou, Sicheng He, Zhiyuan Gao, Siheng Zhao, Jiageng Mao, Pieter Abbeel, Jitendra Malik, Daniel Seita, Yue Wang |
Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne, Andy Tang, Yuejiang Liu, Chelsea Finn |
H3DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning | Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu |
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | Kaiyuan Chen, Shuangyu Xie, Zehan Ma, Pannag R Sanketi, Ken Goldberg |
Implicit Contact Representations with Neural Descriptor Fields for Learning Dynamic Recovery Policies | Fan Yang, Sergio Francisco Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson |
CL-HCoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models | Yuxin Cai, Haoruo Zhang, Wei-Yun Yau, Chen Lv |
XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang, Shichen Li, Elena Shrestha |
Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation | Siddhant Haldar, Lerrel Pinto |
A Steerable Vision-Language-Action Framework for Autonomous Driving | Tian Gao, Catherine Glossop, Kyle Stachowicz, Timothy Gao, Celine Tan, Oier Mees, Yuejiang Liu, Sergey Levine, Dorsa Sadigh, Chelsea Finn |
GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction | Haozhan Tang, Tianyi Zhang, Matthew Johnson-Roberson, William Zhi |
Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning | Sunghwan Kim, Woojeh Chung, Yulun Tian, Zhirui Dai, Arth Shukla, Hao Su, Nikolay Atanasov |
SkillWrapper: Autonomously Learning Interpretable Skill Abstractions with Foundation Models | Ziyi Yang, Benned Hedegaard, Ahmed Jaafar, Skye Thompson, Yichen Wei, Everest Yang, Haotian Fu, Shreyas Sundara Raman, Stefanie Tellex, George Konidaris, David Paulius, Naman Shah |
Structured 3D Scene Queries with Graph Databases | Aaron Ray, Luca Carlone |
EgoZero: Robot Learning from Smart Glasses | Vincent Liu, Ademi Adeniji, Haotian Zhan, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto |
Feel the Force: Contact-Driven Learning from Humans | Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Siddhant Haldar, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto |
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning | Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov |