General-purpose robotic systems require powerful representations and abstractions. In deployment, such robots are expected to encounter diverse and complex scenarios. While recent large-scale learned models exhibit remarkable generalization, learning representations that flexibly generalize to the unanticipated situations a robot might face remains challenging, especially given the cost of collecting robot data. Thus, it is important to investigate how to best learn generalizable representations, evaluate their effectiveness, and leverage them for downstream robotics tasks.
Ideally, these representations should capture: (1) the spatial-dynamic information needed for fine-grained control, (2) the semantic information required for common-sense reasoning and scene understanding, and (3) the knowledge of conventions needed for smooth human-robot interaction. Additionally, these representations must be robust to the diversity of tasks, scenes, and operators the robot will encounter. In this workshop, we aim to explore the following questions: What makes a good robot representation? How can we learn such representations? And how can we most effectively make use of them?
Our speakers and panelists are pioneering robotics and machine learning researchers defining the state of the art on a range of topics, including end-to-end control, task and motion planning (TAMP), human-robot interaction (HRI), scene understanding / SLAM, and more. We invite submissions from the community in these areas, as well as from a wider set of perspectives; for example, submissions addressing how the following fields might guide robotics research: (1) deep representation learning in vision and language; (2) representation learning for field robotics and AI, where data is extremely scarce or noisy; or (3) bias and robustness in neural representations.
We aim to investigate the following topics and research questions:
We also give a non-exhaustive list of keywords:
We are accepting workshop submissions of the following types:
We request that submissions be in the RSS format; they should not be anonymized. You may also submit papers that are under review at other venues or have been submitted to other workshops.
Paper Submission Deadline | |
Paper Acceptance | |
Camera-ready Version Due | June 16, 2025 - 23:59 AOE |
Workshop | June 25, 2025 |
Session 1 |
8:00 AM - 8:15 AM | Opening Remarks |
8:15 AM - 9:30 AM | Invited Talks 1, 2 |
9:30 AM - 10:30 AM | Poster Session A, Coffee Break |
Session 2 |
10:30 AM - 11:15 AM | Invited Talks 3, 4 |
11:15 AM - 12:30 PM | Panel |
12:30 PM - 2:00 PM | Lunch Break |
Session 3 |
2:00 PM - 2:20 PM | Spotlight Talks |
2:20 PM - 3:00 PM | Invited Talk 5 |
3:00 PM - 4:00 PM | Poster Session B, Coffee Break |
Session 4 |
4:00 PM - 4:40 PM | Invited Talk 6 |
4:40 PM - 5:00 PM | Closing Remarks |
We shall link to all the papers very soon!
Poster Session A: 9:30 AM - 10:30 AM |
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning | Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann |
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering | Muhammad Fadhil Ginting, Dong-Ki Kim, Xiangyun Meng, Andrzej Marek Reinke, Jai Krishna Bandi, Navid Kayhani, Oriana Peltzer, David Fan, Amirreza Shaban, Sung-Kyun Kim, Mykel Kochenderfer, Ali-akbar Agha-mohammadi, Shayegan Omidshafiei |
Learning Attentive Neural Processes for Planning with Pushing Actions | Atharv Jain, Seiji A Shaw, Nicholas Roy |
Interpretable Human-in-the-Loop In-Context Preference Learning Via Preference Boundaries | Valerie K. Chen, Julie Shah, Andreea Bobu |
Online Latent Factor Representation Learning | Alejandro Murillo-González, Lantao Liu |
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies | Tony Tao, Mohan Kumar Srirama, Jason Jingzhou Liu, Kenneth Shaw, Deepak Pathak |
GRIM: Task-Oriented Grasping with Conditioning on Generative Examples | Shailesh, Alok Raj, Nayan Kumar, Priya Shukla, Andrew Melnik, Michael Beetz, Gora Chand Nandi |
Bi-Manual Joint Camera Calibration and Scene Representation | Haozhan Tang, Tianyi Zhang, Matthew Johnson-Roberson, William Zhi |
DisDP: Robust Imitation Learning via Disentangled Diffusion Policies | Pankhuri Vanjani, Paul Mattes, Kevin Daniel Kuryshev, Xiaogang Jia, Vedant Dave, Rudolf Lioutikov |
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration | Omar Alama, Avigyan Bhattacharya, Haoyang He, Seungchan Kim, Yuheng Qiu, Wenshan Wang, Cherie Ho, Nikhil Varma Keetha, Sebastian Scherer |
Learning Symbolic World Model Representations for Long-Horizon Robot Planning | Naman Shah, Jayesh Nagpal, Siddharth Srivastava |
WoMAP: World Models For Embodied Open-Vocabulary Object Localization | Tenny Yin, Zhiting Mei, Tao Sun, Lihan Zha, Ola Sho, Emily Zhou, Miyu Yamane, Jeremy Bao, Anirudha Majumdar |
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations | Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, Jesse Zhang |
Importance Weighted Retrieval for Few-Shot Imitation Learning | Amber Xie, Rahul Chand, Dorsa Sadigh, Joey Hejna |
Poster Session B: 3:00 PM - 4:00 PM |
Learning Factorized Diffusion Policies for Conditional Action Diffusion | Omkar Patil, Prabin Kumar Rath, Kartikay Milind Pangaonkar, Eric Rosen, Nakul Gopalan |
DREAM: Differentiable Real-to-Sim-to-Real Engine for Learning Robotic Manipulation | Haozhe Lou, Mingtong Zhang, Haoran Geng, Hanyang Zhou, Sicheng He, Zhiyuan Gao, Siheng Zhao, Jiageng Mao, Pieter Abbeel, Jitendra Malik, Daniel Seita, Yue Wang |
Learning Long-Context Diffusion Policies via Past-Token Prediction | Marcel Torne, Andy Tang, Yuejiang Liu, Chelsea Finn |
H3DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning | Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu |
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | Kaiyuan Chen, Shuangyu Xie, Zehan Ma, Pannag R Sanketi, Ken Goldberg |
Implicit Contact Representations with Neural Descriptor Fields for Learning Dynamic Recovery Policies | Fan Yang, Sergio Francisco Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson |
CL-HCoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models | Yuxin Cai, Haoruo Zhang, Wei-Yun Yau, Chen Lv |
XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search | Yiting Zhang, Shichen Li, Elena Shrestha |
Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation | Siddhant Haldar, Lerrel Pinto |
A Steerable Vision-Language-Action Framework for Autonomous Driving | Tian Gao, Catherine Glossop, Kyle Stachowicz, Timothy Gao, Celine Tan, Oier Mees, Yuejiang Liu, Sergey Levine, Dorsa Sadigh, Chelsea Finn |
GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction | Haozhan Tang, Tianyi Zhang, Matthew Johnson-Roberson, William Zhi |
Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning | Sunghwan Kim, Woojeh Chung, Yulun Tian, Zhirui Dai, Arth Shukla, Hao Su, Nikolay Atanasov |
SkillWrapper: Autonomously Learning Interpretable Skill Abstractions with Foundation Models | Ziyi Yang, Benned Hedegaard, Ahmed Jaafar, Skye Thompson, Yichen Wei, Everest Yang, Haotian Fu, Shreyas Sundara Raman, Stefanie Tellex, George Konidaris, David Paulius, Naman Shah |
Structured 3D Scene Queries with Graph Databases | Aaron Ray, Luca Carlone |
EgoZero: Robot Learning from Smart Glasses | Vincent Liu, Ademi Adeniji, Haotian Zhan, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto |
Feel the Force: Contact-Driven Learning from Humans | Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Siddhant Haldar, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto |
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning | Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov |