Publications

Preprints

Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

Haoru Xue*, Tairan He*, Zi Wang*, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, S. Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu (* equal contribution)

arXiv Preprint, 2025

TL;DR: DoorMan proposes a teacher-student-bootstrap framework for challenging humanoid loco-manipulation tasks such as door opening. Trained as an RGB policy purely in simulation, it is up to 31.7% faster than humans in the real world.

VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation

Tairan He*, Zi Wang*, Haoru Xue*, Qingwei Ben*, Zhengyi Luo, Wenli Xiao, Ye Yuan, Xingye Da, Fernando Castañeda, S. Shankar Sastry, Changliu Liu, Guanya Shi, Linxi "Jim" Fan, Yuke Zhu (* equal contribution)

arXiv Preprint, 2025

TL;DR: VIRAL investigates the scaling law of visual sim-to-real. We find the right recipe to enjoy the free lunch of simulation: zero-shot, robust, continuous real-world deployment.

SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

Zhengyi Luo*, Ye Yuan*, Tingwu Wang*, Chenran Li*, Sirui Chen, Fernando Castañeda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, Xingye Da, Runyu Ding, Cyrus Hogg, Lina Song, Edy Lim, Eugene Jeong, Tairan He, Haoru Xue, Wenli Xiao, Zi Wang, Simon Yuen, Jan Kautz, Yan Chang, Umar Iqbal, Linxi "Jim" Fan, Yuke Zhu (* equal contribution)

arXiv Preprint, 2025

TL;DR: SONIC is a general humanoid whole-body motion tracker supporting various control modes.

2026

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

Wenli Xiao*, Haotian Lin*, Andy Peng, Haoru Xue, Tairan He, Yuqi Xie, Fengyuan Hu, Jimmy Wu, Zhengyi Luo, Linxi "Jim" Fan, Yuke Zhu (* equal contribution)

ICLR, 2026

TL;DR: Probe, Learn, Distill is a plug-and-play recipe for Vision-Language-Action (VLA) post-training. It is model-agnostic, supporting both autoregressive and diffusion architectures, and can push success rates to 99%.

DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion

Dvij Kalaria, Sudarshan Harithas, Pushkal Katara, Sangkyung Kwak, Sarthak Bhagat, S. Shankar Sastry, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, Jonathan Huang

ICRA, 2026

TL;DR: DreamControl leverages a generative model trained on offline human motion data to synthesize diverse human motion trajectories for a wide range of tasks. These trajectories are re-targeted to the humanoid, and used to train task-specific RL policies that can be deployed both in simulation and on some real-world tasks.

HITTER: A HumanoId Table TEnnis Robot via Hierarchical Planning and Learning

Zhi Su, Bike Zhang, Nima Rahmanian, Yuman Gao, Qiayuan Liao, Caitlin Regan, Koushil Sreenath, S. Shankar Sastry

ICRA, 2026

TL;DR: A model-based planner plus reference motion tracking enables robust real-world table tennis play (up to 106 consecutive hits!)

LEGO: Learning to Grasp Anything by Playing with Random Toys

Dantong Niu*, Yuvan Sharma*, Baifeng Shi*, Rachel Ding, Matteo Gioia, Haoru Xue, Henry Tsai, Konstantinos Kallidromitis, Anirudh Pai, S. Shankar Sastry, Trevor Darrell, Jitendra Malik, Roei Herzig (* equal contribution)

ICLR, 2026

TL;DR: LEGO improves the generalized grasping capability of off-the-shelf VLAs such as π0-Fast by learning from grasping toys randomly generated from simple primitives. We use a novel object-centric representation to keep visual features grounded during transfer.

2025

LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction

Haoru Xue*, Xiaoyu Huang*, Dantong Niu*, Qiayuan Liao*, Thomas Kragerud, Jan Tommy Gravdahl, Xue Bin Peng, Guanya Shi, Trevor Darrell, Koushil Sreenath, S. Shankar Sastry (* equal contribution)

arXiv Preprint, 2025

TL;DR: LeVERB is the first latent whole-body humanoid VLA. We introduce a latent vocabulary as an interface between vision-language and whole-body action to enable expressive task specification and interpolatable execution.

Pre-training Auto-regressive Robotic Models with 4D Representations

Dantong Niu*, Yuvan Sharma*, Haoru Xue, Giscard Biamby, Junyi Zhang, Ziteng Ji, Trevor Darrell, Roei Herzig (* equal contribution)

ICML, 2025

TL;DR: ARM4R is an Auto-regressive Robotic Model that leverages low-level 4D representations learned from human video data to yield a robotics model with stronger spatial and temporal understanding.

LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Daniel Etaat, Dvij Kalaria, Nima Rahmanian, S. Shankar Sastry

CVPR, 2025

TL;DR: LATTE-MV is a scalable system that processes over 800 hours of YouTube table tennis footage to reconstruct 27 hours of clean gameplay. This dataset is then used to train an anticipatory controller that predicts the opponent's next shot, enabling the robot to pre-position itself and perform better in high-speed rallies.

Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing

Haoru Xue*, Chaoyi Pan*, Zeji Yi, Guannan Qu, Guanya Shi (* equal contribution)

ICRA, 2025 (Best Paper Finalist)

TL;DR: DIAL-MPC is the first training-free method achieving real-time whole-body torque control using full-order dynamics.