I'm a CS Ph.D. student at the University of Pennsylvania, advised by Dinesh Jayaraman. I find it fun to advance artificial intelligence, from virtual agents to physical robots. Towards this, my research focuses on deep learning, RL, and their applications to robotics and LLMs.

Expanded research summary:
  • World Models and Planning: World models unlock new capabilities for agents, such as solving problems with long-term dependencies and training policies sample-efficiently. I've developed model-based RL algorithms that learn sophisticated behaviors conventional RL agents struggle to reproduce: finding objects in the dark, stacking blocks without rewards, and even zero-shot transfer to new robot arms. Beyond robots, I've shown that LLMs plan better when trained with world-modeling objectives. (A minimal sketch of the model-based RL loop appears after this list.)
  • Sensory Requirements of Policy Learning: Artificial agents observe the world through some input stream (e.g., vision, language, proprioception). The choice of inputs greatly affects both the training dynamics and the resulting behavior of the agent. My research investigates this close-knit relationship between sensing and learning; it has yielded insights into the sensory requirements of RL agents, as well as practical RL/IL algorithms that operate with minimal sensing.
  • Learning Comprehensive Behaviors: For a robot to truly master a task and be useful, it must demonstrate capabilities beyond task solving alone. Exploration, resetting, and verification abilities are essential for adapting to new conditions and performing tasks robustly. We've developed frameworks that teach robots these capabilities efficiently: for example, our robots learn to tug on doors after locking them to verify they are secure, and to reset their own workspaces so they can keep practicing.
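For readers who want a concrete picture of the model-based RL loop mentioned above, here is a minimal sketch: fit a one-step dynamics model to interaction data, then plan by rolling the model forward and executing the first action of the best imagined sequence (random-shooting MPC). This is an illustrative toy, not the method of any paper listed below; the point-mass environment, the linear model class, and all names here are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: a 2-D point mass that should reach the origin.
# True dynamics (unknown to the agent): s' = s + 0.1 * a.
def env_step(s, a):
    return s + 0.1 * a

def reward(s):
    return -np.linalg.norm(s)  # closer to the origin is better

# --- 1. Learn a world model from random interaction data. ---
# Here the "world model" is just linear least squares: s' ~ [s, a] @ W.
S = rng.uniform(-1, 1, size=(500, 2))
A = rng.uniform(-1, 1, size=(500, 2))
S_next = env_step(S, A)
X = np.hstack([S, A])                          # (500, 4) inputs
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # (4, 2) model weights

def model_step(s, a):
    return np.concatenate([s, a]) @ W

# --- 2. Plan with the model: random-shooting MPC. ---
def plan(s, horizon=5, n_candidates=256):
    """Sample action sequences, roll them out in imagination with the
    learned model, and return the first action of the best sequence."""
    seqs = rng.uniform(-1, 1, size=(n_candidates, horizon, 2))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        sim = s.copy()
        for a in seq:
            sim = model_step(sim, a)
            returns[i] += reward(sim)
    return seqs[np.argmax(returns), 0]

# --- 3. Control loop: replan at every real step. ---
s = np.array([0.8, -0.6])
for t in range(20):
    s = env_step(s, plan(s))
print("final distance to goal:", np.linalg.norm(s))
```

Real world-model agents swap the linear model for learned latent dynamics (e.g., an RNN or transformer trained from pixels) and either plan in imagination or train a policy on imagined rollouts, but the fit-the-model / act-through-the-model structure is the same.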

I am also a student researcher at Microsoft AI Frontiers, working on LLM training with John Langford and Alex Lamb. I received my BS/MS in CS from the University of Southern California, where I worked with Joseph J. Lim on RL.


Publications

Code and reviews for all of my PhD papers are public. Check them out!

World Models Increase Autonomy in Reinforcement Learning
Zhao Yang, Thomas Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

TMLR 2025

Keywords: Reset-free, World Models, RL

The Belief State Transformer

ICLR 2025

Keywords: LLM, Planning, World Models

The Value of Sensory Information to a Robot

ICLR 2025

Keywords: Perception, RL

Privileged Sensing Scaffolds Reinforcement Learning
Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

ICLR 2024 (Spotlight, 5% accept rate; 3rd highest-rated paper at ICLR)

Keywords: Privileged Information, World Models, RL

Planning Goals for Exploration
Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman

ICLR 2023 (Spotlight, 5% accept rate)
CoRL 2022 Roboadapt Workshop (Oral, Best Paper Award)

Keywords: Exploration, Goal-conditioned RL, World Models

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning
Kun Huang, Edward S. Hu, Dinesh Jayaraman

CoRL 2022 (Oral, 6.5% accept rate, Best Paper Award)

Keywords: Interactive Perception, Task Specification, RL

Transferable Visual Control Policies Through Robot-Awareness
Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman

ICLR 2022
ICLR 2022 Generalizable Policy Learning Workshop (Oral)

Keywords: World Models, Robot Transfer, Manipulation

IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks

ICRA 2021

Keywords: RL, Manipulation, Benchmark

To Follow or not to Follow: Selective Imitation Learning from Observations

CoRL 2019

Keywords: Learning from Demonstrations, Goal-conditioned RL

Composing Complex Skills by Learning Transition Policies

ICLR 2019

Keywords: Hierarchical RL

Mentorship

Current:
  • Fiona Luo, UPenn BS
  • Muyao Li, UPenn BS
  • Arjun Arasappan, UPenn BS
  • Xingfang Yuan, UPenn MS
Past:
  • James Springer, UPenn MS -> Anduril
  • Harsh Goel, UPenn MS -> UT Austin PhD
  • Kun Huang, UPenn MS -> Full-time SWE at Cruise
  • Richard Chang, UPenn BS
  • Lucy Shi, USC undergrad -> Stanford PhD