I'm a CS Ph.D. student at the University of Pennsylvania, advised by Dinesh Jayaraman. I find it fun to advance artificial intelligence, from virtual agents to physical robots. Towards this, my research focuses on deep learning, RL, and their applications to robotics and LLMs.

Expanded research summary:
  • World Models and Planning: World models unlock new capabilities for agents, such as solving problems with long-term dependencies and training policies sample-efficiently. I've developed model-based RL algorithms that learn sophisticated behaviors conventional RL agents struggle to reproduce: finding objects in the dark, stacking blocks without rewards, and even zero-shot transfer to new robot arms. Beyond robots, I've shown that LLMs plan better when trained with world-modeling objectives. (A minimal sketch of the model-based RL loop appears after this list.)
  • Sensory Requirements of Policy Learning: Artificial agents observe the world through some input stream (e.g., vision, language, proprioception). The choice of inputs greatly affects both the training dynamics and the resulting behavior of the agent. My research investigates this close-knit relationship between sensing and learning; it has yielded insights into the sensory requirements of RL agents, as well as practical RL/IL algorithms that operate with minimal sensing.
  • Learning Comprehensive Behaviors: For a robot to truly master a task and be useful, it must demonstrate capabilities beyond task solving alone. Exploration, resetting, and verification abilities are essential for adapting to new conditions and performing tasks robustly. We've developed frameworks that teach robots these capabilities efficiently: for example, our robots learn to tug on doors after locking them to verify they are secure, and to reset their own workspaces so they can keep practicing.
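For readers who want a concrete picture of the model-based RL loop mentioned above, here is a minimal sketch: fit a one-step dynamics model to interaction data, then plan by rolling the model forward and executing the first action of the best imagined sequence (random-shooting MPC). This is an illustrative toy, not the method of any paper listed below; the point-mass environment, the linear model class, and all names here are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: a 2-D point mass that should reach the origin.
# True dynamics (unknown to the agent): s' = s + 0.1 * a.
def env_step(s, a):
    return s + 0.1 * a

def reward(s):
    return -np.linalg.norm(s)  # closer to the origin is better

# --- 1. Learn a world model from random interaction data. ---
# Here the "world model" is just linear least squares: s' ~ [s, a] @ W.
S = rng.uniform(-1, 1, size=(500, 2))
A = rng.uniform(-1, 1, size=(500, 2))
S_next = env_step(S, A)
X = np.hstack([S, A])                          # (500, 4) inputs
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)  # (4, 2) model weights

def model_step(s, a):
    return np.concatenate([s, a]) @ W

# --- 2. Plan with the model: random-shooting MPC. ---
def plan(s, horizon=5, n_candidates=256):
    """Sample action sequences, roll them out in imagination with the
    learned model, and return the first action of the best sequence."""
    seqs = rng.uniform(-1, 1, size=(n_candidates, horizon, 2))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        sim = s.copy()
        for a in seq:
            sim = model_step(sim, a)
            returns[i] += reward(sim)
    return seqs[np.argmax(returns), 0]

# --- 3. Control loop: replan at every real step. ---
s = np.array([0.8, -0.6])
for t in range(20):
    s = env_step(s, plan(s))
print("final distance to goal:", np.linalg.norm(s))
```

Real world-model agents swap the linear model for learned latent dynamics (e.g., an RNN or transformer trained from pixels) and either plan in imagination or train a policy on imagined rollouts, but the fit-the-model / act-through-the-model structure is the same.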

I am also a student researcher at Microsoft AI Frontiers, working on LLM training with John Langford and Alex Lamb. I received my BS/MS in CS from the University of Southern California, where I worked with Joseph J. Lim on RL.


Publications

Code and reviews for all of my PhD papers are public. Check them out!

World Models Increase Autonomy in Reinforcement Learning
Zhao Yang, Thomas Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

TMLR 2025

Keywords: Reset-free, World Models, RL

The Belief State Transformer

ICLR 2025

Keywords: LLM, Planning, World Models

The Value of Sensory Information to a Robot

ICLR 2025

Keywords: Perception, RL

Privileged Sensing Scaffolds Reinforcement Learning
Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

ICLR 2024 (Spotlight, 5% accept rate; 3rd highest-rated paper at ICLR)

Keywords: Privileged Information, World Models, RL

Planning Goals for Exploration
Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman

ICLR 2023 (Spotlight, 5% accept rate)
CoRL 2022 Roboadapt Workshop (Oral, Best Paper Award)

Keywords: Exploration, Goal-conditioned RL, World Models

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning
Kun Huang, Edward S. Hu, Dinesh Jayaraman

CoRL 2022 (Oral, 6.5% accept rate, Best Paper Award)

Keywords: Interactive Perception, Task Specification, RL

Transferable Visual Control Policies Through Robot-Awareness
Edward S. Hu, Kun Huang, Oleh Rybkin, Dinesh Jayaraman

ICLR 2022
ICLR 2022 Generalizable Policy Learning Workshop (Oral)

Keywords: World Models, Robot Transfer, Manipulation

IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks

ICRA 2021

Keywords: RL, Manipulation, Benchmark

To Follow or not to Follow: Selective Imitation Learning from Observations

CoRL 2019

Keywords: Learning from Demonstrations, Goal-conditioned RL

Composing Complex Skills by Learning Transition Policies

ICLR 2019

Keywords: Hierarchical RL

Mentorship

Current:
  • Fiona Luo, UPenn BS
  • Muyao Li, UPenn BS
  • Arjun Arasappan, UPenn BS
  • Xingfang Yuan, UPenn MS
Past:
  • James Springer, UPenn MS -> Anduril
  • Harsh Goel, UPenn MS -> UT Austin PhD
  • Kun Huang, UPenn MS -> Full-time SWE at Cruise
  • Richard Chang, UPenn BS
  • Lucy Shi, USC undergrad -> Stanford PhD