This work was partly supported by an Amazon Research Award to DJ. The authors would like to thank Karl Schmeckpeper for help with RoboNet, Leon Kim for support with the Franka, and the Perception, Action, and Learning (PAL) research group at UPenn for constructive feedback.