27 Oct 2022 • Ashvin Nair, Brian Zhu, Gokul Narayanan, Eugen Solowjow, Sergey Levine
One of the main observations we make in this work is that, with a suitable representation learning and domain generalization approach, the reward function can generalize to a new but structurally similar task (e.g., inserting a new type of connector) far more easily than the policy can.
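A minimal sketch of how this observation could be exploited: keep a previously learned reward model frozen and reuse it to label transitions collected on a new, structurally similar task, rather than assuming the old policy transfers. The linear reward head, the state representation size, and the helper names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen reward head: maps a learned state representation
# to a scalar success score. Stands in for a domain-generalizing
# learned reward network; the linear form is an illustrative assumption.
W = rng.normal(size=8)

def reward_model(state_repr):
    # Frozen reward head reused across structurally similar tasks.
    return float(W @ state_repr)

def relabel_transitions(transitions):
    # Label transitions collected on a NEW task (e.g., a new connector
    # type) with the existing reward model, so the policy can be
    # fine-tuned with RL instead of relearning the reward from scratch.
    return [(s, a, reward_model(s)) for (s, a) in transitions]

# Toy batch of (state_representation, action) pairs from the new task.
batch = [(rng.normal(size=8), rng.normal(size=2)) for _ in range(4)]
labeled = relabel_transitions(batch)
```

The relabeled batch would then feed a standard policy-update step; only the policy, not the reward, needs to adapt to the new task.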