no code implementations • 7 Feb 2024 • Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David Zhang, Michaël Defferrard, Taco Cohen
Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay.
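
The two alternating stages can be illustrated with a toy sketch. This is not the authors' implementation: the integer DSL (`PRIMITIVES`), the random `sample_program` stand-in for a learned policy, the length-based priority, and the lowest-priority eviction rule are all hypothetical choices made only to keep the example self-contained and runnable.

```python
import random

# Hypothetical toy DSL: a program is a tuple of primitive names applied to an int.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg": lambda x: -x,
}


def run_program(program, x):
    """Apply each primitive in order to the input value."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x


def sample_program(rng, max_len=3):
    """Stand-in for policy sampling: draw a random primitive sequence."""
    length = rng.randint(1, max_len)
    return tuple(rng.choice(sorted(PRIMITIVES)) for _ in range(length))


def hindsight_relabel(program, inputs):
    """Use the outputs the program actually produced as the target outputs,
    so the (inputs, outputs, program) triple is correct by construction."""
    outputs = [run_program(program, x) for x in inputs]
    return {"inputs": list(inputs), "outputs": outputs, "program": program}


class PrioritizedReplayBuffer:
    """Minimal buffer: samples items with probability proportional to priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.priorities = []

    def add(self, item, priority):
        if len(self.items) >= self.capacity:
            # Assumption: evict the lowest-priority entry when full.
            idx = min(range(len(self.priorities)), key=self.priorities.__getitem__)
            self.items.pop(idx)
            self.priorities.pop(idx)
        self.items.append(item)
        self.priorities.append(priority)

    def sample(self, rng, k):
        return rng.choices(self.items, weights=self.priorities, k=k)


def train_loop(num_iterations=5, samples_per_iter=4, batch_size=2, seed=0):
    rng = random.Random(seed)
    buffer = PrioritizedReplayBuffer(capacity=16)
    batches = []
    for _ in range(num_iterations):
        # Stage 1: program sampling and hindsight relabeling.
        for _ in range(samples_per_iter):
            inputs = [rng.randint(-5, 5) for _ in range(3)]
            program = sample_program(rng)
            example = hindsight_relabel(program, inputs)
            # Assumption: placeholder priority (program length, always >= 1).
            buffer.add(example, float(len(program)))
        # Stage 2: draw a prioritized batch; a real system would update
        # the policy on this batch instead of just recording it.
        batches.append(buffer.sample(rng, batch_size))
    return buffer, batches
```

Calling `train_loop()` returns the filled buffer and the batches drawn at each iteration. Every stored example is consistent with its program because the targets come from actually executing that program, which is the point of relabeling in hindsight.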