no code implementations • 5 Mar 2024 • Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson
However, it remains unclear whether task-inference sequence models are beneficial even when task-inference objectives are not.
no code implementations • 9 Feb 2024 • Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies.
1 code implementation • 23 Nov 2023 • Christoph Kern, Stephanie Eckman, Jacob Beck, Rob Chew, Bolei Ma, Frauke Kreuter
We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions.
1 code implementation • NeurIPS 2023 • Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson
While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline.
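The recurrent baseline referred to above can be pictured as a policy whose hidden state is the only mechanism for carrying task information across timesteps. The sketch below is a hypothetical minimal illustration of that idea, not the authors' implementation; all names and dimensions are invented for the example.

```python
import numpy as np

class RecurrentPolicy:
    """Minimal vanilla-RNN policy sketch (hypothetical): the hidden
    state accumulates (obs, prev action, reward) history, which is
    all the network has for adapting to the current task."""

    def __init__(self, obs_dim, act_dim, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim + 1  # obs + one-hot prev action + reward
        self.W_in = rng.normal(0, 0.1, (hidden_dim, in_dim))
        self.W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0, 0.1, (act_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def reset(self):
        self.h[:] = 0.0  # new task: wipe the task memory

    def step(self, obs, prev_action_onehot, reward):
        x = np.concatenate([obs, prev_action_onehot, [reward]])
        self.h = np.tanh(self.W_in @ x + self.W_h @ self.h)
        logits = self.W_out @ self.h
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()  # action distribution

policy = RecurrentPolicy(obs_dim=4, act_dim=2)
probs = policy.step(np.zeros(4), np.array([1.0, 0.0]), 0.0)
```

In end-to-end training, the reward input is what lets the hidden state perform implicit task inference without any explicit task-inference objective.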
1 code implementation • 22 Feb 2023 • Zheng Xiong, Jacob Beck, Shimon Whiteson
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control.
no code implementations • 19 Jan 2023 • Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible.
1 code implementation • 20 Oct 2022 • Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson
In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, as well as being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
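To see why hypernetwork initialization matters, note that a hypernetwork's output layer generates the *weights* of a target layer, so a standard initialization scaled for the hypernetwork's own fan-in leaves the generated target weights badly scaled. The toy below illustrates this general point only; it is not the paper's initialization scheme, and the scaling choice shown is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, fan_in, fan_out = 64, 64, 64
n_target = fan_in * fan_out  # number of target-layer weights generated

def generated_weight_std(out_scale):
    # Hypernetwork sketch: one linear layer mapping a task embedding z
    # to the flattened weight matrix of the target layer.
    W_hyper = rng.normal(0, out_scale, (n_target, z_dim))
    z = rng.normal(0, 1.0, z_dim)
    w_target = (W_hyper @ z).reshape(fan_out, fan_in)
    return w_target.std()

# Naive: scale for the hypernetwork's own fan-in (z_dim) only.
naive = generated_weight_std(out_scale=1.0 / np.sqrt(z_dim))
# Corrected: also account for the target layer's fan-in, so the
# generated weights have std ~ 1/sqrt(fan_in) ≈ 0.125.
fixed = generated_weight_std(out_scale=1.0 / np.sqrt(z_dim * fan_in))
```

Under the naive scaling the generated weights have standard deviation near 1, roughly eight times too large for a 64-unit target layer, which is the kind of mismatch that yields poor early performance.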
1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.
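The core mechanism of meta-gradients is differentiating through an inner update to adjust a meta-parameter. The toy below (not the paper's algorithm) tunes an inner-loop learning rate `alpha` on the quadratic loss L(theta) = 0.5 * theta^2, with the meta-gradient derived analytically.

```python
# Toy meta-gradient sketch (hypothetical example, not from the paper):
# one inner SGD step, then a gradient step on alpha through that update.

def meta_gradient(theta, alpha):
    # Inner update: theta' = theta - alpha * dL/dtheta = theta * (1 - alpha)
    theta_new = theta - alpha * theta
    # Meta-objective: J(alpha) = 0.5 * theta_new**2 (post-update loss).
    # Chain rule: dJ/dalpha = theta_new * d(theta_new)/dalpha
    #                       = theta_new * (-theta)
    return theta_new * (-theta)

theta, alpha = 2.0, 0.1
for _ in range(200):
    alpha -= 0.05 * meta_gradient(theta, alpha)  # meta-gradient descent
```

Here `alpha` converges to 1.0, the learning rate that exactly minimizes the post-update loss, illustrating how the outer gradient steers a hyperparameter that the inner RL update itself cannot tune.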
no code implementations • 31 Jan 2022 • Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson
We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which hold even when the transition dynamics are non-stationary.
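In practice, trust-region-style bounds are commonly approximated by clipping the policy probability ratio, as in PPO. The sketch below shows such a clipped surrogate that each decentralized agent could optimize independently; it is a standard PPO-style proxy, offered as illustration only, not the bound derived in the paper.

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO-style clipped surrogate: a practical trust-region proxy
    that limits how far a policy can move in one update."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

adv = np.array([1.0, -1.0])
old = np.log(np.array([0.5, 0.5]))
new = np.log(np.array([0.9, 0.1]))  # a large policy change
surrogate = clipped_surrogate(new, old, adv)  # → 0.2
```

The clipping caps the incentive for any single agent to change its policy drastically, which is what makes a monotonic-improvement argument plausible even though each agent sees the others as part of a shifting environment.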
no code implementations • 1 Dec 2021 • Zheng Xiong, Luisa Zintgraf, Jacob Beck, Risto Vuorio, Shimon Whiteson
We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and that they adapt as well as or better than originally consistent ones.
no code implementations • ICLR 2020 • Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann
In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy.
no code implementations • 23 Aug 2019 • Matt Cooper, Jun Ki Lee, Jacob Beck, Joshua D. Fishman, Michael Gillett, Zoë Papakipos, Aaron Zhang, Jerome Ramos, Aansh Shah, Michael L. Littman
This idea generalizes the concept of a Stackelberg equilibrium.
no code implementations • ICLR 2019 • Jacob Beck, Zoe Papakipos, Michael Littman
Our framework learns continuous control from sub-optimal demonstration and evaluative feedback collected before training.
no code implementations • 29 Jul 2018 • Jacob Beck, Zoe Papakipos
As in the brain, we only allow neurons to fire in a time step if they contain enough energy, or excitement.
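The firing rule described resembles an integrate-and-fire neuron: energy accumulates over time steps and a spike is emitted only once it crosses a threshold. The sketch below is a hypothetical minimal version of that idea; the leak factor, threshold, and reset-to-zero rule are assumptions for the example, not the paper's exact dynamics.

```python
def simulate_neuron(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire sketch (hypothetical): the neuron
    accumulates 'energy' from its inputs each time step and fires
    only when the stored energy reaches the threshold, then resets."""
    energy, spikes = 0.0, []
    for x in inputs:
        energy = leak * energy + x  # leaky accumulation of input energy
        if energy >= threshold:
            spikes.append(1)
            energy = 0.0  # energy is spent by firing
        else:
            spikes.append(0)
    return spikes

spikes = simulate_neuron([0.4, 0.4, 0.4, 0.4, 0.4])  # → [0, 0, 1, 0, 0]
```

With a constant sub-threshold input, the neuron stays silent until enough energy has built up, fires once, and then must recharge — the gating behavior described above.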