no code implementations • 5 Mar 2024 • Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson
However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not.
no code implementations • 9 Feb 2024 • Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies.
1 code implementation • NeurIPS 2023 • Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
1 code implementation • NeurIPS 2023 • Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson
While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline.
no code implementations • 19 Jan 2023 • Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible.
no code implementations • 4 Nov 2022 • Risto Vuorio, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen, Pim de Haan
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
1 code implementation • 20 Oct 2022 • Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson
In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, as well as being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.
no code implementations • 1 Dec 2021 • Zheng Xiong, Luisa Zintgraf, Jacob Beck, Risto Vuorio, Shimon Whiteson
We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and that they adapt as well as or better than originally consistent ones.
no code implementations • 9 Feb 2021 • Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh
In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two.
1 code implementation • NeurIPS 2021 • Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh
Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems.
no code implementations • 25 Nov 2019 • John Holler, Risto Vuorio, Zhiwei Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye
Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace.
2 code implementations • NeurIPS 2019 • Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim
Model-agnostic meta-learners aim to acquire meta-learned parameters from similar tasks to adapt to novel tasks from the same distribution with few gradient updates.
no code implementations • 18 Dec 2018 • Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim
One important limitation of such frameworks is that they seek a common initialization shared across the entire task distribution, substantially limiting the diversity of the task distributions that they are able to learn from.
no code implementations • 27 Sep 2018 • Risto Vuorio, Shao-Hua Sun, Hexiang Hu, Joseph J. Lim
In this paper, we augment MAML with the capability to identify tasks sampled from a multimodal task distribution and adapt quickly through gradient updates.
no code implementations • 11 Jun 2018 • Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim
This ability is limited in current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks.