no code implementations • 26 Mar 2024 • Philip Lippmann, Matthijs T. J. Spaan, Jie Yang
Natural Language Processing (NLP) models optimized for predictive performance often make high confidence errors and suffer from vulnerability to adversarial and out-of-distribution data.
no code implementations • 19 Feb 2024 • Davide Mambelli, Stephan Bongers, Onno Zoeter, Matthijs T. J. Spaan, Frans A. Oliehoek
A well-established off-policy objective is the excursion objective.
no code implementations • 26 Jul 2023 • Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses.
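The guide-regularization idea can be sketched as a training loss with a decaying weight on the divergence between student and guide policies. The decay schedule, function names, and KL direction below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def regularized_loss(rl_loss, student_probs, guide_probs, step, decay=1e-3):
    """RL loss plus a guide-regularization term whose weight decays
    as training progresses (hypothetical exponential schedule)."""
    beta = np.exp(-decay * step)  # influence of the guide fades over time
    return rl_loss + beta * kl_divergence(student_probs, guide_probs)
```

Early in training the KL term dominates, pulling the student toward the guide; as `step` grows, `beta` vanishes and the student optimizes the RL objective alone.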
no code implementations • 12 Jun 2023 • Moritz A. Zanger, Wendelin Böhmer, Matthijs T. J. Spaan
In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value.
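As an illustration of the distributional view, one common concrete instantiation (not necessarily the one used in this paper) is the categorical (C51-style) representation: the return distribution is a histogram over fixed atoms, and the Bellman-updated distribution is projected back onto that support:

```python
import numpy as np

def project_distribution(atoms, probs, reward, gamma, v_min, v_max):
    """Project the Bellman target r + gamma * Z onto the fixed support
    `atoms` (a simplified C51-style categorical projection sketch)."""
    n = len(atoms)
    dz = (v_max - v_min) / (n - 1)
    # Shifted/scaled atoms, clipped to the support's range
    tz = np.clip(reward + gamma * np.asarray(atoms, float), v_min, v_max)
    b = (tz - v_min) / dz                      # fractional atom index
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    out = np.zeros(n)
    for j in range(n):
        if lo[j] == hi[j]:                     # lands exactly on an atom
            out[lo[j]] += probs[j]
        else:                                  # split mass between neighbors
            out[lo[j]] += probs[j] * (hi[j] - b[j])
            out[hi[j]] += probs[j] * (b[j] - lo[j])
    return out
```

The expected value `np.dot(atoms, probs)` recovers the classical scalar return, while the full histogram retains information about the spread of returns.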
no code implementations • 9 Jun 2023 • Max Weltevrede, Matthijs T. J. Spaan, Wendelin Böhmer
We motivate mathematically and show empirically that generalisation to tasks that are "reachable" during training is improved by increasing the diversity of transitions in the replay buffer.

no code implementations • 4 Jun 2023 • Miguel Suau, Matthijs T. J. Spaan, Frans A. Oliehoek
In this paper, we provide a mathematical characterization of this phenomenon, which we refer to as policy confounding, and show, through a series of examples, when and how it occurs in practice.
no code implementations • 21 Oct 2022 • Yaniv Oren, Matthijs T. J. Spaan, Wendelin Böhmer
One of the best-studied and best-performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS).
1 code implementation • 1 Jul 2022 • Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek
Due to the high sample complexity of reinforcement learning, simulation is, as of today, critical for its successful application.
no code implementations • 6 Jun 2022 • Sebastian Junges, Matthijs T. J. Spaan
The key ideas to accelerate analysis of such programs are (1) to treat the behavior of the subroutine as uncertain and only remove this uncertainty by a detailed analysis if needed, and (2) to abstract similar subroutines into a parametric template, and then analyse this template.
no code implementations • 3 Feb 2022 • Miguel Suau, Jinke He, Matthijs T. J. Spaan, Frans A. Oliehoek
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL).
no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan
Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function.
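For context, the classic guarantee for greedily maximizing a monotone submodular set function under a cardinality constraint (Nemhauser, Wolsey, and Fisher) takes the form below; the paper's bound for greedy PBVI is in this spirit, though its exact statement and constants depend on the POMDP setting:

```latex
f(S_{\text{greedy}}) \;\ge\; \left(1 - \frac{1}{e}\right) \max_{|S| \le k} f(S)
```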
no code implementations • 29 Nov 2015 • Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, Mathijs M. de Weerdt
In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value.
no code implementations • 18 Feb 2015 • Frans A. Oliehoek, Matthijs T. J. Spaan, Stefan Witwicki
Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents.
no code implementations • 4 Feb 2014 • Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson
We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent.
no code implementations • 1 Aug 2011 • Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan
Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type.