no code implementations • ICML 2020 • Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim
We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment.
2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried
This paves the way for new research directions, e.g., investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.
Tasks: Model-based Reinforcement Learning, Reinforcement Learning (+1)
no code implementations • 9 Oct 2019 • Marcin B. Tomczak, Sergio Valcarcel Macua, Enrique Munoz de Cote, Peter Vrancx
In this work we establish conditions under which the parametric approximation of the critic does not introduce bias into the updates of the surrogate objective.
1 code implementation • 9 Oct 2019 • Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim
These proxy objectives allow stable and low variance policy learning, but require small policy updates to ensure that the proxy objective remains an accurate approximation of the target policy value.
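A widely used proxy objective of this kind is the clipped surrogate popularized by PPO, which enforces small policy updates by clipping the probability ratio between the new and old policies. A minimal sketch (an illustrative example, not necessarily the objective used in this paper), assuming precomputed ratios and advantage estimates:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped proxy objective: keeps policy updates small by
    clipping the ratio pi_new(a|s) / pi_old(a|s) to [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the objective a pessimistic (lower) bound.
    return np.minimum(unclipped, clipped).mean()

# A ratio far outside the trust region contributes no extra objective value:
print(clipped_surrogate(np.array([1.5]), np.array([1.0])))  # 1.2, clipped from 1.5
```

Because the clipped term dominates once the ratio leaves the trust region, gradients stop pushing the policy further away, which is exactly the "small policy updates" requirement the abstract describes.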
no code implementations • 21 Jun 2019 • Janith C. Petangoda, Sergio Pascual-Diaz, Vincent Adam, Peter Vrancx, Jordi Grau-Moya
We propose a novel framework for multi-task reinforcement learning (MTRL).
Tasks: Hierarchical Reinforcement Learning, Reinforcement Learning (+2)
no code implementations • ICLR 2019 • Jordi Grau-Moya, Felix Leibfried, Peter Vrancx
We show that the prior optimization introduces a mutual-information regularizer in the RL objective.
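A mutual-information regularizer of this kind typically penalizes I(S; A), the information the chosen action carries about the state. As a hedged illustration (not the paper's algorithm), here is how that quantity can be computed for a tabular policy:

```python
import numpy as np

def mutual_information(p_s, pi):
    """I(S; A) for a state distribution p_s[s] and policy pi[s, a] = p(a|s).
    Uses the marginal (prior) policy p(a) = sum_s p(s) * pi(a|s)."""
    p_a = p_s @ pi  # marginal action distribution
    mi = 0.0
    for s, ps in enumerate(p_s):
        for a, pa in enumerate(p_a):
            joint = ps * pi[s, a]
            if joint > 0:
                mi += joint * np.log(joint / (ps * pa))
    return mi

# A policy that deterministically maps each of two equally likely states
# to a distinct action carries full state information: I(S; A) = ln 2.
p_s = np.array([0.5, 0.5])
pi = np.array([[1.0, 0.0],
               [0.0, 1.0]])
print(mutual_information(p_s, pi))  # ~0.693 = ln 2
```

Subtracting a scaled version of this term from the expected return encourages policies whose actions depend less on the state, which is the regularizing effect of optimizing the action prior.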
no code implementations • 6 Sep 2018 • Felix Leibfried, Peter Vrancx
This paper proposes a new optimization objective for value-based deep reinforcement learning.
no code implementations • 19 Feb 2018 • Garrett Andersen, Peter Vrancx, Haitham Bou-Ammar
A common approach to HL is to provide the agent with a number of high-level skills that solve small parts of the overall problem.
no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe
Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
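The multi-step returns mentioned here take the standard n-step form, accumulating n discounted rewards before bootstrapping from a value estimate. A minimal sketch with hypothetical rewards and bootstrap value:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: discounted sum of the n observed rewards plus a
    discounted bootstrap from the value estimate at the n-th next state."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three rewards of 1, bootstrap of 0, gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(n_step_return([1, 1, 1], 0.0, gamma=0.5))  # 1.75
```

Longer returns (larger n) propagate reward information further per update, which is the efficiency gain the abstract draws an analogy to for longer options.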
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.
no code implementations • 1 Aug 2017 • Jesus Lago, Fjo De Ridder, Peter Vrancx, Bart De Schutter
Motivated by the increasing integration among electricity markets, in this paper we propose two different methods to incorporate market integration into electricity price forecasting and to improve predictive performance.
no code implementations • 26 Jul 2017 • Frederik Ruelens, Bert J. Claessens, Peter Vrancx, Fred Spiessens, Geert Deconinck
This paper considers a demand response agent that must find a near-optimal sequence of decisions based on sparse observations of its environment.
no code implementations • 28 Feb 2017 • Roxana Rădulescu, Peter Vrancx, Ann Nowé
Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+1)
1 code implementation • 28 Apr 2016 • Bert J. Claessens, Peter Vrancx, Frederik Ruelens
Direct load control of a heterogeneous cluster of residential demand flexibility sources is a high-dimensional control problem with partial observability.
no code implementations • 17 Dec 2015 • Denis Steckelmacher, Peter Vrancx
This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit, and MUT1, a recurrent architecture evolved from a pool of several thousand candidate architectures.
no code implementations • 11 Feb 2015 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe
While PBRS provably preserves optimal policies, its effect on learning speed is determined by the quality of its potential function, which in turn depends on both the underlying heuristic and its scale.
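The PBRS shaping term has the standard form F(s, s') = γΦ(s') − Φ(s), added on top of the environment reward. A minimal sketch with a hypothetical potential function (illustrative only; the heuristic and scale are exactly the design choices the abstract discusses):

```python
def shaping_reward(phi, s, s_next, gamma=0.99):
    """Potential-based reward shaping: F(s, s') = gamma * phi(s') - phi(s).
    Added to the environment reward, this form provably preserves
    optimal policies regardless of the choice of phi."""
    return gamma * phi(s_next) - phi(s)

# Hypothetical potential: negative distance to a goal state at x = 10.
phi = lambda x: -abs(10 - x)

r = shaping_reward(phi, s=3, s_next=4, gamma=1.0)
print(r)  # 1: moving toward the goal earns positive shaping reward
```

Scaling Φ by a constant scales every shaping reward by the same constant, which is why the scale of the potential, and not just the heuristic behind it, affects learning speed.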
no code implementations • 21 May 2014 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe
Recent advances in gradient temporal-difference methods allow learning multiple value functions off-policy in parallel without sacrificing convergence guarantees or computational efficiency.