Search Results for author: Peter Vrancx

Found 17 papers, 3 papers with code

Batch Reinforcement Learning with Hyperparameter Gradients

no code implementations • ICML 2020 • Byung-Jun Lee, Jongmin Lee, Peter Vrancx, Dongho Kim, Kee-Eung Kim

We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment.

Tasks: Continuous Control, reinforcement-learning (+1)
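
As background, here is a minimal sketch of the batch RL setting the abstract describes, using tabular fitted Q iteration on a fixed dataset; the tabular spaces and dataset layout are illustrative assumptions, not the paper's hyperparameter-gradient method.

```python
# Fitted Q iteration over a fixed batch of (s, a, r, s_next, done) transitions.
# No environment interaction happens anywhere in the loop: only the batch is used.
import numpy as np

def fitted_q_iteration(batch, n_states, n_actions, gamma=0.99, iters=100):
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        targets = Q.copy()
        for s, a, r, s_next, done in batch:
            # Regress each visited (s, a) toward its one-step bootstrapped target.
            targets[s, a] = r + (0.0 if done else gamma * Q[s_next].max())
        Q = targets
    return Q
```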

Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow

2 code implementations • 26 Mar 2021 • John McLeod, Hrvoje Stojic, Vincent Adam, Dongho Kim, Jordi Grau-Moya, Peter Vrancx, Felix Leibfried

This paves the way for new research directions, e.g. investigating uncertainty-aware environment models that are not necessarily neural-network-based, or developing algorithms to solve industrially motivated benchmarks that share characteristics with real-world problems.

Tasks: Model-based Reinforcement Learning, reinforcement-learning (+1)

Compatible features for Monotonic Policy Improvement

no code implementations • 9 Oct 2019 • Marcin B. Tomczak, Sergio Valcarcel Macua, Enrique Munoz de Cote, Peter Vrancx

In this work we establish conditions under which the parametric approximation of the critic does not introduce bias into the updates of the surrogate objective.
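
For background (this is the classical result the compatible-features line of work starts from, not the paper's own condition): Sutton et al. (1999) showed that a critic whose feature gradient matches the policy score function yields unbiased policy-gradient updates.

```latex
% Classical compatibility condition: with w fitted by least squares under this
% constraint, the policy gradient computed from Q_w is unbiased.
\nabla_w Q_w(s, a) = \nabla_\theta \log \pi_\theta(a \mid s)
```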

Policy Optimization Through Approximate Importance Sampling

1 code implementation • 9 Oct 2019 • Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim

These proxy objectives allow stable and low variance policy learning, but require small policy updates to ensure that the proxy objective remains an accurate approximation of the target policy value.

Tasks: Continuous Control
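
To illustrate the kind of proxy objective meant here, a clipped importance-sampling surrogate in the PPO style; this is an illustrative stand-in, not the paper's proposed objective.

```python
# Clipped surrogate: the clip keeps the new policy close to the data-collecting
# policy, which is exactly the "small policy updates" requirement quoted above.
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    ratio = np.exp(logp_new - logp_old)  # importance weights pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantages, clipped * advantages).mean()
```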

Learning High-level Representations from Demonstrations

no code implementations • 19 Feb 2018 • Garrett Andersen, Peter Vrancx, Haitham Bou-Ammar

A common approach to HL is to provide the agent with a number of high-level skills that solve small parts of the overall problem.

Tasks: Montezuma's Revenge, Open-Ended Question Answering (+1)
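
A minimal sketch of the skill abstraction described above, in the options style; the interface (Skill, run_skill) is a hypothetical illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    policy: Callable       # observation -> primitive action
    terminates: Callable   # observation -> bool, True once the sub-task is solved

def run_skill(env_step, obs, skill, max_steps=100):
    # Execute one high-level skill to completion, then hand control back.
    for _ in range(max_steps):
        obs = env_step(skill.policy(obs))
        if skill.terminates(obs):
            break
    return obs
```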

Learning with Options that Terminate Off-Policy

no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
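
For reference, the multi-step return the analogy points to: a larger n, like a longer option, propagates reward information further per update, at the cost of higher variance.

```latex
% n-step return bootstrapped from the value estimate at step t + n.
G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^k R_{t+k+1} + \gamma^n V(S_{t+n})
```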

Forecasting day-ahead electricity prices in Europe: the importance of considering market integration

no code implementations • 1 Aug 2017 • Jesus Lago, Fjo De Ridder, Peter Vrancx, Bart De Schutter

Motivated by the increasing integration among electricity markets, in this paper we propose two different methods to incorporate market integration into electricity price forecasting and to improve predictive performance.

Tasks: Bayesian Optimization, feature selection

Analysing Congestion Problems in Multi-agent Reinforcement Learning

no code implementations • 28 Feb 2017 • Roxana Rădulescu, Peter Vrancx, Ann Nowé

Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains.

Tasks: Multi-agent Reinforcement Learning, reinforcement-learning (+1)

Convolutional Neural Networks For Automatic State-Time Feature Extraction in Reinforcement Learning Applied to Residential Load Control

1 code implementation • 28 Apr 2016 • Bert J. Claessens, Peter Vrancx, Frederik Ruelens

Direct load control of a heterogeneous cluster of residential demand flexibility sources is a high-dimensional control problem with partial observability.

An Empirical Comparison of Neural Architectures for Reinforcement Learning in Partially Observable Environments

no code implementations • 17 Dec 2015 • Denis Steckelmacher, Peter Vrancx

This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit, and MUT1, a recurrent architecture evolved from a pool of several thousand candidate architectures.

Tasks: reinforcement-learning, Reinforcement Learning (RL)
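
A sketch of the recurrent Q-network shape being compared, using the LSTM variant; the tf.keras API and layer sizes are assumptions for illustration (the paper also evaluates GRU and MUT1 cells).

```python
import tensorflow as tf

def recurrent_q_network(obs_dim, num_actions, hidden=64):
    # Input: a variable-length history of partial observations; the recurrent
    # state summarizes it, standing in for the unobservable environment state.
    obs_history = tf.keras.Input(shape=(None, obs_dim))
    h = tf.keras.layers.LSTM(hidden)(obs_history)
    q_values = tf.keras.layers.Dense(num_actions)(h)  # one Q-value per action
    return tf.keras.Model(obs_history, q_values)
```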

Off-Policy Reward Shaping with Ensembles

no code implementations • 11 Feb 2015 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

While PBRS is proven to always preserve optimal policies, its effect on learning speed is determined by the quality of its potential function, which, in turn, depends on both the underlying heuristic and its scale.
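
For reference, PBRS (potential-based reward shaping, Ng et al., 1999) augments the environment reward with a term built from a potential function Φ, whose quality and scale the abstract refers to.

```latex
% Shaping term added to the reward; optimal policies are preserved for any Phi.
F(s, a, s') = \gamma \, \Phi(s') - \Phi(s)
```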

Off-Policy Shaping Ensembles in Reinforcement Learning

no code implementations • 21 May 2014 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

Recent advances in gradient temporal-difference methods make it possible to learn multiple value functions in parallel, off-policy, without sacrificing convergence guarantees or computational efficiency.

Tasks: Computational Efficiency, reinforcement-learning (+1)
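
As background, one member of the gradient temporal-difference family referred to above is TDC (Sutton et al., 2009), shown here in its on-policy linear form; off-policy variants reweight the updates with importance ratios.

```latex
% TDC: theta are the value weights, w a secondary weight vector that tracks the
% least-squares projection of the TD error onto the features phi_t.
\delta_t = R_{t+1} + \gamma\,\theta_t^\top \phi_{t+1} - \theta_t^\top \phi_t
\theta_{t+1} = \theta_t + \alpha \left( \delta_t\,\phi_t - \gamma\,\phi_{t+1}\,\phi_t^\top w_t \right)
w_{t+1} = w_t + \beta \left( \delta_t - \phi_t^\top w_t \right) \phi_t
```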
