Search Results for author: Kristopher De Asis

Found 6 papers, 0 papers with code

Value-aware Importance Weighting for Off-policy Reinforcement Learning

no code implementations · 27 Jun 2023 · Kristopher De Asis, Eric Graves, Richard S. Sutton

Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
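The snippet only states the premise of the paper. For reference, a minimal sketch of ordinary per-step importance weighting in off-policy TD(0) prediction is shown below; the paper proposes a value-aware alternative to this standard ratio, so this is background rather than the authors' method, and the tabular setup, policy arrays, and function name are illustrative assumptions.

```python
import numpy as np

def off_policy_td0(transitions, target_pi, behavior_mu, n_states, alpha=0.1, gamma=0.99):
    """Ordinary importance-sampling-weighted TD(0) for off-policy prediction.

    transitions: iterable of (s, a, r, s_next) tuples generated by the behavior policy.
    target_pi, behavior_mu: arrays of shape (n_states, n_actions) of action probabilities.
    """
    V = np.zeros(n_states)
    for s, a, r, s_next in transitions:
        rho = target_pi[s, a] / behavior_mu[s, a]   # per-step importance sampling ratio
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * rho * td_error              # weight the TD update by rho
    return V
```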

reinforcement-learning

Incremental Policy Gradients for Online Reinforcement Learning Control

no code implementations · 1 Jan 2021 · Kristopher De Asis, Alan Chan, Yi Wan, Richard S. Sutton

Our emphasis in this work is on the first approach, detailing an incremental policy gradient update that neither waits until the end of the episode nor relies on learned estimates of the return.
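The snippet describes the goal but not the update itself. Purely as an illustration of how such an online update can look, here is a generic eligibility-trace REINFORCE-style sketch (not necessarily the authors' algorithm): each reward is applied immediately to an accumulated trace of grad-log-policy terms, so the update needs neither the end of the episode nor a learned return estimate. The softmax parameterization and the `env.reset()`/`env.step()` interface are assumptions for the sake of a runnable example.

```python
import numpy as np

def softmax_probs(theta, s):
    prefs = theta[s] - theta[s].max()
    e = np.exp(prefs)
    return e / e.sum()

def run_episode_incremental_pg(env, theta, alpha=0.01, gamma=0.99):
    """Online REINFORCE-style sketch with an eligibility trace of grad log pi."""
    z = np.zeros_like(theta)   # accumulated grad-log-policy terms
    discount = 1.0             # running gamma^t
    s = env.reset()
    done = False
    while not done:
        pi = softmax_probs(theta, s)
        a = np.random.choice(len(pi), p=pi)
        s_next, r, done = env.step(a)        # assumed (state, reward, done) interface
        grad_log = np.zeros_like(theta)
        grad_log[s] = -pi
        grad_log[s, a] += 1.0                # gradient of log softmax policy at (s, a)
        z += grad_log
        theta += alpha * discount * r * z    # apply this reward to all past log-gradients
        discount *= gamma
        s = s_next
    return theta
```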

Policy Gradient Methods · reinforcement-learning · +1

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

no code implementations · 9 Sep 2019 · Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps.
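A minimal tabular sketch of the idea described in the snippet: keep one value table per horizon, where the h-step value bootstraps from the (h-1)-step value at the next state and the 0-step value is identically zero. The function name and array layout are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def fixed_horizon_td_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """Fixed-horizon TD(0) on a single transition (s, r, s_next).

    V has shape (H + 1, n_states); V[h, s] estimates the sum of the next h rewards,
    and V[0] stays zero.  Each horizon bootstraps from the next-shorter horizon.
    """
    H = V.shape[0] - 1
    targets = [r + gamma * V[h - 1, s_next] for h in range(1, H + 1)]
    for h, target in zip(range(1, H + 1), targets):
        V[h, s] += alpha * (target - V[h, s])
    return V

# Usage sketch: V = np.zeros((H + 1, n_states)); call the update on each transition.
```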

Q-Learning · reinforcement-learning · +1

Predicting Periodicity with Temporal Difference Learning

no code implementations · 20 Sep 2018 · Kristopher De Asis, Brendan Bennett, Richard S. Sutton

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning.
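The snippet describes standard TD learning in general terms. For reference, the basic tabular TD(0) update it alludes to looks like the following; this is the generic background update, not the paper's periodicity formulation.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One-step TD(0): bootstrap from the current estimate V[s_next] (as in dynamic
    programming) using a single sampled transition (as in Monte Carlo), which makes
    the update online, incremental, and model-free."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```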

Decision Making

Per-decision Multi-step Temporal Difference Learning with Control Variates

no code implementations · 5 Jul 2018 · Kristopher De Asis, Richard S. Sutton

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way that allows intermediate algorithms to outperform either extreme.
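As background for the title, here is a minimal sketch of the standard per-decision importance-sampled multi-step return with a control variate, computed backward over a recorded off-policy segment; the exact estimators studied in the paper may differ, and the function name and data layout are assumptions.

```python
def per_decision_cv_return(rewards, states, actions, V, target_pi, behavior_mu, gamma=0.99):
    """Per-decision importance-sampled n-step return with a control variate:
        G_t = rho_t * (R_{t+1} + gamma * G_{t+1}) + (1 - rho_t) * V[S_t],
    computed backward from the bootstrap value at the end of the segment.

    states has length len(rewards) + 1; actions has length len(rewards).
    """
    G = V[states[-1]]                                # bootstrap at the final state
    for t in reversed(range(len(rewards))):
        s, a = states[t], actions[t]
        rho = target_pi[s, a] / behavior_mu[s, a]    # per-decision importance ratio
        G = rho * (rewards[t] + gamma * G) + (1.0 - rho) * V[s]
    return G
```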

Multi-step Reinforcement Learning: A Unifying Algorithm

no code implementations · 3 Mar 2017 · Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton

These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance.
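As a small illustration of the kind of multi-step extension the snippet refers to, here is a generic n-step return target for an action-value method: accumulate n sampled rewards, then bootstrap from the estimate reached n steps later. This is background only, not the paper's unifying algorithm; the function name is an assumption.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Generic n-step return: rewards are R_{t+1}, ..., R_{t+n} and bootstrap_value
    is the value estimate at the state (or state-action pair) reached n steps later."""
    G = bootstrap_value
    for r in reversed(rewards):
        G = r + gamma * G
    return G
```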

Q-Learning · reinforcement-learning · +1
