no code implementations • 27 Jun 2023 • Kristopher De Asis, Eric Graves, Richard S. Sutton
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
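For concreteness, here is a minimal sketch of the standard importance-sampling correction this line of work builds on: off-policy TD(0) prediction in which each update is scaled by the ratio rho = pi(a|s) / b(a|s). The chain environment, policies, and step sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Off-policy TD(0) with per-step importance sampling (a generic sketch,
# not the paper's specific weighting scheme). The agent acts with a
# behavior policy b but estimates the value function of a target policy pi.
rng = np.random.default_rng(0)
n_states = 5                       # chain 0..4; state 4 is terminal
pi = np.array([0.9, 0.1])          # target policy: P(right), P(left)
b = np.array([0.5, 0.5])           # behavior policy: uniform
V = np.zeros(n_states)
alpha, gamma = 0.1, 0.9

for episode in range(5000):
    s = 0
    while s != n_states - 1:
        a = rng.choice(2, p=b)                 # act with the behavior policy
        rho = pi[a] / b[a]                     # importance sampling ratio
        s_next = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Scale the TD error by rho so that, in expectation, the update
        # moves V toward the target policy's value function.
        V[s] += alpha * rho * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V.round(3))
```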
no code implementations • 1 Jan 2021 • Kristopher De Asis, Alan Chan, Yi Wan, Richard S. Sutton
Our emphasis in this work is on the first approach, detailing an incremental policy gradient update that neither waits until the end of the episode nor relies on learned estimates of the return.
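As a hedged illustration of what such an update can look like (the paper's exact algorithm may differ in its details), the classical eligibility-trace form of REINFORCE adjusts the parameters with each individual reward, using a running sum of score functions rather than a completed or estimated return:

```python
import numpy as np

# Incremental REINFORCE with an eligibility trace of score functions: each
# reward updates the parameters immediately, with no completed return and
# no learned return estimate (a classical sketch, not necessarily the
# paper's update; environment and step size are assumptions).
rng = np.random.default_rng(1)
n_states, n_actions = 5, 2          # chain 0..4; state 4 is terminal
theta = np.zeros((n_states, n_actions))
alpha = 0.05

def policy(s):
    p = np.exp(theta[s] - theta[s].max())   # numerically stable softmax
    return p / p.sum()

for episode in range(3000):
    s, z = 0, np.zeros_like(theta)
    while s != n_states - 1:
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        grad = -p
        grad[a] += 1.0                      # score: grad of log pi(a|s)
        z[s] += grad                        # accumulate the eligibility trace
        s_next = s + 1 if a == 0 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else -0.01
        theta += alpha * r * z              # update with each reward as it arrives
        s = s_next

print(policy(0).round(3))  # should come to prefer action 0 (move right)
```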
no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps.
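A minimal sketch of the one-step fixed-horizon TD recursion: the h-step value of a state bootstraps off the (h-1)-step value of its successor, with the 0-step value identically zero. The toy chain, policy, and step size below are my own assumptions.

```python
import numpy as np

# One-step fixed-horizon TD (FHTD): V[h] estimates the expected sum of
# rewards over exactly h future steps and bootstraps off V[h-1] at the
# successor state, with V[0] fixed at zero by definition.
rng = np.random.default_rng(2)
n_states, H = 5, 3                  # chain 0..4; state 4 is terminal
V = np.zeros((H + 1, n_states))     # row 0 is the zero-horizon value
alpha = 0.1

for episode in range(3000):
    s = 0
    while s != n_states - 1:
        a = rng.integers(2)                       # uniform random policy
        s_next = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        for h in range(1, H + 1):                 # update every horizon in parallel
            V[h, s] += alpha * (r + V[h - 1, s_next] - V[h, s])
        s = s_next

print(V[1:].round(3))  # row h: expected reward collected over the next h steps
```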
no code implementations • 20 Sep 2018 • Kristopher De Asis, Brendan Bennett, Richard S. Sutton
Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning.
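The sentence above is the textbook TD(0) idea; a small sketch on the classic random walk makes the online, incremental, model-free character concrete (environment and parameters are illustrative, not from the paper):

```python
import numpy as np

# Textbook TD(0) on the classic five-state random walk: like dynamic
# programming it bootstraps from current estimates, like Monte Carlo it
# learns from sampled experience, and each update is applied online,
# one step at a time, without a model of the environment.
rng = np.random.default_rng(3)
n_states = 7                        # states 0..6; both ends are terminal
V = np.full(n_states, 0.5)
V[0] = V[-1] = 0.0                  # terminal values are zero
alpha = 0.1

for episode in range(1000):
    s = n_states // 2
    while s not in (0, n_states - 1):
        s_next = s + rng.choice((-1, 1))          # unbiased random walk
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Bootstrap from V[s_next] instead of waiting for the final outcome.
        V[s] += alpha * (r + V[s_next] - V[s])
        s = int(s_next)

print(V[1:-1].round(3))  # converges toward 1/6, 2/6, ..., 5/6
```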
no code implementations • 5 Jul 2018 • Kristopher De Asis, Richard S. Sutton
Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods such that intermediate algorithms can outperform either extreme.
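Plain n-step TD prediction illustrates the spectrum this refers to: n = 1 recovers one-step TD, n at the episode length recovers Monte Carlo, and intermediate n often performs best. The sketch below is the generic algorithm on a toy random walk, not the specific multi-step method studied in the paper.

```python
import numpy as np

# n-step TD prediction on a toy random walk (generic sketch; environment
# and parameters are assumptions).
rng = np.random.default_rng(4)
n_states, n = 7, 3                 # states 0..6; both ends are terminal
V = np.zeros(n_states)
alpha, gamma = 0.1, 1.0

for episode in range(2000):
    states, rewards = [n_states // 2], []
    T, t = 10**9, 0                # T marks the terminal time step
    while True:
        if t < T:
            s_next = int(states[-1] + rng.choice((-1, 1)))
            rewards.append(1.0 if s_next == n_states - 1 else 0.0)
            states.append(s_next)
            if s_next in (0, n_states - 1):
                T = t + 1
        tau = t - n + 1            # the time whose estimate we update
        if tau >= 0:
            # n sampled rewards, plus a bootstrapped tail if not terminal.
            G = sum(gamma**(i - tau) * rewards[i]
                    for i in range(tau, min(tau + n, T)))
            if tau + n < T:
                G += gamma**n * V[states[tau + n]]
            V[states[tau]] += alpha * (G - V[states[tau]])
        if tau == T - 1:
            break
        t += 1

print(V[1:-1].round(3))  # true values are 1/6, 2/6, ..., 5/6
```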
no code implementations • 3 Mar 2017 • Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton
TD control methods such as Sarsa, Q-learning, and Expected Sarsa are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance.
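One standard way to extend a one-step TD control method across multiple time steps is n-step Sarsa. The sketch below is a generic illustration on a toy chain (all environment details are assumptions), not the unified algorithm of the paper itself.

```python
import numpy as np

# n-step Sarsa: a standard multi-step extension of one-step TD control
# (generic sketch; environment and parameters are assumptions).
rng = np.random.default_rng(5)
n_states, n_actions, n = 5, 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 1.0, 0.1

def eps_greedy(s):
    return int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else -0.01)

for episode in range(2000):
    S, A, R = [0], [eps_greedy(0)], []
    T, t = 10**9, 0                       # T marks the terminal time step
    while True:
        if t < T:
            s_next, r = step(S[t], A[t])
            S.append(s_next); R.append(r)
            if s_next == n_states - 1:
                T = t + 1
            else:
                A.append(eps_greedy(s_next))
        tau = t - n + 1                   # the time whose estimate we update
        if tau >= 0:
            # n sampled rewards, then bootstrap from Q at the n-th next pair.
            G = sum(gamma**(i - tau) * R[i]
                    for i in range(tau, min(tau + n, T)))
            if tau + n < T:
                G += gamma**n * Q[S[tau + n], A[tau + n]]
            Q[S[tau], A[tau]] += alpha * (G - Q[S[tau], A[tau]])
        if tau == T - 1:
            break
        t += 1

print(Q.round(2))   # the greedy action in each state should be 'right' (index 0)
```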