no code implementations • 22 Dec 2023 • Huizhen Yu, Yi Wan, Richard S. Sutton
In this paper, we study asynchronous stochastic approximation algorithms without communication delays.
no code implementations • 2 Oct 2023 • Kenny Young, Richard S. Sutton
Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains.
no code implementations • 27 Jun 2023 • Kristopher De Asis, Eric Graves, Richard S. Sutton
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
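As a hedged illustration of this idea (not the paper's specific method), the sketch below applies the per-step importance sampling ratio to an off-policy linear TD(0) update; the function and variable names are ours.

```python
# Illustrative sketch only: per-decision importance sampling in an
# off-policy linear TD(0) update; names and step-sizes are assumptions.
import numpy as np

def off_policy_td0_step(w, x, r, x_next, pi_prob, b_prob, alpha=0.1, gamma=0.99):
    """pi_prob and b_prob are the target- and behaviour-policy probabilities
    of the action actually taken; x and x_next are feature vectors."""
    rho = pi_prob / b_prob                                  # importance sampling ratio
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)    # TD error
    return w + alpha * rho * delta * x                      # ratio reweights the update
```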
1 code implementation • 23 Jun 2023 • Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton
If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples.
no code implementations • 30 Sep 2022 • Yi Wan, Richard S. Sutton
We show that two average-reward off-policy control algorithms, Differential Q-learning (Wan, Naik & Sutton 2021a) and RVI Q-learning (Abounadi, Bertsekas & Borkar 2001), converge in weakly communicating MDPs.
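For context, a minimal tabular sketch of a Differential Q-learning-style update is given below; this is our reading of the algorithm, and the variable names and step-sizes are ours.

```python
# Minimal tabular sketch of a Differential Q-learning-style update; the
# reward-rate estimate r_bar plays the role that discounting plays elsewhere.
def differential_q_step(Q, r_bar, s, a, r, s_next, actions, alpha=0.1, eta=1.0):
    """Q: dict mapping (state, action) -> value; actions: available actions."""
    delta = r - r_bar + max(Q[(s_next, a2)] for a2 in actions) - Q[(s, a)]
    Q[(s, a)] += alpha * delta          # off-policy action-value update
    r_bar += eta * alpha * delta        # update of the average-reward estimate
    return Q, r_bar
```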
no code implementations • 23 Aug 2022 • Richard S. Sutton, Michael Bowling, Patrick M. Pilarski
Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan.
no code implementations • 4 Jul 2022 • Tian Tian, Kenny Young, Richard S. Sutton
However, Asynchronous VI still requires a maximization over the entire action space, making it impractical for domains with a large action space.
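To make that cost concrete, here is an illustrative single-state asynchronous value-iteration backup (our notation): even though only one state is updated, the max still sweeps the whole action space.

```python
# Illustrative single-state asynchronous VI backup (our notation):
# one state is updated in place, but the max ranges over every action.
def async_vi_backup(V, s, P, R, gamma=0.95):
    """P[s][a]: list of (prob, next_state); R[s][a]: expected reward."""
    V[s] = max(
        R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
        for a in range(len(P[s]))       # full sweep over the action space
    )
    return V
```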
no code implementations • 25 May 2022 • Yi Wan, Richard S. Sutton
In a variant of the classic four-room domain, we show that 1) a higher objective value is typically associated with fewer elementary planning operations used by the option-value iteration algorithm to obtain a near-optimal value function, 2) our algorithm achieves an objective value that matches that achieved by two human-designed options, 3) the amount of computation used by option-value iteration with options discovered by our algorithm matches that with the human-designed options, and 4) the options produced by our algorithm also make intuitive sense: they seem to move to and terminate at the entrances of rooms.
no code implementations • 26 Feb 2022 • Richard S. Sutton
It is time to recognize and build on the convergence of multiple diverse disciplines on a substantive common model of the intelligent agent.
no code implementations • 20 Feb 2022 • Richard S. Sutton
The history of meta-learning methods based on gradient descent is reviewed, focusing primarily on methods that adapt step-size (learning rate) meta-parameters.
no code implementations • 7 Feb 2022 • Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White
Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process.
no code implementations • 30 Dec 2021 • Amir Samani, Richard S. Sutton
Learning continually and online from a continuous stream of data is challenging, especially for a reinforcement learning agent with sequential data.
no code implementations • NeurIPS 2021 • Yi Wan, Abhishek Naik, Richard S. Sutton
We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs.
no code implementations • 10 Sep 2021 • Sina Ghiassian, Richard S. Sutton
In the Rooms task, the product of importance sampling ratios can be as large as $2^{14}$ and can sometimes be two.
1 code implementation • 13 Aug 2021 • Shibhansh Dohare, Richard S. Sutton, A. Rupam Mahmood
The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former.
2 code implementations • 2 Jun 2021 • Sina Ghiassian, Richard S. Sutton
In the middle tier, the five Gradient-TD algorithms and Off-policy TD($\lambda$) were more sensitive to the bootstrapping parameter.
no code implementations • 17 Apr 2021 • Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton
Our algorithms and experiments are the first to treat MBRL with expectation models in a general setting.
1 code implementation • 15 Feb 2021 • Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton
Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon.
1 code implementation • 8 Jan 2021 • Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson
We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.
no code implementations • 1 Jan 2021 • Kristopher De Asis, Alan Chan, Yi Wan, Richard S. Sutton
Our emphasis is on the first approach in this work, detailing an incremental policy gradient update which neither waits until the end of the episode, nor relies on learning estimates of the return.
no code implementations • 28 Oct 2020 • Kenny Young, Richard S. Sutton
We demonstrate analytically and experimentally that such pathological behaviours can impact a wide range of RL and dynamic programming algorithms; such behaviours can arise both with and without bootstrapping, and with linear function approximation as well as with more complex parameterized functions like neural networks.
no code implementations • 27 Aug 2020 • Katya Kudashkina, Patrick M. Pilarski, Richard S. Sutton
In this article we argue for the domain of voice document editing and for the methods of model-based reinforcement learning.
no code implementations • 26 Aug 2020 • Alan Chan, Kris de Asis, Richard S. Sutton
In this work, we explore the use of \textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function.
1 code implementation • 29 Jun 2020 • Yi Wan, Abhishek Naik, Richard S. Sutton
We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset.
no code implementations • 9 Dec 2019 • J. Fernando Hernandez-Garcia, Richard S. Sutton
Sparse representations have been shown to be useful in deep reinforcement learning for mitigating catastrophic interference and improving the performance of agents in terms of cumulative reward.
no code implementations • 4 Oct 2019 • Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks.
no code implementations • 9 Sep 2019 • Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps.
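As a hedged illustration of the idea, a tabular fixed-horizon TD(0)-style update might keep one value table per horizon, each bootstrapping from the table one horizon shorter; the notation and step-sizes below are ours, not the paper's.

```python
# Hedged sketch of a tabular fixed-horizon TD(0)-style update: V[h][s]
# estimates the sum of the next h rewards; the one-step table bootstraps from nothing.
def fixed_horizon_td_step(V, s, r, s_next, H, alpha=0.1, gamma=1.0):
    for h in range(1, H + 1):
        bootstrap = V[h - 1][s_next] if h > 1 else 0.0   # value over h-1 remaining steps
        target = r + gamma * bootstrap
        V[h][s] += alpha * (target - V[h][s])
    return V
```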
no code implementations • 2 Apr 2019 • Yi Wan, Zaheer Abbas, Adam White, Martha White, Richard S. Sutton
In particular, we 1) show that planning with an expectation model is equivalent to planning with a distribution model if the state value function is linear in state features, 2) analyze two common parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results, and 4) empirically demonstrate the effectiveness of the proposed planning algorithm.
no code implementations • 8 Mar 2019 • Alex Kearney, Vivek Veeriah, Jaden Travnik, Patrick M. Pilarski, Richard S. Sutton
In this paper, we examine an instance of meta-learning in which feature relevance is learned by adapting step size parameters of stochastic gradient descent---building on a variety of prior work in stochastic approximation, machine learning, and artificial neural networks.
1 code implementation • 1 Mar 2019 • Xiang Gu, Sina Ghiassian, Richard S. Sutton
ETD was proposed mainly to address convergence issues of conventional temporal-difference (TD) learning under off-policy training, but it is different from conventional TD learning even under on-policy training.
1 code implementation • 22 Jan 2019 • J. Fernando Hernandez-Garcia, Richard S. Sutton
Our results show that (1) using off-policy correction can have an adverse effect on the performance of Sarsa and $Q(\sigma)$; (2) increasing the backup length $n$ consistently improved performance across all the different algorithms; and (3) the performance of Sarsa and $Q$-learning was more robust to the effect of the target network update frequency than the performance of Tree Backup, $Q(\sigma)$, and Retrace in this particular task.
no code implementations • 6 Nov 2018 • Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White
The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades.
no code implementations • 20 Sep 2018 • Kristopher De Asis, Brendan Bennett, Richard S. Sutton
Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning.
no code implementations • 5 Jul 2018 • Kristopher De Asis, Richard S. Sutton
Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme.
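For illustration, the n-step TD target below interpolates between one-step TD (n = 1) and Monte Carlo (n spanning the whole episode); the helper is ours, not the paper's code.

```python
# Illustrative n-step TD target: sum the next n (discounted) rewards, then
# bootstrap from the value estimate of the state reached after n steps.
def n_step_target(rewards, V, s_n, n, gamma=0.99):
    """rewards: the n rewards following the updated state; s_n: state after n steps."""
    G = sum(gamma ** i * r for i, r in enumerate(rewards[:n]))
    return G + gamma ** n * V[s_n]
```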
no code implementations • ICLR 2018 • Kenny J. Young, Richard S. Sutton, Shuo Yang
We suggest one advantage of this particular type of memory is the ability to easily assign credit to a specific state when remembered information is found to be useful.
no code implementations • 18 May 2018 • Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton
We apply neural nets with ReLU gates in online reinforcement learning.
no code implementations • 10 Apr 2018 • Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski
In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning.
no code implementations • 16 Feb 2018 • Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski
The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored.
no code implementations • 25 Jan 2018 • Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton
This paper investigates estimating the variance of a temporal-difference learning agent's update target.
4 code implementations • 4 Dec 2017 • Shangtong Zhang, Richard S. Sutton
Experience replay has recently been widely used in various deep reinforcement learning (RL) algorithms; in this paper, we rethink its utility.
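For readers unfamiliar with the mechanism, a minimal, purely illustrative uniform replay buffer looks like the following; it is not the implementation studied in the paper.

```python
# Minimal illustrative uniform replay buffer; capacity and batch size are
# arbitrary, and eviction is first-in-first-out.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)     # oldest transitions are evicted

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```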
no code implementations • 10 Nov 2017 • Patrick M. Pilarski, Richard S. Sutton, Kory W. Mathewson, Craig Sherstan, Adam S. R. Parker, Ann L. Edwards
This work presents an overarching perspective on the role that machine intelligence can play in enhancing human abilities, especially those that have been diminished due to injury or illness.
no code implementations • 11 May 2017 • Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton
In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem.
no code implementations • 10 May 2017 • Adam White, Richard S. Sutton
This document should serve as a quick reference for and guide to the implementation of linear GQ($\lambda$), a gradient-based off-policy temporal-difference learning algorithm.
1 code implementation • 9 May 2017 • Jaeyoung Lee, Richard S. Sutton
Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem.
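A compact tabular sketch of that loop is given below (our notation; the evaluation step is truncated to a fixed number of sweeps for brevity).

```python
# Illustrative tabular policy iteration: alternate (truncated) policy
# evaluation with greedy policy improvement until the policy is stable.
import numpy as np

def policy_iteration(P, R, gamma=0.95, eval_sweeps=100):
    """P[s][a]: list of (prob, next_state); R[s][a]: expected reward."""
    n_states, n_actions = len(P), len(P[0])
    pi = [0] * n_states
    V = np.zeros(n_states)
    while True:
        for _ in range(eval_sweeps):             # policy evaluation sweeps
            for s in range(n_states):
                a = pi[s]
                V[s] = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
        new_pi = [
            max(range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in range(n_states)
        ]                                        # greedy policy improvement
        if new_pi == pi:
            return pi, V
        pi = new_pi
```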
no code implementations • 14 Apr 2017 • Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton
As to its soundness, using Markov chain theory, we prove the ergodicity of the joint state-trace process under nonrestrictive conditions, and we show that associated with our scheme is a generalized Bellman equation (for the policy to be evaluated) that depends on both the evolution of $\lambda$ and the unique invariant probability measure of the state-trace process.
no code implementations • 3 Mar 2017 • Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton
These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance.
1 code implementation • 9 Feb 2017 • Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton
We show that an explicit use of importance sampling ratios can be eliminated by varying the amount of bootstrapping in TD updates in an action-dependent manner.
no code implementations • 9 Dec 2016 • Vivek Veeriah, Shangtong Zhang, Richard S. Sutton
In this paper, we introduce a new incremental learning algorithm called crossprop, which learns the incoming weights of hidden units using the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes.
no code implementations • 9 Jun 2016 • Vivek Veeriah, Patrick M. Pilarski, Richard S. Sutton
The primary objective of the current work is to demonstrate that a learning agent can reduce the amount of explicit feedback required for adapting to the user's preferences pertaining to a task by learning to perceive a value of its behavior from the human user, particularly from the user's facial expressions---we call this face valuing.
1 code implementation • 13 Dec 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton
Our results suggest that the true online methods indeed dominate the regular methods.
no code implementations • 19 Aug 2015 • Hado van Hasselt, Richard S. Sutton
If predictions are made at a high rate or span over a large amount of time, substantial computation can be required to store all relevant observations and to update all predictions when the outcome is finally observed.
no code implementations • 25 Jul 2015 • Richard S. Sutton
This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014).
no code implementations • 6 Jul 2015 • A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps.
no code implementations • 1 Jul 2015 • Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton
Our results confirm the strength of true online TD($\lambda$): 1) for sparse feature vectors, the computational overhead with respect to TD($\lambda$) is minimal; for non-sparse features the computation time is at most twice that of TD($\lambda$), 2) across all domains/representations the learning speed of true online TD($\lambda$) is often better, but never worse than that of TD($\lambda$), and 3) true online TD($\lambda$) is easier to use, because it does not require choosing between trace types, and it is generally more stable with respect to the step-size.
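For reference, one true online TD($\lambda$) update with dutch-style traces can be sketched as follows; this follows the standard pseudocode rendered in our notation and is illustrative only.

```python
# Compact sketch of one true online TD(lambda) update with dutch-style traces.
import numpy as np

def true_online_td_step(w, z, v_old, x, r, x_next, alpha=0.05, gamma=1.0, lam=0.9):
    v, v_next = np.dot(w, x), np.dot(w, x_next)
    delta = r + gamma * v_next - v                                       # TD error
    z = gamma * lam * z + (1 - alpha * gamma * lam * np.dot(z, x)) * x   # dutch trace
    w = w + alpha * (delta + v - v_old) * z - alpha * (v - v_old) * x
    return w, z, v_next              # v_next becomes v_old on the next step
```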
no code implementations • NeurIPS 2004 • Richard S. Sutton, Brian Tanner
We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions.
no code implementations • 14 Mar 2015 • Richard S. Sutton, A. Rupam Mahmood, Martha White
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps.
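A hedged sketch of how such emphasis can enter a single emphatic TD($\lambda$)-style update is shown below; this is our reading, and the variable names, interest setting, and step-sizes are ours.

```python
# Hedged sketch of an emphatic TD(lambda)-style update: the followon trace F
# and emphasis M reweight updates; rho is the importance sampling ratio.
import numpy as np

def etd_step(w, e, F, x, r, x_next, rho, rho_prev, interest=1.0,
             alpha=0.05, gamma=1.0, lam=0.9):
    F = rho_prev * gamma * F + interest               # followon trace
    M = lam * interest + (1 - lam) * F                # emphasis for this step
    e = rho * (gamma * lam * e + M * x)               # emphatic eligibility trace
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)
    w = w + alpha * delta * e
    return w, e, F
```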
no code implementations • NeurIPS 2014 • A. Rupam Mahmood, Hado P. Van Hasselt, Richard S. Sutton
Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD($\lambda$).
no code implementations • NeurIPS 2014 • Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function.
no code implementations • 18 Sep 2013 • Ann L. Edwards, Alexandra Kearney, Michael Rory Dawson, Richard S. Sutton, Patrick M. Pilarski
In the present work, we explore the use of temporal-difference learning and GVFs to predict when users will switch their control influence between the different motor functions of a robot arm.
no code implementations • 13 Jun 2012 • Richard S. Sutton, Csaba Szepesvari, Alborz Geramifard, Michael P. Bowling
Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions.
1 code implementation • 22 May 2012 • Thomas Degris, Martha White, Richard S. Sutton
Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning.
no code implementations • 6 Dec 2011 • Joseph Modayil, Adam White, Richard S. Sutton
The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense.
no code implementations • NeurIPS 2009 • Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend the Dyna planning architecture for policy evaluation and control in two significant aspects.
no code implementations • NeurIPS 2008 • Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, target policy, and exciting behavior policy, and whose complexity scales linearly in the number of parameters.
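In the spirit of that algorithm, a gradient-TD-style update can be sketched with a secondary weight vector that tracks the expected TD update; this is our illustrative rendering, with off-policy importance corrections omitted for brevity, and is not the paper's exact pseudocode.

```python
# Hedged gradient-TD-style sketch (importance corrections omitted): u tracks
# the expected TD update, and w follows a corrected, stable direction.
import numpy as np

def gtd_style_step(w, u, x, r, x_next, alpha=0.01, beta=0.05, gamma=0.99):
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)    # TD error
    u = u + beta * (delta * x - u)                           # track E[delta * x]
    w = w + alpha * (x - gamma * x_next) * np.dot(x, u)      # main weight update
    return w, u
```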
no code implementations • NeurIPS 2008 • Elliot A. Ludvig, Richard S. Sutton, Eric Verbeek, E. J. Kehoe
For trace conditioning, with no contiguity between stimulus and reward, these long-latency temporal elements are vital to learning adaptively timed responses.
1 code implementation • Artificial Intelligence 1999 • Richard S. Sutton, Doina Precup, Satinder Singh
In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
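To illustrate that interchangeability, here is a sketch of an SMDP-style Q-learning update over options (our notation): after an option runs for k steps and accrues discounted return G, it is updated just as a primitive action would be.

```python
# Illustrative SMDP-style Q-learning update over options (our notation).
def smdp_q_update(Q, s, o, G, k, s_next, options, alpha=0.1, gamma=0.99):
    """G: discounted return accumulated during the k steps the option o ran."""
    target = G + gamma ** k * max(Q[(s_next, o2)] for o2 in options)
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q
```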
1 code implementation • Machine Learning 1988 • Richard S. Sutton
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior.
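As a hedged sketch of such a procedure, linear TD($\lambda$) with accumulating eligibility traces updates earlier predictions from the current temporal difference; the rendering and parameter values below are ours.

```python
# Compact sketch of linear TD(lambda) with accumulating eligibility traces;
# step-size and trace parameters are illustrative.
import numpy as np

def td_lambda_step(w, e, x, r, x_next, alpha=0.05, gamma=1.0, lam=0.9):
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)   # temporal-difference error
    e = gamma * lam * e + x                                 # decay traces, add current features
    w = w + alpha * delta * e                               # credit earlier predictions
    return w, e
```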