Search Results for author: Shangtong Zhang

Found 26 papers, 18 papers with code

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

no code implementations15 Jan 2024 Shuze Liu, Shuhang Chen, Shangtong Zhang

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning.

reinforcement-learning
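
For illustration only, a minimal sketch of the stochastic-approximation template the abstract refers to, instantiated as tabular TD(0) driven by Markovian samples from a toy three-state chain (the chain, rewards, and step size are hypothetical placeholders, not from the paper):

```python
import numpy as np

# Stochastic approximation: w_{t+1} = w_t + alpha * H(w_t, X_t), where X_t is
# Markovian noise. Here the iterate is a tabular value function and H is the
# TD(0) increment, computed on a toy three-state Markov chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])       # hypothetical transition matrix
r = np.array([0.0, 1.0, -1.0])        # hypothetical per-state rewards
gamma, alpha = 0.9, 0.05

v = np.zeros(3)                       # the iterate, updated incrementally
s = 0
rng = np.random.default_rng(0)
for t in range(10_000):
    s_next = rng.choice(3, p=P[s])    # Markovian sample
    td_error = r[s] + gamma * v[s_next] - v[s]
    v[s] += alpha * td_error          # incremental, stochastic update
    s = s_next
print(v)
```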

Improving Monte Carlo Evaluation with Offline Data

no code implementations31 Jan 2023 Shuze Liu, Shangtong Zhang

Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators, either for hyperparameter tuning or for testing different algorithmic design choices, where the policy is repeatedly executed in the environment to obtain the average outcome.
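
As a point of reference, a minimal sketch of the online Monte Carlo estimator described above: the policy is executed repeatedly in the environment and the discounted returns are averaged (the environment interface and policy callable are hypothetical placeholders):

```python
def monte_carlo_evaluate(env, policy, num_episodes=100, gamma=0.99):
    """Estimate a policy's value by averaging returns over repeated rollouts."""
    returns = []
    for _ in range(num_episodes):
        state, done = env.reset(), False
        ret, discount = 0.0, 1.0
        while not done:
            action = policy(state)
            state, reward, done = env.step(action)   # hypothetical env interface
            ret += discount * reward
            discount *= gamma
        returns.append(ret)
    return sum(returns) / len(returns)
```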


On the Convergence of SARSA with Linear Function Approximation

no code implementations14 Feb 2022 Shangtong Zhang, Remi Tachet, Romain Laroche

SARSA, a classical on-policy control algorithm for reinforcement learning, is known to chatter when combined with linear function approximation: SARSA does not diverge but oscillates in a bounded region.
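
For concreteness, a minimal sketch of the algorithm in question, on-policy SARSA with linear function approximation and an epsilon-greedy policy (the feature map `phi` and environment interface are hypothetical placeholders):

```python
import numpy as np

def sarsa_linear(env, phi, num_features, num_actions,
                 alpha=0.01, gamma=0.99, eps=0.1, episodes=500, seed=0):
    """Semi-gradient SARSA with a linear action-value q(s, a) = w . phi(s, a)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(num_features)

    def q(s, a):
        return w @ phi(s, a)

    def act(s):
        if rng.random() < eps:
            return int(rng.integers(num_actions))
        return int(np.argmax([q(s, a) for a in range(num_actions)]))

    for _ in range(episodes):
        s, done = env.reset(), False
        a = act(s)
        while not done:
            s2, r, done = env.step(a)          # hypothetical env interface
            a2 = act(s2)
            target = r if done else r + gamma * q(s2, a2)
            w += alpha * (target - q(s, a)) * phi(s, a)   # on-policy update
            s, a = s2, a2
    return w
```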

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

1 code implementation NeurIPS 2023 Shangtong Zhang, Remi Tachet, Romain Laroche

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratios to correct the discrepancy between the state distribution of the behavior policy and that of the target policy.

Policy Gradient Methods

Truncated Emphatic Temporal Difference Methods for Prediction and Control

1 code implementation11 Aug 2021 Shangtong Zhang, Shimon Whiteson

Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems.

Reinforcement Learning (RL)

Learning Expected Emphatic Traces for Deep RL

no code implementations12 Jul 2021 Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting.

Breaking the Deadly Triad with a Target Network

1 code implementation21 Jan 2021 Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

Q-Learning
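
As a rough illustration of the setting, a sketch of a bootstrapped off-policy value update in which a target network supplies the bootstrap target (network sizes, learning rate, and the hard-sync scheme are arbitrary placeholders, not the paper's construction):

```python
import copy
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)      # frozen copy used only for bootstrapping
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def update(batch, gamma=0.99):
    """One off-policy, bootstrapped update on a replay batch of tensors."""
    s, a, r, s2, done = batch
    with torch.no_grad():              # bootstrap target comes from the target network
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Periodically copy the online weights into the target network."""
    target_net.load_state_dict(q_net.state_dict())
```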

Average-Reward Off-Policy Policy Evaluation with Function Approximation

1 code implementation8 Jan 2021 Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.
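
For orientation, a simplified tabular, on-policy differential TD(0) sketch that jointly estimates a reward rate and a differential value function; the paper itself treats the off-policy, function-approximation case, and the environment interface here is a hypothetical continuing task:

```python
import numpy as np

def differential_td0(env, policy, alpha=0.05, beta=0.01, steps=100_000):
    """Jointly estimate the reward rate and the differential value function."""
    v = np.zeros(env.num_states)      # differential values
    r_bar = 0.0                       # reward-rate estimate
    s = env.reset()
    for _ in range(steps):
        a = policy(s)
        s2, r = env.step(a)           # continuing task: no episode termination
        delta = r - r_bar + v[s2] - v[s]   # differential TD error
        v[s] += alpha * delta
        r_bar += beta * delta
        s = s2
    return r_bar, v
```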

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation2 Oct 2020 Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning
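
To make the mismatch concrete, a hedged sketch of a REINFORCE-style actor update with and without the gamma^t weighting that is commonly dropped in practice (the trajectory representation is a hypothetical placeholder):

```python
def actor_gradient(trajectory, gamma=0.99, discount_actor=True):
    """Accumulate a REINFORCE-style gradient estimate over one trajectory.

    trajectory: list of (grad_log_prob, return_from_t) pairs, one per time step.
    discount_actor=False reproduces the common practice of omitting gamma**t.
    """
    grad = 0.0
    for t, (grad_log_prob, g_t) in enumerate(trajectory):
        weight = gamma ** t if discount_actor else 1.0
        grad += weight * g_t * grad_log_prob
    return grad
```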

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

1 code implementation22 Apr 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.

reinforcement-learning Reinforcement Learning (RL)
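
For intuition, a minimal sketch of a mean-variance objective over the per-step reward random variable, estimated from sampled rewards (the risk weight `lam` and the plain sample estimator are placeholders, not MVPI's actual machinery):

```python
import numpy as np

def mean_variance_objective(rewards, lam=0.5):
    """Risk-averse objective: maximize E[R] - lam * Var[R] over per-step rewards."""
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean()
    var = ((rewards - mean) ** 2).mean()
    return mean - lam * var

# e.g., per-step rewards collected while running the current policy
print(mean_variance_objective([1.0, 0.0, 2.0, 1.5, 0.5]))
```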

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in the optimization variable parameterization is introduced to ensure positivity, so primal-dual algorithms are not guaranteed to converge to or find the desired solution.

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.


Distributional Reinforcement Learning for Efficient Exploration

no code implementations13 May 2019 Borislav Mavrin, Shangtong Zhang, Hengshuai Yao, Linglong Kong, Kaiwen Wu, Yao-Liang Yu

In distributional reinforcement learning (RL), the estimated distribution of the value function models both parametric and intrinsic uncertainties.

Atari Games Distributional Reinforcement Learning +3

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

1 code implementation12 May 2019 Yuhang Song, Jianyi Wang, Thomas Lukasiewicz, Zhenghua Xu, Shangtong Zhang, Andrzej Wojcicki, Mai Xu

Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards.

Deep Residual Reinforcement Learning

1 code implementation3 May 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We revisit residual algorithms in both model-free and model-based reinforcement learning settings.

Model-based Reinforcement Learning reinforcement-learning +1

Generalized Off-Policy Actor-Critic

1 code implementation NeurIPS 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.

counterfactual reinforcement-learning +1

ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search

1 code implementation6 Nov 2018 Shangtong Zhang, Hao Chen, Hengshuai Yao

In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning.

Continuous Control reinforcement-learning +2

QUOTA: The Quantile Option Architecture for Reinforcement Learning

3 code implementations5 Nov 2018 Shangtong Zhang, Borislav Mavrin, Linglong Kong, Bo Liu, Hengshuai Yao

In this paper, we propose the Quantile Option Architecture (QUOTA) for exploration based on recent advances in distributional reinforcement learning (RL).

Decision Making Distributional Reinforcement Learning +2

mlpack 3: a fast, flexible machine learning library

1 code implementation Journal of Open Source Software 2018 Ryan R. Curtin, Marcus Edel, Mikhail Lozhnikov, Yannis Mentekidis, Sumedh Ghaisas, Shangtong Zhang

In the past several years, the field of machine learning has seen an explosion of interest and excitement, with hundreds or thousands of algorithms developed for different tasks every year.

Benchmarking BIG-bench Machine Learning +1

A Deeper Look at Experience Replay

4 code implementations4 Dec 2017 Shangtong Zhang, Richard S. Sutton

Experience replay has recently been widely used in various deep reinforcement learning (RL) algorithms; in this paper, we rethink the utility of experience replay.

Atari Games reinforcement-learning +1
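
A minimal sketch of the uniform experience replay mechanism whose utility the paper revisits (the capacity and batch size are arbitrary placeholders):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay with uniform sampling."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        return tuple(zip(*batch))   # (states, actions, rewards, next_states, dones)

    def __len__(self):
        return len(self.buffer)
```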

Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control

no code implementations30 Nov 2017 Shangtong Zhang, Osmar R. Zaiane

Reinforcement learning and evolutionary methods are two major approaches to addressing complicated control problems.

Continuous Control reinforcement-learning +1

Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

no code implementations9 Dec 2016 Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

In this paper, we introduce a new incremental learning algorithm called crossprop, which learns the incoming weights of hidden units based on the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes.

Incremental Learning
