no code implementations • 17 Nov 2021 • Yanqiu Wu, Xinyue Chen, Che Wang, Yiming Zhang, Keith W. Ross
In particular, Truncated Quantile Critics (TQC) achieves state-of-the-art asymptotic training performance on the MuJoCo benchmark with a distributional representation of critics, while Randomized Ensembled Double Q-Learning (REDQ) achieves high sample efficiency, competitive with state-of-the-art model-based methods, by using a high update-to-data ratio and target randomization.
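As a concrete anchor for the entry above, here is a minimal PyTorch sketch of a REDQ-style critic update: an ensemble of Q-networks trained many times per environment step (a high update-to-data ratio), with each target taken as the minimum over a randomly drawn subset of target critics. It assumes a SAC-style stochastic `policy` callable and standard replay-batch tensors; network sizes and hyperparameters are illustrative, not the papers' exact code.

```python
import random
import torch
import torch.nn as nn

N_CRITICS, SUBSET_SIZE, UTD_RATIO, GAMMA, ALPHA = 10, 2, 20, 0.99, 0.2

def make_q(obs_dim, act_dim):
    return nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))

obs_dim, act_dim = 17, 6  # illustrative MuJoCo-like dimensions
critics = [make_q(obs_dim, act_dim) for _ in range(N_CRITICS)]
targets = [make_q(obs_dim, act_dim) for _ in range(N_CRITICS)]
for q, q_targ in zip(critics, targets):
    q_targ.load_state_dict(q.state_dict())
optimizers = [torch.optim.Adam(q.parameters(), lr=3e-4) for q in critics]

def redq_critic_update(batch, policy):
    """One gradient step on every critic; run UTD_RATIO times per env step."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act, next_logp = policy(next_obs)  # assumed SAC-style policy
        # Target randomization: minimize over a random subset of the ensemble.
        idx = random.sample(range(N_CRITICS), SUBSET_SIZE)
        q_next = torch.min(torch.stack(
            [targets[i](torch.cat([next_obs, next_act], dim=-1)) for i in idx]),
            dim=0).values
        backup = rew + GAMMA * (1.0 - done) * (q_next - ALPHA * next_logp)
    for q, opt in zip(critics, optimizers):
        loss = ((q(torch.cat([obs, act], dim=-1)) - backup) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
```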
no code implementations • 14 Jun 2021 • Yiming Zhang, Keith W. Ross
Based on this bound, we develop an iterative procedure that produces a sequence of monotonically improving policies under the average-reward criterion.
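As a hedged illustration of what changes in the average-reward setting, the sketch below computes differential (average-reward) advantages, in which an estimate of the policy's average reward per step takes the place of a discount factor; the function name, the crude batch estimate of that average, and the dummy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def differential_advantages(rewards, values, next_values):
    """Average-reward one-step advantages: A = r - rho + V(s') - V(s).

    No discount factor appears; rho is the average reward per time step."""
    rho = rewards.mean()  # crude in-batch estimate of the average reward
    return rewards - rho + next_values - values

# Illustrative usage with dummy data:
rew = np.random.randn(5)
v, v_next = np.random.randn(5), np.random.randn(5)
print(differential_advantages(rew, v, v_next))
```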
no code implementations • 1 Jan 2021 • Yiming Zhang, Keith W. Ross
In continuing control tasks, an agent's average reward per time step is a more natural performance measure than the commonly used discounted return, as it better captures the agent's long-term behavior.
2 code implementations • NeurIPS 2020 • Yiming Zhang, Quan Vuong, Keith W. Ross
We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS), which maximizes an agent's overall reward while ensuring that the agent satisfies a set of cost constraints.
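A minimal sketch of a FOCOPS-style loss, following the two-step idea the name points to: the constrained problem solved in nonparametric policy space reweights the old policy by exponentiated reward-minus-cost advantages, and the parametric policy is then pulled toward that target with a purely first-order objective. The temperature `lam`, the multiplier `nu`, its projected-gradient update, and the Gaussian-policy setup are illustrative assumptions, not the paper's exact losses.

```python
import torch
from torch.distributions import Normal, kl_divergence

def focops_loss(dist, old_dist, act, adv, cost_adv, lam=1.5, nu=0.1):
    """First-order step toward pi*(a|s) ~ pi_old(a|s) * exp((A - nu*A_C)/lam)."""
    logp = dist.log_prob(act).sum(-1)
    old_logp = old_dist.log_prob(act).sum(-1).detach()
    ratio = torch.exp(logp - old_logp)
    kl = kl_divergence(dist, old_dist).sum(-1)  # stay close to the old policy
    return (kl - (1.0 / lam) * ratio * (adv - nu * cost_adv)).mean()

def update_nu(nu, avg_cost, cost_limit, lr=0.01, nu_max=2.0):
    """Projected gradient ascent on the cost multiplier nu."""
    return min(max(nu + lr * (avg_cost - cost_limit), 0.0), nu_max)

# Illustrative usage with dummy Gaussian policies over a 6-dim action:
old_dist = Normal(torch.zeros(32, 6), torch.ones(32, 6))
dist = Normal(torch.zeros(32, 6, requires_grad=True), torch.ones(32, 6))
act, adv, cost_adv = old_dist.sample(), torch.randn(32), torch.randn(32)
focops_loss(dist, old_dist, act, adv, cost_adv).backward()
```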
1 code implementation • ICLR 2019 • Quan Vuong, Yiming Zhang, Keith W. Ross
We show how both the Natural Policy Gradient / Trust Region Policy Optimization (NPG/TRPO) problems and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology.
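For a sense of how a trust-region problem can be handled this way, here is a hedged sketch of the supervised step: the constrained problem's nonparametric solution reweights the old policy by exponentiated advantages, and the parametric policy is fit to it by weighted maximum likelihood. The temperature `lam`, the self-normalization, and the dummy Gaussian setup are illustrative assumptions, not the paper's exact losses.

```python
import torch
from torch.distributions import Normal

def supervised_projection_loss(dist, act, adv, lam=1.0):
    """Fit pi_theta to pi*(a|s) ~ pi_old(a|s) * exp(A(s, a)/lam) by
    weighted maximum likelihood over actions sampled from pi_old."""
    weights = torch.exp(adv / lam)
    weights = (weights / weights.mean()).detach()  # self-normalized weights
    return -(weights * dist.log_prob(act).sum(-1)).mean()

# Illustrative usage with a dummy Gaussian policy:
mean = torch.zeros(32, 6, requires_grad=True)
dist = Normal(mean, torch.ones(32, 6))
act, adv = torch.randn(32, 6), torch.randn(32)
supervised_projection_loss(dist, act, adv).backward()
```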
no code implementations • 2 Jun 2018 • Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross
We develop several novel unbiased estimators for the entropy bonus and its gradient.
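One standard construction of such estimators, offered here as a hedged sketch rather than the paper's exact estimators: for actions sampled from the policy itself, -log pi(a|s) is an unbiased one-sample estimate of the entropy, and a detached-weight surrogate yields an unbiased estimate of the entropy gradient.

```python
import torch
from torch.distributions import Categorical

def entropy_bonus_terms(logp):
    """logp: log pi(a|s) for actions a ~ pi(.|s).

    E[-log pi(a|s)] = H(pi), so -logp is an unbiased entropy estimate; and
    since grad H = -E[log pi(a|s) * grad log pi(a|s)], the surrogate below
    has an unbiased gradient."""
    entropy_estimate = -logp.detach().mean()
    surrogate = -(logp.detach() * logp).mean()
    return entropy_estimate, surrogate

# Sanity check against the exact entropy of a small categorical policy:
logits = torch.randn(5, requires_grad=True)
dist = Categorical(logits=logits)
a = dist.sample((10000,))
est, surr = entropy_bonus_terms(dist.log_prob(a))
print(float(est), float(dist.entropy()))  # the two should be close
```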
no code implementations • ICLR 2018 • Quan Ho Vuong, Yiming Zhang, Kenny Song, Xiao-Yue Gong, Keith W. Ross
In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating every action in the action space and running a forward and backward pass for each one, which may be computationally infeasible.
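To make the bottleneck concrete, the toy numbers below (illustrative assumptions) show how quickly exact enumeration grows with the number of action dimensions, and compute a brute-force exact entropy for a joint action space small enough to enumerate.

```python
import numpy as np

k, d = 10, 8                                      # 10 choices per sub-action, 8 sub-actions
print(f"joint actions to enumerate: {k ** d:,}")  # 100,000,000 forward passes in general

# Brute-force exact entropy for a small joint action space (4**4 = 256 actions):
logits = np.random.randn(4 ** 4)                  # one logit per joint action
p = np.exp(logits - logits.max())
p /= p.sum()                                      # softmax over every joint action
print(f"exact entropy over {p.size} actions: {-(p * np.log(p)).sum():.3f}")
```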