Search Results for author: Keith W. Ross

Found 8 papers, 3 papers with code

Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance

no code implementations • 17 Nov 2021 • Yanqiu Wu, Xinyue Chen, Che Wang, Yiming Zhang, Keith W. Ross

In particular, Truncated Quantile Critics (TQC) achieves state-of-the-art asymptotic training performance on the MuJoCo benchmark with a distributional representation of critics; and Randomized Ensemble Double Q-Learning (REDQ) achieves high sample efficiency that is competitive with state-of-the-art model-based methods using a high update-to-data ratio and target randomization.

Continuous Control Q-Learning +1

On-Policy Deep Reinforcement Learning for the Average-Reward Criterion

no code implementations • 14 Jun 2021 • Yiming Zhang, Keith W. Ross

Based on this bound, we develop an iterative procedure which produces a sequence of monotonically improved policies for the average reward criterion.

Reinforcement Learning (RL)

Average Reward Reinforcement Learning with Monotonic Policy Improvement

no code implementations • 1 Jan 2021 • Yiming Zhang, Keith W. Ross

In continuing control tasks, an agent’s average reward per time step is a more natural performance measure than the commonly used discounting framework, as it can better capture an agent’s long-term behavior.

Reinforcement Learning (RL)

First Order Constrained Optimization in Policy Space

2 code implementations • NeurIPS 2020 • Yiming Zhang, Quan Vuong, Keith W. Ross

We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS) which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.

Supervised Policy Update

1 code implementation • ICLR 2019 • Quan Vuong, Yiming Zhang, Keith W. Ross

We show how the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems, and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology.

Reinforcement Learning (RL)

Supervised Policy Update for Deep Reinforcement Learning

1 code implementation • ICLR 2019 • Quan Vuong, Yiming Zhang, Keith W. Ross

We show how the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems, and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology.

Reinforcement Learning (RL)

Policy Gradient For Multidimensional Action Spaces: Action Sampling and Entropy Bonus

no code implementations • ICLR 2018 • Vuong Ho Quan, Yiming Zhang, Kenny Song, Xiao-Yue Gong, Keith W. Ross

In the case of high-dimensional action spaces, calculating the entropy and the gradient of the entropy requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible.

Atari Games reinforcement-learning +1
