Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic... (read more)

PDF Abstract

Tasks


Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper


METHOD TYPE
Entropy Regularization
Regularization
Convolution
Convolutions
PPO
Policy Gradient Methods
Dense Connections
Feedforward Networks
Q-Learning
Off-Policy TD Control
DQN
Q-Learning Networks