Sample Efficient Actor-Critic with Experience Replay

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
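
The truncated importance sampling with bias correction mentioned in the abstract can be illustrated with a minimal sketch. The snippet below assumes a discrete action space, toy probability and value numbers, a small truncation threshold chosen only to make the correction term nonzero, and scalar coefficients in place of the full policy-gradient expression; none of it is the authors' implementation.

```python
# Hedged sketch (not the paper's code): truncated importance sampling with a
# bias-correction term for one time step with 4 discrete actions.
import numpy as np

c = 1.0                                    # truncation threshold (small value chosen for illustration)

pi = np.array([0.1, 0.2, 0.3, 0.4])        # current policy pi(a | x_t)  (assumed toy values)
mu = np.array([0.25, 0.25, 0.25, 0.25])    # behaviour policy mu(a | x_t) stored in the replay buffer
q = np.array([1.0, 0.5, 2.0, 1.5])         # critic estimates Q(x_t, a)
v = np.dot(pi, q)                          # state value V(x_t) = E_{a~pi}[Q(x_t, a)]
a_t = 2                                    # action actually taken from the buffer
q_ret = 1.8                                # return-based target for the taken action (assumed)

rho = pi / mu                              # per-action importance weights pi/mu
rho_bar = min(c, rho[a_t])                 # truncated weight for the sampled action

# First term: truncated importance-sampled advantage for the taken action;
# in the full estimator this coefficient weights grad log pi(a_t | x_t).
truncated_coef = rho_bar * (q_ret - v)

# Bias-correction term: one coefficient per action, active only where the
# untruncated weight exceeds c; each weights grad log pi(a | x_t) and the
# contributions are averaged under pi, which keeps the overall estimator unbiased.
correction_coefs = pi * np.clip(1.0 - c / rho, 0.0, None) * (q - v)

print(truncated_coef, correction_coefs)
```

Note how truncating at c bounds the variance introduced by large importance weights, while the correction term restores the probability mass that truncation removed.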

Methods used in the Paper

METHOD                        TYPE
Experience Replay             Replay Memory
Retrace                       Value Function Estimation
TRPO                          Policy Gradient Methods
Entropy Regularization        Regularization
Double Q-learning             Off-Policy TD Control
Dense Connections             Feedforward Networks
ReLU                          Activation Functions
Softmax                       Output Functions
Convolution                   Convolutions
Stochastic Dueling Network    Value Function Estimation
ACER                          Policy Gradient Methods
Dueling Network               Q-Learning Networks