Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return...
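To make the Off-PAC update pattern described above concrete, here is a minimal DDPG-style sketch in PyTorch. It is an illustration, not the paper's implementation: the network sizes, discount factor, learning rates, and the synthetic minibatch (standing in for replay-buffer samples) are all assumptions.

```python
import copy
import torch
import torch.nn as nn

# Minimal DDPG-style Off-PAC update step (illustrative sketch; sizes,
# GAMMA, and the synthetic batch below are assumed, not from the paper).
OBS, ACT, GAMMA = 8, 2, 0.99

actor = nn.Sequential(nn.Linear(OBS, 64), nn.ReLU(),
                      nn.Linear(64, ACT), nn.Tanh())

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS + ACT, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

critic = Critic()
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# A synthetic minibatch standing in for samples from a replay buffer.
s, s2 = torch.randn(32, OBS), torch.randn(32, OBS)
a, r, done = torch.randn(32, ACT).tanh(), torch.randn(32, 1), torch.zeros(32, 1)

# Critic: temporal-difference update of the action-value function.
with torch.no_grad():
    q_targ = r + GAMMA * (1 - done) * critic_targ(s2, actor_targ(s2))
critic_loss = nn.functional.mse_loss(critic(s, a), q_targ)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# Actor: the critic supplies the loss; ascending on Q trains the actor
# to take actions with higher expected return.
actor_loss = -critic(s, actor(s)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

Target networks are held fixed inside the TD target so the critic regresses toward a stable value estimate rather than chasing its own moving predictions.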

Published at NeurIPS 2020.

Methods used in the Paper


METHOD                       TYPE
Weight Decay                 Regularization
Convolution                  Convolutions
Batch Normalization          Normalization
DDPG                         Policy Gradient Methods
Experience Replay            Replay Memory
Dense Connections            Feedforward Networks
ReLU                         Activation Functions
Target Policy Smoothing      Regularization
Clipped Double Q-learning    Off-Policy TD Control
Adam                         Stochastic Optimization
TD3                          Policy Gradient Methods
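Two of the listed components, Target Policy Smoothing and Clipped Double Q-learning, are the additions TD3 makes on top of DDPG's critic update. The sketch below shows how they combine when forming the TD target; `actor_targ`, `q1_targ`, and `q2_targ` are hypothetical target networks, and the noise and clipping values are assumed for illustration rather than taken from the paper.

```python
import torch

# How TD3 forms its critic target (illustrative sketch; sigma, noise_clip,
# and act_limit are assumed default-style values, not the paper's settings).
def td3_target(r, s2, done, actor_targ, q1_targ, q2_targ,
               gamma=0.99, sigma=0.2, noise_clip=0.5, act_limit=1.0):
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped
        # noise, regularizing the critic against sharp Q-value peaks.
        a2 = actor_targ(s2)
        noise = (torch.randn_like(a2) * sigma).clamp(-noise_clip, noise_clip)
        a2 = (a2 + noise).clamp(-act_limit, act_limit)
        # Clipped double Q-learning: take the minimum of two target critics
        # to curb overestimation bias in the TD target.
        q_min = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
        return r + gamma * (1 - done) * q_min
```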