Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

1 Jan 2021 Anonymous

Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from low signal and even instability in Q-learning because target values are derived from current Q-estimates, which are often noisy...
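
To make the abstract's point concrete, here is a minimal NumPy sketch of how a standard Q-learning target is bootstrapped from current Q-estimates, and how a per-sample weight could scale each squared TD error. The function names `td_targets` and `weighted_td_loss` are hypothetical, and the weights are arbitrary illustrative numbers, not the weighting scheme proposed in the paper (the abstract is truncated before it is described).

```python
import numpy as np

def td_targets(rewards, next_q, dones, gamma=0.99):
    """Bootstrapped targets r + gamma * max_a' Q(s', a'), zeroed at terminal states.

    next_q holds the *current* Q-estimates for the next states, so any noise
    in those estimates is carried directly into the targets.
    """
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

def weighted_td_loss(q_sa, targets, weights):
    """Mean over the batch of weight * squared TD error (hypothetical helper)."""
    return float(np.mean(weights * (targets - q_sa) ** 2))

# Toy batch of two transitions with two actions each.
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0],   # Q(s', .) for transition 0
                   [1.0, 0.0]])  # Q(s', .) for transition 1 (terminal, ignored)
dones = np.array([0.0, 1.0])

targets = td_targets(rewards, next_q, dones, gamma=0.9)

q_sa = np.array([2.0, 1.0])      # current Q(s, a) for the sampled actions
weights = np.array([0.5, 1.0])   # illustrative weights only (assumption)
loss = weighted_td_loss(q_sa, targets, weights)
```

In this sketch a smaller weight on a transition reduces its contribution to the update; how such weights should be chosen is exactly what the truncated abstract leaves unstated here.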

Methods used in the Paper

Off-Policy TD Control
Noisy Linear Layer
Randomized Value Functions
Dense Connections
Feedforward Networks
Double Q-learning
N-step Returns
Value Function Estimation
Q-Learning Networks
Prioritized Experience Replay
Replay Memory
Dueling Network
Rainbow DQN