DDPG, or Deep Deterministic Policy Gradient, is an actor-critic, model-free algorithm based on the deterministic policy gradient that operates over continuous action spaces. It combines the actor-critic approach with two insights from DQN (Deep Q-Network): 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) it is trained with a target Q network to give consistent targets during temporal-difference backups. DDPG adopts the same ideas along with batch normalization. A minimal sketch of the resulting update is shown below.
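The sketch below illustrates these ideas, assuming PyTorch, a single-step update function, and illustrative network sizes, learning rates, and soft-update rate `tau`; observation/action dimensions and the replay buffer contents are hypothetical placeholders, and the observation batch normalization used in the paper is omitted for brevity.

```python
# Minimal DDPG update sketch (assumptions: PyTorch; transitions stored as
# (state, action, reward, next_state, done) with list/array elements).
import random
from collections import deque
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1  # illustrative dimensions

def mlp(in_dim, out_dim, out_act=nn.Identity):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), out_act())

actor = mlp(obs_dim, act_dim, nn.Tanh)           # deterministic policy mu(s)
critic = mlp(obs_dim + act_dim, 1)               # Q(s, a)
target_actor = mlp(obs_dim, act_dim, nn.Tanh)    # target networks give
target_critic = mlp(obs_dim + act_dim, 1)        # consistent TD targets
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)                   # off-policy replay buffer
gamma, tau, batch_size = 0.99, 0.005, 64

def update():
    # Sample uncorrelated transitions from the replay buffer.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32)
                      for x in map(list, zip(*batch)))
    r, d = r.unsqueeze(1), d.unsqueeze(1)

    # Critic: TD target computed with the *target* networks for stability.
    with torch.no_grad():
        q_next = target_critic(torch.cat([s2, target_actor(s2)], dim=1))
        y = r + gamma * (1 - d) * q_next
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) updates keep the target networks slowly tracking.
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```

During data collection, exploration noise (e.g. Ornstein-Uhlenbeck or Gaussian) is added to the deterministic action before stepping the environment.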
Task | Papers | Share
---|---|---
Continuous Control | 20 | 40.82% |
Autonomous Driving | 4 | 8.16% |
Decision Making | 3 | 6.12% |
Imitation Learning | 3 | 6.12% |
Efficient Exploration | 2 | 4.08% |
Motion Planning | 2 | 4.08% |
Adversarial Training | 2 | 4.08% |
Meta-Learning | 2 | 4.08% |
Hypothesis Testing | 1 | 2.04% |