Policy Gradient Methods

Policy gradient methods optimize the policy function directly in reinforcement learning. This contrasts with value-based methods such as Q-learning, where the policy is only implicit: the agent acts by maximizing a learned value function. Below is a continuously updated catalogue of policy gradient methods.
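To make the idea of optimizing the policy directly more concrete, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. It is an illustration under stated assumptions, not the implementation of any catalogued method: the bandit setup, the softmax parameterization, the running-average baseline, and the names `softmax` and `train_bandit` are all hypothetical choices made for this example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(mean_rewards, steps=5000, lr=0.05, seed=0):
    """REINFORCE-style updates on a stateless bandit (hypothetical toy setup)."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # policy parameters, one per action
    baseline = 0.0  # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Assumed reward model: Gaussian noise around the arm's mean.
        reward = rng.gauss(mean_rewards[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d theta[k]
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        # Update the running average of observed rewards.
        baseline += (reward - baseline) / t
    return softmax(theta)


final_probs = train_bandit([0.0, 1.0, 3.0])
print(final_probs)
```

No value function is learned or maximized here: the parameters `theta` define the action probabilities, and each update shifts probability toward actions whose sampled reward exceeded the baseline. A Q-learning agent on the same problem would instead estimate a value for each arm and pick the arm with the largest estimate.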

METHOD                                      YEAR  PAPERS
PPO                                         2017  111
REINFORCE                                   1999  89
DDPG                                        2015  80
A2C                                         2016  39
A3C                                         2016  36
TRPO                                        2015  35
TD3                                         2018  24
Soft Actor-Critic                           2018  21
MADDPG                                      2017  13
DPG                                         2014  7
IMPALA                                      2018  6
D4PG                                        2018  5
ACER                                        2016  4
Soft Actor-Critic (Autotuned Temperature)   2018  4
NoisyNet-A3C                                2017  1
ACKTR                                       2017  1
SVPG                                        2017  1
Ape-X DPG                                   2018  1
MDPO                                        2020  1
TayPO                                       2020  1