Policy Gradient Methods optimize the policy function directly in reinforcement learning. This contrasts with value-based methods such as Q-Learning, where the policy is implicit: the agent acts by maximizing a learned value function. Below you can find a continuously updating catalogue of policy gradient methods.
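A minimal sketch of the idea, assuming the simplest possible setting: a multi-armed bandit (one-step episodes) with deterministic per-arm rewards and a softmax policy over action preferences `theta`. The parameters are updated with the REINFORCE score-function gradient, `grad log pi(a) * (r - b)`, where the baseline `b` is a running average of observed rewards. The function names `softmax` and `reinforce_bandit`, the reward values, and all hyperparameters are illustrative assumptions, not taken from any method in the catalogue.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=3000, lr=0.1, seed=0):
    """Hypothetical REINFORCE-style policy gradient on a multi-armed bandit.

    The policy pi(a) = softmax(theta)[a] is parameterized directly; no value
    function is used to pick actions. Rewards are deterministic per arm,
    which is a simplifying assumption of this sketch.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # policy parameters (action preferences)
    baseline = 0.0                    # running mean reward, reduces gradient variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current stochastic policy.
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = arm_rewards[a]
        advantage = r - baseline
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_pi  # gradient ascent on expected reward
        baseline += (r - baseline) / t  # incremental mean of all observed rewards
    return softmax(theta)


if __name__ == "__main__":
    probs = reinforce_bandit([0.2, 0.5, 0.9])
    print([round(p, 3) for p in probs])
```

No value function is used to choose actions here: the agent samples directly from `pi`, and the parameters shift toward actions whose reward beats the baseline. A Q-Learning agent would instead estimate each arm's value and act greedily on those estimates.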
| Method | Year | Papers |
|---|---|---|
| | 2017 | 111 |
| | 1999 | 89 |
| | 2015 | 80 |
| | 2016 | 39 |
| | 2016 | 36 |
| | 2015 | 35 |
| | 2018 | 24 |
| | 2018 | 21 |
| | 2017 | 13 |
| | 2014 | 7 |
| | 2018 | 6 |
| | 2018 | 5 |
| | 2016 | 4 |
| | 2018 | 4 |
| | 2017 | 1 |
| | 2017 | 1 |
| | 2017 | 1 |
| | 2018 | 1 |
| | 2020 | 1 |
| | 2020 | 1 |