Phasic Policy Gradient

9 Sep 2020 Karl Cobbe Jacob Hilton Oleg Klimov John Schulman

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function... (read more)

PDF Abstract

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper


METHOD TYPE
Entropy Regularization
Regularization
PPO
Policy Gradient Methods