Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function... (read more)

Results in Papers With Code
(↓ scroll down to see all results)