no code implementations • 3 Jan 2017 • Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi, Chen, Pieter Abbeel
The high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline.
Policy Gradient Methods