16 Nov 2019 • Brandon Trabucco, Albert Qu, Simon Li, Ganeshkumar Ashokavardhanan
Existing methods for estimating the optimal stochastic control policy rely on high-variance estimates of the policy descent.