27 Sep 2018 • Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Gregory Farquhar, Maruan Al-Shedivat, Shimon Whiteson
To improve the sample efficiency of DiCE, we propose a new baseline term for higher-order gradient estimation.