ICLR 2019 • Huizhuo Yuan, Chris Junchi Li, Yuhao Tang, Yuren Zhou
In this paper, we propose the StochAstic Recursive grAdient Policy Optimization (SARAPO) algorithm, a novel variance reduction method for Trust Region Policy Optimization (TRPO).