PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

11 Jun 2023  ·  Wensong Bai, Chao Zhang, Yichao Fu, Lingwei Peng, Hui Qian, Bin Dai

In this paper, we propose the first fully push-forward-based Distributional Reinforcement Learning algorithm, called Push-forward-based Actor-Critic EncourageR (PACER). Specifically, PACER establishes a stochastic utility value policy gradient theorem and leverages the push-forward operator in the construction of both the actor and the critic. Moreover, a novel sample-based encourager, built on the maximum mean discrepancy (MMD), is designed to incentivize exploration. Experimental evaluations on various continuous-control benchmarks demonstrate that our algorithm outperforms state-of-the-art baselines.
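To make the two core ingredients concrete, the sketch below illustrates (i) a push-forward policy that represents its action distribution implicitly, by transforming noise samples through a neural network, and (ii) a sample-based MMD estimate of the kind that could drive an exploration bonus. This is a minimal, hypothetical illustration, not the authors' implementation: the class and function names (`PushForwardPolicy`, `mmd_rbf`), the network sizes, the RBF kernel, and the uniform reference distribution are all assumptions made for the example.

```python
# Hypothetical sketch (not the paper's code): a push-forward policy and a
# sample-based MMD estimate, illustrating the two ideas named in the abstract.
import torch
import torch.nn as nn


class PushForwardPolicy(nn.Module):
    """Maps (state, noise) -> action; the action distribution is the
    push-forward of the noise distribution through this network."""

    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded actions
        )

    def sample(self, state, n_samples=1):
        # Draw i.i.d. Gaussian noise and push it through the network.
        noise = torch.randn(n_samples, self.noise_dim)
        state = state.expand(n_samples, -1)
        return self.net(torch.cat([state, noise], dim=-1))


def mmd_rbf(x, y, bandwidth=1.0):
    """Empirical (biased) MMD^2 between sample sets x and y, RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


# Usage: compare the policy's action samples against a uniform reference on
# [-1, 1]^2; such a discrepancy term could be used to shape exploration.
policy = PushForwardPolicy(state_dim=4, action_dim=2)
state = torch.zeros(1, 4)
actions = policy.sample(state, n_samples=64)
reference = 2 * torch.rand(64, 2) - 1
print(mmd_rbf(actions, reference).item())
```

Because the policy is defined only through samples, no explicit density is needed; the same push-forward construction can be reused on the critic side to represent return distributions by samples as well.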
