Markov Chain Monte Carlo Policy Optimization

4 Jan 2021 · Daniel Hsu

Discovering approximately optimal policies in a given domain, a problem termed policy optimization, is crucial to applying reinforcement learning (RL) in many real-world scenarios. Viewing policy optimization from the perspective of variational inference, the representational power of the policy network allows us to obtain an approximate posterior over actions conditioned on states, under entropy or KL regularization. In practice, however, policy optimization may yield suboptimal policy estimates due to the amortization gap. Inspired by Markov Chain Monte Carlo (MCMC) techniques, instead of optimizing policy parameters or policy distributions directly, we propose a new policy optimization method that incorporates gradient-based feedback in various ways. Empirical evaluation verifies the performance improvement of the proposed method on many continuous control benchmarks.
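To make the idea of gradient-guided MCMC refinement of policy samples concrete, here is a minimal sketch, assuming a continuous action space and access to a differentiable critic Q(s, a). The names toy_q, grad_a_q, and mcmc_refine, the quadratic critic, and the Langevin step sizes are illustrative assumptions for this sketch, not the paper's algorithm or reference code.

# Hypothetical sketch: refine an amortized policy sample with a few
# Langevin-style MCMC steps guided by the action gradient of a critic.
import numpy as np

def toy_q(state, action):
    # Illustrative quadratic critic: highest where action == -state.
    return -np.sum((action + state) ** 2)

def grad_a_q(state, action):
    # Analytic gradient of the toy critic with respect to the action.
    return -2.0 * (action + state)

def mcmc_refine(state, action, n_steps=10, step_size=0.05, rng=None):
    # Unadjusted Langevin updates: move the sampled action toward higher Q
    # while injecting noise, so the refined sample still explores.
    rng = np.random.default_rng() if rng is None else rng
    a = action.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=a.shape)
        a = a + step_size * grad_a_q(state, a) + np.sqrt(2.0 * step_size) * noise
    return a

state = np.array([0.5, -1.0])
initial_action = np.zeros(2)  # stand-in for a sample from the policy network
refined_action = mcmc_refine(state, initial_action)
print(toy_q(state, initial_action), toy_q(state, refined_action))

In this sketch the refined action typically attains a higher critic value than the raw policy sample, which is the sense in which MCMC-style, gradient-based feedback can narrow the amortization gap of a purely amortized policy.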
