Mirror Descent Policy Optimization

Mirror descent (MD), a well-known first-order method in constrained convex optimization, has recently been shown to be an important tool for analyzing trust-region algorithms in reinforcement learning (RL). Inspired by such theoretical analyses, we propose an efficient RL algorithm called mirror descent policy optimization (MDPO)...
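The core idea behind a KL-regularized mirror descent update is that maximizing the advantage-weighted objective minus a KL penalty to the previous policy has a closed-form solution in the tabular case: the new policy is proportional to the old policy times the exponentiated advantages. The sketch below is a minimal illustration of that step, not the authors' implementation; the step size `eta` and the toy advantage values are assumptions for the example.

```python
import numpy as np

def mdpo_step(pi_old, advantages, eta=0.5):
    """One KL-regularized mirror descent step over action probabilities.

    Solves  max_pi  eta * <pi, advantages> - KL(pi || pi_old)
    in closed form: pi_new(a) ∝ pi_old(a) * exp(eta * advantages(a)).
    """
    logits = np.log(pi_old) + eta * advantages
    logits -= logits.max()          # subtract max for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()    # normalize back to a distribution

# Toy example: three actions, the first has the highest advantage.
pi_old = np.array([0.25, 0.25, 0.5])
advantages = np.array([1.0, -1.0, 0.0])
pi_new = mdpo_step(pi_old, advantages)
```

The KL term keeps `pi_new` close to `pi_old`, so a small `eta` yields a conservative, trust-region-like update, which is the connection to TRPO and PPO mentioned above.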


Methods used in the Paper


- MDPO (Policy Gradient Methods)
- Entropy Regularization (Regularization)
- TRPO (Policy Gradient Methods)
- PPO (Policy Gradient Methods)