
Mirror Descent Policy Optimization

Introduced by Tomar et al. in Mirror Descent Policy Optimization

Mirror Descent Policy Optimization (MDPO) is a policy gradient algorithm that iteratively solves a trust-region problem whose objective consists of two terms: a linearization of the standard RL objective and a proximity term that restricts consecutive policies to stay close to each other. It is an instance of mirror descent, a general first-order trust-region method in which a Bregman divergence (the KL divergence, in MDPO's case) keeps consecutive iterates close.

Source: Mirror Descent Policy Optimization
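
To make the update concrete, here is a minimal PyTorch sketch of what one on-policy MDPO surrogate loss might look like, assuming log-probabilities and advantages have already been computed. The function name `mdpo_surrogate` and its arguments are hypothetical (not from the paper's reference code), and the KL proximity term uses an importance-weighted sample estimate rather than the closed-form KL the paper uses for Gaussian policies.

```python
import torch

def mdpo_surrogate(new_logp, old_logp, advantages, inv_stepsize):
    """Sketch of an on-policy MDPO surrogate loss (hypothetical helper).

    new_logp     : log pi_theta(a|s) under the trainable policy (requires grad)
    old_logp     : log pi_theta_k(a|s) under the policy that collected the data (detached)
    advantages   : advantage estimates A^{pi_theta_k}(s, a)
    inv_stepsize : 1 / t_k, the weight on the KL proximity term
    """
    ratio = torch.exp(new_logp - old_logp)   # importance weight pi_theta / pi_theta_k
    policy_term = ratio * advantages         # linearization of the RL objective
    # Importance-weighted sample estimate of the proximity term:
    # E_{a ~ pi_theta_k}[ratio * log(ratio)] = KL(pi_theta || pi_theta_k)
    kl_est = ratio * (new_logp - old_logp)
    # Mirror descent maximizes (objective - proximity), so we minimize the negative
    return -(policy_term - inv_stepsize * kl_est).mean()
```

In the paper, each iteration keeps pi_theta_k fixed while taking multiple SGD steps on this surrogate, and the step size t_k is annealed over the course of training.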

Tasks

Task                         Papers  Share
Continuous Control           1       50.00%
Reinforcement Learning (RL)  1       50.00%

Categories

Policy Gradient Methods