Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
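Policy gradient methods optimize a parameterized policy directly by ascending an estimate of the gradient of expected return, most commonly via the score-function (REINFORCE) identity ∇_θ J(θ) = E[∇_θ log π_θ(a|s) · R]. A minimal, self-contained sketch on a toy two-armed bandit (the bandit setup, hyperparameters, and all names here are illustrative, not taken from any listed paper):

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(arm_means, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit: theta holds per-arm preferences and
    each update follows grad log pi(a) scaled by the sampled reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = arm_means[a] + rng.gauss(0.0, 0.1)  # noisy reward signal
        for i in range(len(theta)):
            # d/dtheta_i log softmax(theta)[a] = 1[i == a] - probs[i]
            theta[i] += lr * ((1.0 if i == a else 0.0) - probs[i]) * r
    return softmax(theta)

final_probs = reinforce_bandit([1.0, 0.2])  # two arms; arm 0 pays more
```

After training, the policy should concentrate probability on the higher-paying arm.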
Most implemented papers
Analysis of the Optimization Landscape of Linear Quadratic Gaussian (LQG) Control
This paper revisits the classical Linear Quadratic Gaussian (LQG) control from a modern optimization perspective.
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization.
Dual Learning for Machine Translation
Based on the feedback signals generated during this process (e.g., the language-model likelihood of a model's output, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods).
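The feedback loop described above can be sketched as a single reward computation: translate forward, score fluency with a language model, translate back, and score reconstruction of the original sentence. All function names below are illustrative stand-ins, not the paper's API:

```python
def dual_learning_reward(forward, backward, lm_score, sentence):
    """One feedback step of the dual-learning loop: the combined reward
    would then scale grad-log-prob (policy gradient) updates to both
    translation models."""
    translation = forward(sentence)
    fluency = lm_score(translation)                      # language-model likelihood, in [0, 1]
    reconstruction = backward(translation)
    recon = 1.0 if reconstruction == sentence else 0.0   # toy reconstruction score
    return 0.5 * fluency + 0.5 * recon

# toy usage with stand-in "models"
reward = dual_learning_reward(str.upper, str.lower, lambda t: 1.0, "hello")
```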
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results.
Cold-Start Reinforcement Learning with Softmax Policy Gradient
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.
Hindsight policy gradients
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy.
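A goal-conditional policy simply takes the goal as an extra input alongside the state, so one set of parameters can serve many goals. A minimal tabular sketch (the preferences and names are hypothetical):

```python
import math

def goal_conditional_policy(theta, state, goal):
    """Action probabilities from preferences indexed by the (state, goal)
    pair: the same state yields different behaviour for different goals."""
    prefs = theta[(state, goal)]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

# hypothetical preferences: two actions, one state, two possible goals
theta = {
    ("s0", "go_left"): [2.0, 0.0],
    ("s0", "go_right"): [0.0, 2.0],
}
p_left = goal_conditional_policy(theta, "s0", "go_left")
p_right = goal_conditional_policy(theta, "s0", "go_right")
```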
Run, skeleton, run: skeletal model in a physics-based simulation
In this paper, we present our approach to solving the physics-based reinforcement learning challenge "Learning to Run", whose objective is to train a physiologically based human model to navigate a complex obstacle course as quickly as possible.
Divide-and-Conquer Reinforcement Learning
In this paper, we develop a novel algorithm that instead partitions the initial state space into "slices", and optimizes an ensemble of policies, each on a different slice.
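The partitioning step can be sketched as bucketing initial states into slices, training one policy per slice, and routing each state to its slice's policy at execution time (helper names are illustrative, not the paper's implementation):

```python
def train_sliced_ensemble(initial_states, num_slices, slice_key, train_policy):
    """Divide-and-conquer sketch: bucket initial states into slices and
    optimize one policy per slice."""
    slices = [[] for _ in range(num_slices)]
    for s in initial_states:
        slices[slice_key(s) % num_slices].append(s)
    return [train_policy(bucket) for bucket in slices]

def ensemble_act(policies, slice_key, num_slices, state):
    """At execution time, route each state to the policy for its slice."""
    return policies[slice_key(state) % num_slices](state)

# toy usage: slice 1-D start states by sign, "train" a constant policy per slice
states = [-2.0, -1.0, 1.0, 2.0]
key = lambda s: 0 if s < 0 else 1
policies = train_sliced_ensemble(
    states, 2, key,
    lambda bucket: (lambda st: "left" if bucket and bucket[0] < 0 else "right"),
)
```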
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult.
Clipped Action Policy Gradient
We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation.
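One way to read this idea (a sketch, not the paper's exact estimator): when a Gaussian action is clipped to [lo, hi], a boundary action's log-probability is the log of the probability mass beyond the bound. For a Gaussian, the score of that mass with respect to the mean equals the conditional mean of the raw score over the clipped tail, so substituting it removes the tail's variance, Rao-Blackwell style:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def clipped_score_mu(a, mu, sigma, lo, hi):
    """d/dmu of the log-probability of the *clipped* action under a
    Gaussian policy: at a bound, use the boundary's probability mass
    instead of the raw sample's density (illustrative sketch)."""
    if a <= lo:
        return -gauss_pdf(lo, mu, sigma) / gauss_cdf(lo, mu, sigma)
    if a >= hi:
        return gauss_pdf(hi, mu, sigma) / (1.0 - gauss_cdf(hi, mu, sigma))
    return (a - mu) / sigma ** 2  # usual Gaussian score inside the bounds
```

Note that every clipped sample beyond the upper bound gets the same fixed boundary score, whereas the naive score (a - mu) / sigma² grows without bound in the raw sample, which is where the variance reduction comes from.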