Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.
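A minimal sketch of the variance-reduction idea above. It assumes a hypothetical two-state, two-action problem with a tabular softmax policy. The names `THETA`, `MEAN_REWARD` and the helper functions are illustrative and do not come from any of the listed papers.

The single-sample score-function estimate is ∇ log π(a|s) · (r - b(s)). Here b(s) is set to the exact state value under the current policy. Because the score has zero mean in every state, subtracting b(s) leaves the expected gradient unchanged. The variance, however, drops sharply, since the large per-state reward offset no longer multiplies the score.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_softmax(probs, action):
    # d/dtheta_k log pi(action) = 1[k == action] - pi(k) for a softmax policy
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]

# Hypothetical toy problem: 2 states, 2 actions, tabular logits (all zero, so pi is uniform).
THETA = [[0.0, 0.0], [0.0, 0.0]]
# Mean reward per (state, action); the large state-dependent offset is what a baseline removes.
MEAN_REWARD = [[10.0, 11.0], [-10.0, -9.0]]

def state_value(s):
    # Exact V(s) = sum_a pi(a|s) * E[r | s, a], used here as the state-dependent baseline b(s).
    probs = softmax(THETA[s])
    return sum(p * r for p, r in zip(probs, MEAN_REWARD[s]))

def sample_gradient(rng, use_baseline):
    s = rng.randrange(2)  # assumption: states drawn uniformly
    probs = softmax(THETA[s])
    a = 0 if rng.random() < probs[0] else 1
    r = MEAN_REWARD[s][a] + rng.gauss(0.0, 1.0)  # noisy reward
    b = state_value(s) if use_baseline else 0.0
    g = [[0.0, 0.0], [0.0, 0.0]]
    g[s] = [x * (r - b) for x in grad_log_softmax(probs, a)]
    return [x for row in g for x in row]  # flatten to a 4-vector

def estimate_stats(use_baseline, n=20000, seed=0):
    """Return (mean gradient estimate, total variance summed over components)."""
    rng = random.Random(seed)
    samples = [sample_gradient(rng, use_baseline) for _ in range(n)]
    dim = len(samples[0])
    means = [sum(x[i] for x in samples) / n for i in range(dim)]
    total_var = sum(sum((x[i] - means[i]) ** 2 for x in samples) / n for i in range(dim))
    return means, total_var

if __name__ == "__main__":
    for flag in (False, True):
        means, var = estimate_stats(use_baseline=flag)
        print(f"baseline={flag}: mean={[round(m, 3) for m in means]}, total variance={var:.3f}")
```

Both runs should report similar mean gradients. The run with the baseline should report a total variance that is smaller by roughly two orders of magnitude in this toy setup.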

On Learning Intrinsic Rewards for Policy Gradient Methods

Hwhitetooth/lirpg • NeurIPS 2018

In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

ruizhaogit/GuessWhat-TemperedPolicyGradient • 2 Jul 2018

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Training for Diversity in Image Paragraph Captioning

lukemelas/image-paragraph-captioning • EMNLP 2018

Image paragraph captioning models aim to produce detailed descriptions of a source image.

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

facebookresearch/WhereDidMyOptimumGo • 5 Oct 2018

We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

samuelfneumann/greedyac • 22 Oct 2018

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
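For readers unfamiliar with the cross-entropy method (CEM), here is a minimal sketch of plain CEM at a single fixed state. Each update samples actions from a Gaussian, keeps the top fraction by action-value, and refits the Gaussian to those elites. The action-value function `q_example`, the sample sizes, and the `min_std` floor are hypothetical choices.

This is ordinary CEM, not the paper's conditional CCEM. Per the abstract, CCEM tracks such an update per state even as the action-values change.

```python
import random
import statistics

def cem_step(mean, std, q, rng, n_samples=64, elite_frac=0.2, min_std=1e-3):
    """One cross-entropy-method update of a 1-D Gaussian distribution over actions."""
    actions = [rng.gauss(mean, std) for _ in range(n_samples)]
    n_elite = max(2, int(elite_frac * n_samples))
    # Keep the highest-valued actions ("elites") and refit the Gaussian to them.
    elites = sorted(actions, key=q, reverse=True)[:n_elite]
    new_mean = statistics.fmean(elites)
    new_std = max(statistics.pstdev(elites), min_std)  # floor avoids a degenerate Gaussian
    return new_mean, new_std

def q_example(a):
    # Hypothetical action-value for one fixed state, maximised at a = 1.5.
    return -(a - 1.5) ** 2

def run_cem(iterations=30, seed=0):
    rng = random.Random(seed)
    mean, std = 0.0, 2.0  # broad initial proposal
    for _ in range(iterations):
        mean, std = cem_step(mean, std, q_example, rng)
    return mean, std

if __name__ == "__main__":
    print(run_cem())
```

The proposal mean moves toward the maximiser of `q_example` while the standard deviation shrinks. This concentration on high-value actions is the behaviour that the abstract's policy improvement result concerns.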

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

facebookresearch/jps • 4 Nov 2018

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
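The core of conditioning a belief on observed actions is Bayes' rule. After seeing action a, the posterior over a hidden state h is proportional to prior(h) · π(a | h). Below is a minimal exact sketch over a discrete set of hidden states. The hidden states, `example_policy`, and its probabilities are hypothetical, and BAD's approximate update over public beliefs is not reproduced here.

```python
def belief_update(prior, policy, action):
    """Posterior over hidden states after observing `action`.

    prior:  dict mapping hidden_state -> probability
    policy: function (hidden_state, action) -> probability of taking that action
    """
    unnormalised = {h: p * policy(h, action) for h, p in prior.items()}
    z = sum(unnormalised.values())
    if z == 0.0:
        raise ValueError("observed action has zero probability under the prior")
    return {h: w / z for h, w in unnormalised.items()}

def example_policy(h, action):
    # Hypothetical acting agent: gives a "hint" with prob 0.9 if its hidden card is red, 0.2 if blue.
    p_hint = 0.9 if h == "red" else 0.2
    return p_hint if action == "hint" else 1.0 - p_hint

if __name__ == "__main__":
    post = belief_update({"red": 0.5, "blue": 0.5}, example_policy, "hint")
    print(post)  # red = 0.45 / 0.55, blue = 0.10 / 0.55
```

Every observer who knows the acting agent's policy can run the same update. This is why a belief conditioned only on actions can be shared publicly.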

Fast Efficient Hyperparameter Tuning for Policy Gradients

supratikp/HOOF • 18 Feb 2019

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.
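Weighted importance sampling (WIS) is one way to score candidate policies using only trajectories that have already been collected. Each trajectory's return is reweighted by the ratio of its probability under the candidate to its probability under the behaviour policy, and the result is normalised by the sum of the weights.

This is a minimal sketch under stated assumptions. Each candidate stands for a policy obtained from one gradient step with a different hyperparameter setting, exposed as a log-probability function. The function names and the toy bandit data are hypothetical, and this is not presented as HOOF's exact implementation.

```python
import math

def wis_return(trajectories, logp_candidate, logp_behaviour):
    """Weighted importance sampling estimate of a candidate policy's expected return.

    trajectories: list of (steps, total_return), where steps is a list of (state, action)
    logp_candidate, logp_behaviour: functions (state, action) -> log-probability
    """
    log_weights = [
        sum(logp_candidate(s, a) - logp_behaviour(s, a) for s, a in steps)
        for steps, _ in trajectories
    ]
    m = max(log_weights)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    return sum(w * ret for w, (_, ret) in zip(weights, trajectories)) / total

def select_candidate(trajectories, candidates, logp_behaviour):
    """Pick the candidate (name -> log-prob function) with the highest WIS estimate."""
    scores = {name: wis_return(trajectories, lp, logp_behaviour) for name, lp in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical one-state bandit: the behaviour policy was uniform over actions {0, 1}.
TRAJECTORIES = [([(0, 0)], 0.0), ([(0, 1)], 1.0)]

def logp_uniform(s, a):
    return math.log(0.5)

def logp_greedy(s, a):
    return math.log(0.9 if a == 1 else 0.1)

if __name__ == "__main__":
    best, scores = select_candidate(
        TRAJECTORIES, {"uniform": logp_uniform, "greedy": logp_greedy}, logp_uniform
    )
    print(best, scores)
```

Normalising by the summed weights introduces some bias but usually gives far lower variance than ordinary importance sampling. No new environment interaction is needed to rank the candidates.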

Evaluating Rewards for Question Generation Models

bloomsburyai/question-generation • NAACL 2019

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

Neural Logic Reinforcement Learning

ZhengyaoJiang/NLRL • 24 Apr 2019

Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks.

Policy Gradient Methods