Policy Gradient Methods

90 papers with code • 0 benchmarks • 2 datasets

Most implemented papers

The Mirage of Action-Dependent Baselines in Reinforcement Learning

brain-research/mirage-rl ICML 2018

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.
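The variance-reduction role of a state-dependent baseline can be checked exactly in the simplest case. The sketch below is an illustration only, not code from brain-research/mirage-rl. It assumes a single state, a softmax policy over three actions, and deterministic, hypothetical rewards, and it uses the state value b = V(s) = E_pi[r] as the baseline. It enumerates every action to get the exact mean and total variance of the score-function estimator g(a) = grad log pi(a) * (r(a) - b).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def score_function(action, probs):
    # Gradient of log softmax(logits)[action] w.r.t. the logits: one_hot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def estimator_moments(logits, rewards, baseline):
    """Exact mean vector and total variance of g(a) = grad log pi(a) * (r(a) - baseline), a ~ pi."""
    probs = softmax(logits)
    mean = [0.0] * len(probs)
    second_moment = 0.0
    for a, p in enumerate(probs):
        g = [s * (rewards[a] - baseline) for s in score_function(a, probs)]
        mean = [m + p * gi for m, gi in zip(mean, g)]
        second_moment += p * sum(gi * gi for gi in g)
    variance = second_moment - sum(m * m for m in mean)
    return mean, variance

# Hypothetical single-state example: uniform policy, rewards sharing a large offset.
logits = [0.0, 0.0, 0.0]
rewards = [10.0, 11.0, 12.0]
state_value = sum(p * r for p, r in zip(softmax(logits), rewards))  # V(s) = E_pi[r]

mean_no_baseline, var_no_baseline = estimator_moments(logits, rewards, 0.0)
mean_baseline, var_baseline = estimator_moments(logits, rewards, state_value)
print(var_no_baseline, var_baseline)
```

With these numbers the mean gradient is the same with and without the baseline (the baseline term vanishes because the score function has zero expectation under pi), while the total variance drops from 728/9 (about 80.9) to 2/9 (about 0.22).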

On Learning Intrinsic Rewards for Policy Gradient Methods

Hwhitetooth/lirpg NeurIPS 2018

In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

ruizhaogit/GuessWhat-TemperedPolicyGradient 2 Jul 2018

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Training for Diversity in Image Paragraph Captioning

lukemelas/image-paragraph-captioning EMNLP 2018

Image paragraph captioning models aim to produce detailed descriptions of a source image.

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

facebookresearch/WhereDidMyOptimumGo 5 Oct 2018

We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

samuelfneumann/greedyac 22 Oct 2018

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
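For readers unfamiliar with the cross-entropy method (CEM), here is a minimal sketch of a plain per-state CEM search for a greedy action. It is not the paper's conditional CEM and not code from samuelfneumann/greedyac. It assumes a one-dimensional continuous action, a Gaussian sampling distribution, and a hypothetical action-value function q_value that peaks at action == state. Each iteration samples actions, keeps the top elite_frac of them by Q(s, a), and refits the Gaussian to those elites.

```python
import random
import statistics

def q_value(state, action):
    # Hypothetical action-value function; the greedy action is action == state.
    return -(action - state) ** 2

def cem_greedy_action(state, q, iters=20, n_samples=64, elite_frac=0.1,
                      mu=0.0, sigma=2.0, min_sigma=1e-3, seed=0):
    """Plain cross-entropy method for argmax_a q(state, a) over a 1-D continuous action."""
    rng = random.Random(seed)
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian proposal.
        actions = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Keep the highest-valued actions (the "elite" set) for this state.
        elites = sorted(actions, key=lambda a: q(state, a), reverse=True)[:n_elite]
        # Refit the proposal to the elites; floor sigma so sampling never collapses to a point.
        mu = statistics.mean(elites)
        sigma = max(statistics.pstdev(elites), min_sigma)
    return mu, sigma

mu, sigma = cem_greedy_action(1.5, q_value)
print(mu, sigma)
```

The returned mean should sit close to 1.5 with a small standard deviation. As the quoted abstract notes, CCEM must additionally track action-values that change during learning, which this fixed-Q sketch ignores.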

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

facebookresearch/jps 4 Nov 2018

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
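The basic operation behind a public belief is Bayes' rule applied to an observed action: if the policy pi(a | h) that an agent follows given its private information h is common knowledge, observing action a turns a belief b(h) into b'(h) proportional to b(h) * pi(a | h). The sketch below performs this update exactly over a tiny, hypothetical discrete hidden variable. BAD itself uses an approximate update, so treat this only as an illustration of the underlying idea; the function name and the card example are our own.

```python
def belief_update(prior, policy, action):
    """Exact Bayesian update of a belief over hidden information after observing `action`.

    prior:  dict mapping hidden value h -> P(h)
    policy: dict mapping h -> dict mapping action -> pi(action | h), assumed public
    """
    unnormalized = {h: p * policy[h].get(action, 0.0) for h, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under prior and policy")
    return {h: w / total for h, w in unnormalized.items()}

# Hypothetical example: the acting agent privately holds a red or blue card and
# follows a publicly known policy that mostly gives a hint when the card is red.
prior = {"red": 0.5, "blue": 0.5}
policy = {
    "red": {"hint": 0.9, "pass": 0.1},
    "blue": {"hint": 0.2, "pass": 0.8},
}
posterior = belief_update(prior, policy, "hint")
print(posterior)  # red ≈ 0.818, blue ≈ 0.182
```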

Fast Efficient Hyperparameter Tuning for Policy Gradients

supratikp/HOOF 18 Feb 2019

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.
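A minimal sketch of this idea, under assumptions that are ours rather than the supratikp/HOOF code's: a one-step (bandit) task with a softmax policy, a REINFORCE-style gradient with a mean-reward baseline, and weighted importance sampling as the off-policy estimator that scores each candidate learning rate on the already-collected samples. All function names, reward values and candidate learning rates are hypothetical. Choosing the learning rate needs no new environment interaction; only the existing (action, reward) pairs are reused.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, samples):
    """Score-function gradient estimate from (action, reward) pairs, with a mean-reward baseline."""
    probs = softmax(logits)
    baseline = sum(r for _, r in samples) / len(samples)
    grad = [0.0] * len(logits)
    for a, r in samples:
        for i, p in enumerate(probs):
            grad[i] += ((1.0 if i == a else 0.0) - p) * (r - baseline)
    return [g / len(samples) for g in grad]

def wis_value(behavior_logits, target_logits, samples):
    """Weighted importance sampling estimate of the target policy's expected reward."""
    pb = softmax(behavior_logits)
    pt = softmax(target_logits)
    weights = [pt[a] / pb[a] for a, _ in samples]
    return sum(w * r for w, (_, r) in zip(weights, samples)) / sum(weights)

def choose_learning_rate(logits, samples, candidates):
    """Pick the candidate whose one-step updated policy has the highest WIS estimate."""
    grad = reinforce_grad(logits, samples)
    best_lr, best_value = None, -math.inf
    for lr in candidates:
        new_logits = [x + lr * g for x, g in zip(logits, grad)]
        value = wis_value(logits, new_logits, samples)
        if value > best_value:
            best_lr, best_value = lr, value
    return best_lr, best_value

# Hypothetical 3-armed bandit; samples are collected once by the current policy.
rng = random.Random(0)
true_means = [0.0, 1.0, 0.5]
logits = [0.0, 0.0, 0.0]
actions = rng.choices(range(3), weights=softmax(logits), k=200)
samples = [(a, true_means[a] + rng.gauss(0.0, 0.1)) for a in actions]

best_lr, best_value = choose_learning_rate(logits, samples, [0.0, 0.1, 1.0, 10.0])
print(best_lr, best_value)
```

In this easy bandit, larger steps keep moving probability toward the best arm, so the largest candidate wins. In practice the importance weights grow unreliable as the candidate policy drifts from the behaviour policy, which is why the candidate set or step size would normally be constrained.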

Evaluating Rewards for Question Generation Models

bloomsburyai/question-generation NAACL 2019

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.

Neural Logic Reinforcement Learning

ZhengyaoJiang/NLRL 24 Apr 2019

Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks.