Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
Libraries
Use these libraries to find Policy Gradient Methods models and implementations
Most implemented papers
The Mirage of Action-Dependent Baselines in Reinforcement Learning
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.
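To make the variance-reduction claim concrete, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical one-state, two-action problem with a softmax policy, so the state-dependent baseline reduces to a single number: the expected reward under the current policy. The names `softmax`, `grad_estimator_stats`, `logits`, and `rewards` are illustrative only. The mean and variance of the score-function gradient estimate are computed exactly by enumerating both actions, with and without the baseline.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_estimator_stats(logits, rewards, baseline):
    """Exact mean and variance of the score-function gradient estimate
    with respect to logit 0, in a one-state (bandit) problem."""
    probs = softmax(logits)
    # For a softmax policy, d/d(logit_0) log pi(a) = 1[a == 0] - pi_0.
    estimates = [
        ((1.0 if a == 0 else 0.0) - probs[0]) * (rewards[a] - baseline)
        for a in range(len(probs))
    ]
    mean = sum(p * g for p, g in zip(probs, estimates))
    var = sum(p * (g - mean) ** 2 for p, g in zip(probs, estimates))
    return mean, var


# Hypothetical setup: uniform policy, rewards with a large common offset.
logits = [0.0, 0.0]
rewards = [10.0, 11.0]
probs = softmax(logits)

# Baseline = expected reward under the current policy (the state value).
value = sum(p * r for p, r in zip(probs, rewards))

mean_plain, var_plain = grad_estimator_stats(logits, rewards, baseline=0.0)
mean_base, var_base = grad_estimator_stats(logits, rewards, baseline=value)

print("no baseline:   mean", mean_plain, "variance", var_plain)
print("with baseline: mean", mean_base, "variance", var_base)
```

Both estimators have the same expectation, because subtracting a quantity that does not depend on the action leaves the gradient estimate unbiased; only the variance changes. In this toy case the baseline removes the variance entirely, which will not generally hold with more actions or noisy rewards.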
On Learning Intrinsic Rewards for Policy Gradient Methods
In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient
Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.
Training for Diversity in Image Paragraph Captioning
Image paragraph captioning models aim to produce detailed descriptions of a source image.
Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods
We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement
We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
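For readers unfamiliar with the cross-entropy method that this abstract builds on, the following is a minimal sketch of a plain CEM update, not the paper's conditional variant (CCEM). It assumes a single fixed state, a one-dimensional continuous action, a Gaussian sampling distribution, and a fixed hypothetical action-value function `q`; the function name `cem_maximize` and all parameter values are illustrative choices.

```python
import random
import statistics


def cem_maximize(score, mean=0.0, std=5.0, n_samples=50, n_elite=10,
                 n_iters=20, seed=0):
    """Plain cross-entropy method for a 1-D maximization problem.

    Each iteration samples candidates from a Gaussian, keeps the
    top-scoring ones (the elite set), and refits the Gaussian to them.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        candidates = [rng.gauss(mean, std) for _ in range(n_samples)]
        elite = sorted(candidates, key=score, reverse=True)[:n_elite]
        mean = statistics.mean(elite)
        # Floor the spread so the sampling distribution never collapses to zero.
        std = max(statistics.pstdev(elite), 1e-3)
    return mean, std


def q(action):
    """Hypothetical action-value for one fixed state, maximized at 3.0."""
    return -(action - 3.0) ** 2


best_action, final_std = cem_maximize(q)
print("estimated best action:", best_action, "final std:", final_std)
```

In the setting the abstract describes, an update of this kind would be tracked per state while the action-values themselves change during learning; the sketch holds both fixed to keep the mechanics visible.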
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
Fast Efficient Hyperparameter Tuning for Policy Gradients
The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.
Evaluating Rewards for Question Generation Models
Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation.
Neural Logic Reinforcement Learning
Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks.