Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
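Policy gradient methods optimize a parameterized policy directly by ascending an estimate of the gradient of expected return, most commonly via the score-function (REINFORCE) identity ∇_θ J(θ) = E[∇_θ log π_θ(a|s) · R]. A minimal, self-contained sketch on a toy two-armed bandit (the bandit setup, hyperparameters, and all names here are illustrative, not taken from any listed paper):

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(arm_means, steps=5000, lr=0.1, seed=0):
    """REINFORCE on a toy bandit: theta holds per-arm preferences and
    each update follows grad log pi(a) scaled by the sampled reward."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = arm_means[a] + rng.gauss(0.0, 0.1)  # noisy reward signal
        for i in range(len(theta)):
            # d/dtheta_i log softmax(theta)[a] = 1[i == a] - probs[i]
            theta[i] += lr * ((1.0 if i == a else 0.0) - probs[i]) * r
    return softmax(theta)

final_probs = reinforce_bandit([1.0, 0.2])  # two arms; arm 0 pays more
```

After training, the policy should concentrate probability on the higher-paying arm.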
Most implemented papers
Analysis of the Optimization Landscape of Linear Quadratic Gaussian (LQG) Control
This paper revisits the classical Linear Quadratic Gaussian (LQG) control from a modern optimization perspective.
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization.
Dual Learning for Machine Translation
Based on the feedback signals generated during this process (e.g., the language-model likelihood of a model's output, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods).
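The feedback loop described above can be sketched as a single reward computation: translate forward, score fluency with a language model, translate back, and score reconstruction of the original sentence. All function names below are illustrative stand-ins, not the paper's API:

```python
def dual_learning_reward(forward, backward, lm_score, sentence):
    """One feedback step of the dual-learning loop: the combined reward
    would then scale grad-log-prob (policy gradient) updates to both
    translation models."""
    translation = forward(sentence)
    fluency = lm_score(translation)                      # language-model likelihood, in [0, 1]
    reconstruction = backward(translation)
    recon = 1.0 if reconstruction == sentence else 0.0   # toy reconstruction score
    return 0.5 * fluency + 0.5 * recon

# toy usage with stand-in "models"
reward = dual_learning_reward(str.upper, str.lower, lambda t: 1.0, "hello")
```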
Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
We investigate and discuss: the significance of hyper-parameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results.
Cold-Start Reinforcement Learning with Softmax Policy Gradient
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.
Hindsight policy gradients
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy.
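A goal-conditional policy simply takes the goal as an extra input alongside the state, so one set of parameters can serve many goals. A minimal tabular sketch (the preferences and names are hypothetical):

```python
import math

def goal_conditional_policy(theta, state, goal):
    """Action probabilities from preferences indexed by the (state, goal)
    pair: the same state yields different behaviour for different goals."""
    prefs = theta[(state, goal)]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

# hypothetical preferences: two actions, one state, two possible goals
theta = {
    ("s0", "go_left"): [2.0, 0.0],
    ("s0", "go_right"): [0.0, 2.0],
}
p_left = goal_conditional_policy(theta, "s0", "go_left")
p_right = goal_conditional_policy(theta, "s0", "go_right")
```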
Run, skeleton, run: skeletal model in a physics-based simulation
In this paper, we present our approach to solving the physics-based reinforcement learning challenge "Learning to Run", whose objective is to train a physiologically based human model to navigate a complex obstacle course as quickly as possible.
Divide-and-Conquer Reinforcement Learning
In this paper, we develop a novel algorithm that instead partitions the initial state space into "slices", and optimizes an ensemble of policies, each on a different slice.
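The partitioning step can be sketched as bucketing initial states into slices, training one policy per slice, and routing each state to its slice's policy at execution time (helper names are illustrative, not the paper's implementation):

```python
def train_sliced_ensemble(initial_states, num_slices, slice_key, train_policy):
    """Divide-and-conquer sketch: bucket initial states into slices and
    optimize one policy per slice."""
    slices = [[] for _ in range(num_slices)]
    for s in initial_states:
        slices[slice_key(s) % num_slices].append(s)
    return [train_policy(bucket) for bucket in slices]

def ensemble_act(policies, slice_key, num_slices, state):
    """At execution time, route each state to the policy for its slice."""
    return policies[slice_key(state) % num_slices](state)

# toy usage: slice 1-D start states by sign, "train" a constant policy per slice
states = [-2.0, -1.0, 1.0, 2.0]
key = lambda s: 0 if s < 0 else 1
policies = train_sliced_ensemble(
    states, 2, key,
    lambda bucket: (lambda st: "left" if bucket and bucket[0] < 0 else "right"),
)
```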
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult.
Clipped Action Policy Gradient
We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation.
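One way to read this idea (a sketch, not the paper's exact estimator): when a Gaussian action is clipped to [lo, hi], a boundary action's log-probability is the log of the probability mass beyond the bound. For a Gaussian, the score of that mass with respect to the mean equals the conditional mean of the raw score over the clipped tail, so substituting it removes the tail's variance, Rao-Blackwell style:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def clipped_score_mu(a, mu, sigma, lo, hi):
    """d/dmu of the log-probability of the *clipped* action under a
    Gaussian policy: at a bound, use the boundary's probability mass
    instead of the raw sample's density (illustrative sketch)."""
    if a <= lo:
        return -gauss_pdf(lo, mu, sigma) / gauss_cdf(lo, mu, sigma)
    if a >= hi:
        return gauss_pdf(hi, mu, sigma) / (1.0 - gauss_cdf(hi, mu, sigma))
    return (a - mu) / sigma ** 2  # usual Gaussian score inside the bounds
```

Note that every clipped sample beyond the upper bound gets the same fixed boundary score, whereas the naive score (a - mu) / sigma² grows without bound in the raw sample, which is where the variance reduction comes from.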