Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
Latest papers with no code
Actor-Critic Reinforcement Learning with Phased Actor
We prove qualitative properties of PAAC, including convergence of the learned value and policy, solution optimality, and stability of the system dynamics.
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report
This framework combines the learning power of neural networks with the guaranteed stability of classical control policies for SQNs.
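The abstract only hints at the mechanism, but the intervention idea can be sketched in a few lines: run the learned (neural) policy inside a safe region of the state space and hand control to a classical, provably stable policy whenever the queues grow too long. All names below (safe_threshold, drain_longest_queue) are hypothetical, not from the paper.

```python
import numpy as np

def drain_longest_queue(queue_lengths):
    """Classical fallback with a known stability guarantee: serve the longest queue."""
    return int(np.argmax(queue_lengths))

def intervention_policy(queue_lengths, learned_policy, safe_threshold=50.0):
    """Learned policy inside the safe region, stable fallback outside it."""
    if max(queue_lengths) > safe_threshold:
        return drain_longest_queue(queue_lengths)  # intervention: guaranteed stability
    return learned_policy(queue_lengths)           # neural policy: better performance

# Usage with a stand-in "learned" policy that serves a random queue.
rng = np.random.default_rng(0)
learned = lambda q: int(rng.integers(len(q)))
print(intervention_policy([3.0, 12.0, 7.0], learned))    # learned policy acts
print(intervention_policy([3.0, 120.0, 7.0], learned))   # fallback intervenes
```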
Elementary Analysis of Policy Gradient Methods
Projected policy gradient under the simplex parameterization, and both policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning.
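For readers unfamiliar with these parameterizations, the sketch below (ours, not the paper's) runs exact policy gradient ascent under the softmax parameterization on a one-state, three-armed bandit, where the expected reward is J(θ) = Σ_a π_θ(a) r(a) and its gradient has the closed form π_θ(a)(r(a) − J(θ)).

```python
import numpy as np

r = np.array([1.0, 2.0, 3.0])   # per-arm rewards of a 3-armed bandit
theta = np.zeros(3)             # softmax logits
eta = 0.5                       # step size

for _ in range(200):
    pi = np.exp(theta) / np.exp(theta).sum()  # softmax parameterization
    J = pi @ r                                # expected reward J(theta)
    theta += eta * pi * (r - J)               # exact policy gradient ascent

print(np.round(pi, 3))  # the policy concentrates on the best arm (arm 2)
```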
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
In WebShop, the 1-shot performance of the A$^3$T agent matches the human average, and four rounds of iterative refinement bring its performance close to that of human experts.
Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic
In average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time (a measure of how long a Markov chain under a fixed policy needs to reach its stationary distribution) poses a significant challenge for the global convergence of policy gradient methods.
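As a concrete illustration of the quantity in question (not from the paper), the snippet below computes the ε-mixing time of a small Markov chain: the first t at which every row of P^t is within total-variation distance ε of the stationary distribution μ.

```python
import numpy as np

def mixing_time(P, eps=0.01, t_max=10_000):
    """Smallest t with max_s TV(P^t(s, .), mu) <= eps for transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)                     # mu is the left eigenvector
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])  # for eigenvalue 1 ...
    mu /= mu.sum()                                      # ... normalized to sum to 1
    Pt = np.eye(len(P))
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        if 0.5 * np.abs(Pt - mu).sum(axis=1).max() <= eps:  # worst-case TV distance
            return t
    return None

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(mixing_time(P))  # about a dozen steps for this two-state chain
```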
Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries
Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision-making policy without sharing raw trajectories.
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
We prove that both independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium under the average-reward criterion.
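The paper's setting is general average-reward Markov potential games; as a toy stand-in (not the paper's algorithm), the sketch below runs independent softmax policy gradient in a one-state team game, the simplest potential game, where the average reward reduces to the expected one-step reward.

```python
import numpy as np

R = np.array([[1.0, 0.0],
              [0.0, 2.0]])  # shared payoff R[a1, a2]: a team (identical-interest) game

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

theta = [np.zeros(2), np.zeros(2)]  # softmax logits, one vector per player
eta = 0.5

for _ in range(300):
    pi = [softmax(t) for t in theta]
    q = [R @ pi[1], R.T @ pi[0]]   # each player's expected payoff per action
    for i in range(2):             # independent updates: no coordination,
        J = pi[i] @ q[i]           # each treats the other's policy as fixed
        theta[i] += eta * pi[i] * (q[i] - J)  # exact softmax policy gradient

print(np.round(pi[0], 3), np.round(pi[1], 3))
# Both players concentrate on action 1: the Nash equilibrium with payoff 2.
```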
Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control
Changes in demand, various hydrological inputs, and environmental stressors are among the issues that water managers and policymakers face on a regular basis.
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Nevertheless, when policy gradients are applied to SDEs, the gradient is estimated from a finite set of trajectories and can be ill-defined, and the policy's behavior in data-scarce regions may be uncontrolled.
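To make "estimated on a finite set of trajectories" concrete, here is a generic score-function (REINFORCE) estimator for a one-dimensional Gaussian policy, unrelated to the paper's SDE construction: with few samples the estimate is noisy, which is one face of the instability the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(mu, n_traj, sigma=1.0):
    """Monte Carlo estimate of d/dmu E[r(a)] for a ~ N(mu, sigma^2),
    with r(a) = -(a - 3)^2; the exact gradient is -2 * (mu - 3)."""
    a = rng.normal(mu, sigma, size=n_traj)  # finite set of sampled "trajectories"
    r = -(a - 3.0) ** 2                     # rewards
    score = (a - mu) / sigma**2             # d/dmu log N(a; mu, sigma^2)
    return float((r * score).mean())

# Few trajectories give a noisy estimate; many approach the exact value 6.0 at mu = 0.
for n in (4, 100, 100_000):
    print(n, round(reinforce_grad(0.0, n), 3))
```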
Towards Provable Log Density Policy Gradient
In this work, we argue that this residual term is significant, and that correcting for it could improve the sample complexity of reinforcement learning methods.