Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
Benchmarks
These leaderboards are used to track progress in Policy Gradient Methods
Libraries
Use these libraries to find Policy Gradient Methods models and implementations.
Latest papers with no code
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
When policy gradients are applied to SDEs, however, the gradient is estimated from a finite set of trajectories and can be ill-defined, leaving the policy's behavior uncontrolled in data-scarce regions.
Towards Provable Log Density Policy Gradient
In this work, we argue that this residual term is significant and correcting for it could potentially improve sample-complexity of reinforcement learning methods.
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate
The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization.
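The snippet above concerns reusing trajectories gathered under earlier policies. A standard way to do this is to reweight old samples by the likelihood ratio between the current and the behavior policy. The following is a minimal sketch of an importance-weighted score-function gradient on a toy two-armed bandit; the policies, rewards, and sample sizes are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Historical data collected under an old behavior policy pi_old (illustrative).
theta_old = np.array([0.0, 0.0])
p_old = softmax(theta_old)
actions = rng.choice(2, size=1000, p=p_old)
rewards = np.where(actions == 0, 1.0, 0.2)   # arm 0 is better

# Current target policy pi_new; reuse the old trajectories via importance weights.
theta_new = np.array([0.5, -0.5])
p_new = softmax(theta_new)
w = p_new[actions] / p_old[actions]          # likelihood ratios pi_new / pi_old

# Importance-weighted score-function estimate of grad J(theta_new).
grad = np.zeros(2)
for a, r, wi in zip(actions, rewards, w):
    g = -p_new.copy()
    g[a] += 1.0                              # grad log pi_new(a) for a softmax policy
    grad += wi * r * g
grad /= len(actions)
print(grad)                                  # should point toward the better arm
```

The likelihood ratio makes the estimate unbiased for the new policy's objective, at the cost of extra variance when the two policies diverge, which is the trade-off the convergence analysis in such work addresses.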
When Do Off-Policy and On-Policy Policy Gradient Methods Align?
A well-established off-policy objective is the excursion objective.
Identifying Policy Gradient Subspaces
Policy gradient methods hold great potential for solving complex continuous control tasks.
Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction
Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning.
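NPG preconditions the vanilla gradient with the (pseudo-)inverse of the Fisher information matrix, which removes the dependence on how the policy is parameterized. A small sketch on a two-action softmax policy, with illustrative rewards (not from the paper):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.array([0.3, -0.3])
rewards = np.array([1.0, 0.2])           # illustrative per-action rewards
p = softmax(theta)

# Vanilla policy gradient of expected reward: sum_a p(a) r(a) grad log pi(a).
grads = np.eye(2) - p                    # row a is grad log pi(a) for a softmax
vanilla = sum(p[a] * rewards[a] * grads[a] for a in range(2))

# Fisher information matrix F = E[grad log pi grad log pi^T].
F = sum(p[a] * np.outer(grads[a], grads[a]) for a in range(2))

# Natural gradient: F^+ g (pseudo-inverse, since the softmax Fisher is singular
# along the constant-shift direction of the logits).
natural = np.linalg.pinv(F) @ vanilla
print(natural)
```

For this two-action softmax the natural gradient works out to ((r0 - r1)/2, -(r0 - r1)/2) regardless of the current logits, illustrating the parameterization invariance that motivates NPG, whereas the vanilla gradient shrinks as the policy nears determinism.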
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning.
Privacy Preserving Multi-Agent Reinforcement Learning in Supply Chains
To tackle this challenge, we propose a game-theoretic, privacy-preserving mechanism, utilizing a secure multi-party computation (MPC) framework in MARL settings.
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation
Further, the recent Denoising Diffusion Policy Optimization (DDPO) work shows that the diffusion process is compatible with policy gradient methods and can improve 2D diffusion models using an aesthetic scoring function.
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems
As a second contribution, we show that, under appropriate assumptions, a SAGE-based policy-gradient method converges to an optimal policy with high probability, provided it starts sufficiently close to one, even with a nonconvex objective function and multiple maximizers.