Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
Latest papers with no code
Actor-Critic Reinforcement Learning with Phased Actor
We prove qualitative properties of PAAC, including convergence of the learned value and policy, solution optimality, and stability of the system dynamics.
Intervention-Assisted Policy Gradient Methods for Online Stochastic Queuing Network Optimization: Technical Report
This framework combines the learning power of neural networks with the guaranteed stability of classical control policies for SQNs.
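The abstract only hints at the mechanism, but the intervention idea can be sketched in a few lines: run the learned (neural) policy inside a safe region of the state space and hand control to a classical, provably stable policy whenever the queues grow too long. All names below (safe_threshold, drain_longest_queue) are hypothetical, not from the paper.

```python
import numpy as np

def drain_longest_queue(queue_lengths):
    """Classical fallback with a known stability guarantee: serve the longest queue."""
    return int(np.argmax(queue_lengths))

def intervention_policy(queue_lengths, learned_policy, safe_threshold=50.0):
    """Learned policy inside the safe region, stable fallback outside it."""
    if max(queue_lengths) > safe_threshold:
        return drain_longest_queue(queue_lengths)  # intervention: guaranteed stability
    return learned_policy(queue_lengths)           # neural policy: better performance

# Usage with a stand-in "learned" policy that serves a random queue.
rng = np.random.default_rng(0)
learned = lambda q: int(rng.integers(len(q)))
print(intervention_policy([3.0, 12.0, 7.0], learned))    # learned policy acts
print(intervention_policy([3.0, 120.0, 7.0], learned))   # fallback intervenes
```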
Elementary Analysis of Policy Gradient Methods
Projected policy gradient under the simplex parameterization, and both policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning.
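For readers unfamiliar with these parameterizations, the sketch below (ours, not the paper's) runs exact policy gradient ascent under the softmax parameterization on a one-state, three-armed bandit, where the expected reward is J(θ) = Σ_a π_θ(a) r(a) and its gradient has the closed form π_θ(a)(r(a) − J(θ)).

```python
import numpy as np

r = np.array([1.0, 2.0, 3.0])   # per-arm rewards of a 3-armed bandit
theta = np.zeros(3)             # softmax logits
eta = 0.5                       # step size

for _ in range(200):
    pi = np.exp(theta) / np.exp(theta).sum()  # softmax parameterization
    J = pi @ r                                # expected reward J(theta)
    theta += eta * pi * (r - J)               # exact policy gradient ascent

print(np.round(pi, 3))  # the policy concentrates on the best arm (arm 2)
```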
ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy
In WebShop, the 1-shot performance of the A$^3$T agent matches the human average, and four rounds of iterative refinement bring its performance close to that of human experts.
Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic
In average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time (a measure of how long a Markov chain under a fixed policy needs to reach its stationary distribution) poses a significant challenge for the global convergence of policy gradient methods.
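As a concrete illustration of the quantity in question (not from the paper), the snippet below computes the ε-mixing time of a small Markov chain: the first t at which every row of P^t is within total-variation distance ε of the stationary distribution μ.

```python
import numpy as np

def mixing_time(P, eps=0.01, t_max=10_000):
    """Smallest t with max_s TV(P^t(s, .), mu) <= eps for transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)                     # mu is the left eigenvector
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])  # for eigenvalue 1 ...
    mu /= mu.sum()                                      # ... normalized to sum to 1
    Pt = np.eye(len(P))
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        if 0.5 * np.abs(Pt - mu).sum(axis=1).max() <= eps:  # worst-case TV distance
            return t
    return None

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(mixing_time(P))  # about a dozen steps for this two-state chain
```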
Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries
Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision-making policy without sharing raw trajectories.
Provable Policy Gradient Methods for Average-Reward Markov Potential Games
We prove that both independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium under the average-reward criterion.
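The paper's setting is general average-reward Markov potential games; as a toy stand-in (not the paper's algorithm), the sketch below runs independent softmax policy gradient in a one-state team game, the simplest potential game, where the average reward reduces to the expected one-step reward.

```python
import numpy as np

R = np.array([[1.0, 0.0],
              [0.0, 2.0]])  # shared payoff R[a1, a2]: a team (identical-interest) game

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

theta = [np.zeros(2), np.zeros(2)]  # softmax logits, one vector per player
eta = 0.5

for _ in range(300):
    pi = [softmax(t) for t in theta]
    q = [R @ pi[1], R.T @ pi[0]]   # each player's expected payoff per action
    for i in range(2):             # independent updates: no coordination,
        J = pi[i] @ q[i]           # each treats the other's policy as fixed
        theta[i] += eta * pi[i] * (q[i] - J)  # exact softmax policy gradient

print(np.round(pi[0], 3), np.round(pi[1], 3))
# Both players concentrate on action 1: the Nash equilibrium with payoff 2.
```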
Fill-and-Spill: Deep Reinforcement Learning Policy Gradient Methods for Reservoir Operation Decision and Control
Changes in demand, various hydrological inputs, and environmental stressors are among the issues that water managers and policymakers face on a regular basis.
Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Nevertheless, when policy gradients are applied to SDEs, the gradient is estimated from a finite set of trajectories and can be ill-defined, and the policy's behavior in data-scarce regions may be uncontrolled.
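To make "estimated on a finite set of trajectories" concrete, here is a generic score-function (REINFORCE) estimator for a one-dimensional Gaussian policy, unrelated to the paper's SDE construction: with few samples the estimate is noisy, which is one face of the instability the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(mu, n_traj, sigma=1.0):
    """Monte Carlo estimate of d/dmu E[r(a)] for a ~ N(mu, sigma^2),
    with r(a) = -(a - 3)^2; the exact gradient is -2 * (mu - 3)."""
    a = rng.normal(mu, sigma, size=n_traj)  # finite set of sampled "trajectories"
    r = -(a - 3.0) ** 2                     # rewards
    score = (a - mu) / sigma**2             # d/dmu log N(a; mu, sigma^2)
    return float((r * score).mean())

# Few trajectories give a noisy estimate; many approach the exact value 6.0 at mu = 0.
for n in (4, 100, 100_000):
    print(n, round(reinforce_grad(0.0, n), 3))
```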
Towards Provable Log Density Policy Gradient
In this work, we argue that this residual term is significant, and that correcting for it could improve the sample complexity of reinforcement learning methods.