Policy Gradient Methods
89 papers with code • 0 benchmarks • 2 datasets
Benchmarks
These leaderboards are used to track progress in Policy Gradient Methods.
Libraries
Use these libraries to find Policy Gradient Methods models and implementations.

Latest papers
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement
Current methods for end-to-end constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning.
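As a minimal illustration of the policy-gradient alternative to behavior cloning (an illustrative sketch, not the paper's method), here is batch REINFORCE with a mean-reward baseline on a hypothetical two-armed bandit; the bandit, learning rate, and batch size are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])  # hypothetical 2-armed bandit
theta = np.zeros(2)                # logits of a softmax policy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(300):
    probs = softmax(theta)
    actions = rng.choice(2, size=64, p=probs)   # sample a batch of actions
    rewards = rng.normal(true_means[actions], 0.1)
    baseline = rewards.mean()                   # variance-reducing baseline
    grad = np.zeros(2)
    for a, r in zip(actions, rewards):
        g = -probs.copy()                       # grad log pi(a) for a softmax
        g[a] += 1.0                             # policy: one_hot(a) - probs
        grad += (r - baseline) * g
    theta += 0.1 * grad / len(actions)          # gradient ascent step

print(softmax(theta))  # the policy now strongly prefers the better arm
```

The baseline subtraction is the simplest of the variance-reduction measures that practical policy gradient methods rely on.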
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors and are often pushed (e.g., through policy entropy regularization) to randomize their actions in favor of exploration.
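Policy entropy regularization adds a bonus term beta * H(pi) to the objective, rewarding stochastic policies. A small sketch of its effect (illustrative, not from the paper) on a softmax policy over three actions with hypothetical fixed rewards, using the exact gradients of expected reward and entropy with respect to the logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

r = np.array([1.0, 0.5, 0.0])  # hypothetical per-action rewards
results = {}
for beta in (0.0, 1.0):        # entropy coefficient
    theta = np.zeros(3)
    for _ in range(5000):
        p = softmax(theta)
        grad_J = p * (r - p @ r)                 # d(E[r])/d(logits) for softmax
        grad_H = p * (-np.log(p) - entropy(p))   # d(H)/d(logits) for softmax
        theta += 0.1 * (grad_J + beta * grad_H)  # ascend J + beta * H
    results[beta] = softmax(theta)

# beta = 0 collapses toward the greedy action; beta = 1 keeps the policy
# stochastic (the regularized optimum is proportional to exp(r / beta)).
print(results[0.0].round(3), results[1.0].round(3))
```

This is exactly the randomizing pressure the abstract refers to: the larger beta is, the further the optimal policy sits from deterministic behavior.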
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization
To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences.
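The best-known strategy for large-but-safe policy changes is the clipped surrogate objective popularized by PPO, which this paper revisits. A minimal sketch of the clipping rule itself (the standard form, not code from the paper), where `ratio` is the new-to-old action probability ratio and `advantage` is an advantage estimate:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped policy-gradient objective for one (state, action) sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# With a positive advantage, the gain is capped once the ratio
# exceeds 1 + eps, so there is no incentive for huge policy steps.
print(clipped_surrogate(1.5, 2.0))   # 2.4, not 3.0
# With a negative advantage, taking the min leaves the full penalty
# in place, which is the pessimistic side of the objective.
print(clipped_surrogate(1.5, -2.0))  # -3.0
```

The outer `min` is what makes the objective a pessimistic lower bound on the unclipped surrogate.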
Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach
We investigate the problem of learning an $\epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach.
Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence
Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks the NE of zero-sum LQ games; (ii) in the model-free setting, we establish a $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point ZO estimator.
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning
Environments for sequential decision-making problems often provide only sparse evaluative feedback to guide reinforcement-learning agents.
Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
Efficient Diffusion Policies for Offline Reinforcement Learning
Diffusion policies are incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) because the likelihood of diffusion models is intractable.
Client Selection for Federated Policy Optimization with Environment Heterogeneity
This paper investigates the federated version of Approximate PI (API) and derives its error bound, taking into account the approximation error introduced by environment heterogeneity.
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization.