Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
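As background for the papers listed below, the core policy gradient update (REINFORCE) can be sketched as follows. This is a minimal illustration on a toy multi-armed bandit with a softmax policy; the action count, reward values, and learning rate are made up for the example and are not taken from any paper on this page:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
theta = np.zeros(n_actions)               # policy logits
true_rewards = np.array([0.1, 0.5, 0.9])  # hypothetical expected rewards
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    r = true_rewards[a] + rng.normal(scale=0.1)  # noisy reward sample
    # REINFORCE: move logits along grad log pi(a) scaled by the return
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi

print(np.argmax(theta))  # the policy concentrates on the best action
```

In expectation this update follows the gradient of expected reward, so the logit of the highest-reward action grows; most of the methods below refine this basic scheme with better advantage estimates, trust regions, or clipping.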
Libraries
Use these libraries to find Policy Gradient Methods models and implementations.

Latest papers
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization.
Online Portfolio Management via Deep Reinforcement Learning with High-Frequency Data
In addition, while the vast majority of SOTA strategies suffer from a high turnover rate, greater than approximately 50% on average, our framework maintains a relatively low turnover rate on all datasets; an efficiency analysis further shows that our framework avoids the quadratic dependency limitation.
Distributional constrained reinforcement learning for supply chain optimization
We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL.
Partial advantage estimator for proximal policy optimization
Value estimation is a fundamental problem in policy gradient methods.
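One common way to estimate advantages in policy gradient methods (including PPO) is generalized advantage estimation (GAE). The sketch below assumes the standard GAE recursion; the reward and value numbers are made-up illustration data, not taken from the paper above:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one finite trajectory.

    `values` has length len(rewards) + 1 (it includes a bootstrap
    value for the state after the last reward).
    """
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.0])  # last entry is the bootstrap value
print(gae(rewards, values))
```

The `lam` parameter trades bias against variance: `lam=0` reduces to one-step TD residuals, while `lam=1` recovers the full Monte Carlo advantage.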
Policy Gradient in Robust MDPs with Global Convergence Guarantee
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs.
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms.
The Performance Impact of Combining Agent Factorization with Different Learning Algorithms for Multiagent Coordination
In this work, we explore whether the performance impact of agent factorization differs when using different learning algorithms in multiagent coordination settings.
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Therefore, exploration strategies and learning methods are required that can track these steady domain shifts and adapt to them.
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space.
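The clipped policy space mentioned here comes from PPO's clipped surrogate objective. A minimal sketch of that objective (the standard form from Schulman et al., 2017) is below; the probability ratios and advantages are illustrative numpy arrays, not real rollout data:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate: -mean(min(r * A, clip(r, 1-eps, 1+eps) * A))."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Negated so the objective can be minimized by a standard optimizer
    return -np.mean(np.minimum(unclipped, clipped))

ratio = np.array([0.5, 1.0, 1.5])        # pi_new(a|s) / pi_old(a|s)
advantage = np.array([1.0, -1.0, 2.0])
print(ppo_clip_loss(ratio, advantage))
```

The clip removes the incentive to push the ratio outside `[1 - eps, 1 + eps]`, which is exactly the restriction the paper above measures against an off-policyness criterion.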