Policy Gradient Methods
90 papers with code • 0 benchmarks • 2 datasets
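As background for the papers listed below, the core policy gradient update (REINFORCE) can be sketched as follows. This is a minimal illustration on a toy multi-armed bandit with a softmax policy; the action count, reward values, and learning rate are made up for the example and are not taken from any paper on this page:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
theta = np.zeros(n_actions)               # policy logits
true_rewards = np.array([0.1, 0.5, 0.9])  # hypothetical expected rewards
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)
    r = true_rewards[a] + rng.normal(scale=0.1)  # noisy reward sample
    # REINFORCE: move logits along grad log pi(a) scaled by the return
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi

print(np.argmax(theta))  # the policy concentrates on the best action
```

In expectation this update follows the gradient of expected reward, so the logit of the highest-reward action grows; most of the methods below refine this basic scheme with better advantage estimates, trust regions, or clipping.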
Libraries
Use these libraries to find Policy Gradient Methods models and implementations.

Latest papers
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization.
Online Portfolio Management via Deep Reinforcement Learning with High-Frequency Data
In addition, while the vast majority of SOTA strategies suffer from a high turnover rate, greater than approximately 50% on average, our framework maintains a relatively low turnover rate on all datasets; an efficiency analysis further shows that our framework avoids the quadratic dependency limitation.
Distributional constrained reinforcement learning for supply chain optimization
We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL.
Partial advantage estimator for proximal policy optimization
Value estimation is a fundamental problem in policy gradient methods.
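One common way to estimate advantages in policy gradient methods (including PPO) is generalized advantage estimation (GAE). The sketch below assumes the standard GAE recursion; the reward and value numbers are made-up illustration data, not taken from the paper above:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one finite trajectory.

    `values` has length len(rewards) + 1 (it includes a bootstrap
    value for the state after the last reward).
    """
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.0])  # last entry is the bootstrap value
print(gae(rewards, values))
```

The `lam` parameter trades bias against variance: `lam=0` reduces to one-step TD residuals, while `lam=1` recovers the full Monte Carlo advantage.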
Policy Gradient in Robust MDPs with Global Convergence Guarantee
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs.
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms.
The Performance Impact of Combining Agent Factorization with Different Learning Algorithms for Multiagent Coordination
In this work, we explore whether the performance impact of agent factorization differs when using different learning algorithms in multiagent coordination settings.
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Therefore, exploration strategies and learning methods are required that can track these steady domain shifts and adapt to them.
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space.
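The clipped policy space mentioned here comes from PPO's clipped surrogate objective. A minimal sketch of that objective (the standard form from Schulman et al., 2017) is below; the probability ratios and advantages are illustrative numpy arrays, not real rollout data:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate: -mean(min(r * A, clip(r, 1-eps, 1+eps) * A))."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Negated so the objective can be minimized by a standard optimizer
    return -np.mean(np.minimum(unclipped, clipped))

ratio = np.array([0.5, 1.0, 1.5])        # pi_new(a|s) / pi_old(a|s)
advantage = np.array([1.0, -1.0, 2.0])
print(ppo_clip_loss(ratio, advantage))
```

The clip removes the incentive to push the ratio outside `[1 - eps, 1 + eps]`, which is exactly the restriction the paper above measures against an off-policyness criterion.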