Offline RL
234 papers with code • 2 benchmarks • 7 datasets
Libraries
Use these libraries to find Offline RL models and implementations.
Latest papers
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.
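The hybrid idea can be sketched minimally: each update draws a minibatch that mixes pre-collected offline transitions with fresh on-policy rollouts. The buffer contents and the 50/50 mixing ratio below are illustrative assumptions, not the paper's algorithm.

```python
import random

# Illustrative sketch of hybrid RL minibatching: mix offline (logged)
# transitions with fresh on-policy rollouts in each update.
# Buffer contents and the mixing ratio are made up for illustration.

offline_data = [("off", i) for i in range(100)]   # pre-collected transitions
online_buffer = [("on", i) for i in range(20)]    # fresh on-policy rollouts

def hybrid_minibatch(batch_size=8, offline_ratio=0.5):
    n_off = int(batch_size * offline_ratio)
    batch = random.sample(offline_data, n_off)
    batch += random.sample(online_buffer, batch_size - n_off)
    random.shuffle(batch)
    return batch

batch = hybrid_minibatch()
print(len(batch))  # 8
```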
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems.
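As a toy illustration of off-policy dynamic programming on a fixed batch, the sketch below runs tabular $Q$-learning over a hand-made offline dataset for a 2-state, 2-action MDP; all states, rewards, and hyperparameters are illustrative, not from the paper.

```python
import numpy as np

# Toy example: off-policy tabular Q-learning on a fixed (offline) batch
# of transitions from a 2-state, 2-action MDP. All numbers are illustrative.

n_states, n_actions = 2, 2
gamma, alpha = 0.9, 0.5

# Fixed dataset of (state, action, reward, next_state, done) tuples,
# as might be logged by some behavior policy.
dataset = [
    (0, 0, 0.0, 1, False),
    (1, 1, 1.0, 0, True),
    (0, 1, 0.0, 0, False),
    (1, 0, 0.0, 1, False),
]

Q = np.zeros((n_states, n_actions))
for _ in range(200):  # repeated sweeps over the same batch
    for s, a, r, s2, done in dataset:
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])

greedy_policy = Q.argmax(axis=1)
print(greedy_policy)  # greedy action per state
```

Because the targets bootstrap off `Q[s2].max()`, the updates estimate the greedy policy's values even though the data came from a different behavior policy.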
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications.
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration.
CROP: Conservative Reward for Model-based Offline Policy Optimization
Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions.
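One common way to make a learned reward conservative is to penalize it by model uncertainty, e.g. ensemble disagreement. The sketch below illustrates that general idea under assumed linear reward models and a made-up penalty weight `beta`; it is a generic uncertainty-penalty sketch, not the CROP algorithm itself.

```python
import numpy as np

# Sketch of reward conservatism for model-based offline RL: penalize a
# learned reward by the disagreement of a bootstrap ensemble.
# The linear models, data, and beta are illustrative assumptions.

rng = np.random.default_rng(0)

def fit_reward_model(X, y):
    # Linear least-squares reward model on a bootstrap resample.
    idx = rng.integers(0, len(X), size=len(X))
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return w

# Offline dataset: features of (state, action) pairs and observed rewards.
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=64)

ensemble = [fit_reward_model(X, y) for _ in range(5)]

def conservative_reward(x, beta=1.0):
    preds = np.array([x @ w for w in ensemble])
    # Penalize by ensemble disagreement: a pessimistic reward estimate.
    return preds.mean() - beta * preds.std()

x_query = rng.normal(size=3)
print(conservative_reward(x_query))
```

The penalty is largest where the ensemble disagrees, i.e. off the data distribution, which discourages the planner from exploiting reward-model errors there.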
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption.
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Offline reinforcement learning (RL) presents a promising approach for learning policies from offline datasets without the need for costly or unsafe interactions with the environment.
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Our automatic and human evaluations show that our framework improves both the persona consistency and dialogue quality of a state-of-the-art social chatbot.
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
Can we leverage offline RL to recover better policies from online interaction?