Offline RL
226 papers with code • 2 benchmarks • 6 datasets
Libraries
Use these libraries to find Offline RL models and implementations.
Latest papers
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
This method trades off performance and robustness by introducing a robust Bellman operator into the algorithm.
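One way to picture such an operator: instead of backing up the expected next-state value, back up against the worst case over an uncertainty set of outcomes. The sketch below is illustrative only and is not MICRO's implementation; it assumes an ensemble of next-state value estimates stands in for the uncertainty set.

```python
import numpy as np

def robust_bellman_backup(reward, next_values, gamma=0.99):
    """One robust Bellman backup for a (state, action) pair.

    reward:      scalar reward observed for (s, a)
    next_values: array of V(s') estimates, one per ensemble member,
                 standing in for an uncertainty set over transitions
    """
    # Pessimism: bootstrap from the worst-case next-state value
    worst_case = np.min(next_values)
    return reward + gamma * worst_case
```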
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
This paper introduces SCOPE-RL, a comprehensive open-source Python library for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and off-policy selection (OPS).
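For context, the core idea behind OPE is to estimate an evaluation policy's value from logged data alone. The snippet below is a generic trajectory-wise importance-sampling estimator, written from scratch; it is not SCOPE-RL's API, and the data layout is an assumption for illustration.

```python
import numpy as np

def importance_sampling_ope(trajectories, eval_policy_prob, gamma=0.99):
    """Estimate an evaluation policy's value from logged trajectories.

    trajectories:     list of trajectories; each is a list of tuples
                      (state, action, reward, behavior_prob)
    eval_policy_prob: callable (state, action) -> action probability
                      under the policy being evaluated
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r, b_prob) in enumerate(traj):
            # Reweight by the likelihood ratio of the two policies
            weight *= eval_policy_prob(s, a) / b_prob
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```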
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.
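A common ingredient in hybrid RL of this kind is mixing offline transitions with freshly collected on-policy data in each update. The helper below is a minimal sketch of that batch-mixing pattern, with hypothetical buffer names; it is not the paper's algorithm.

```python
import random

def sample_hybrid_batch(offline_buffer, online_buffer,
                        batch_size=256, offline_ratio=0.5):
    """Draw a minibatch mixing offline and on-policy transitions.

    Both buffers are lists of transitions (assumed large enough);
    offline_ratio controls the fraction drawn from offline data.
    """
    n_off = int(batch_size * offline_ratio)
    batch = random.sample(offline_buffer, n_off)
    batch += random.sample(online_buffer, batch_size - n_off)
    random.shuffle(batch)  # avoid ordering effects in the update
    return batch
```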
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems.
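As a reminder of the baseline being contrasted with, a single tabular Q-learning backup bootstraps from the greedy next action, independent of the behavior policy:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning backup (off-policy dynamic programming).

    Q: 2-D array of shape (n_states, n_actions); s, a, s_next: ints.
    """
    # Bootstrap from the greedy action in the next state
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```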
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
This paper aims to strengthen the robustness of offline reinforcement learning (RL) in settings with heavy-tailed rewards, which are common in real-world applications.
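A standard robust-statistics tool for heavy-tailed samples is the median-of-means estimator, sketched below. This is a generic illustration of the heavy-tail problem, not necessarily the estimator the paper uses.

```python
import numpy as np

def median_of_means(rewards, n_blocks=5):
    """Median-of-means estimate of the mean reward.

    Splitting samples into blocks and taking the median of block
    means is far less sensitive to heavy-tailed outliers than the
    plain sample mean.
    """
    blocks = np.array_split(np.asarray(rewards, dtype=float), n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```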
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies from historical (offline) data, without access to the environment for online exploration.
CROP: Conservative Reward for Model-based Offline Policy Optimization
Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions.
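A widely used conservatism heuristic in model-based offline RL is to penalize the model-predicted reward by the disagreement of a model ensemble. The sketch below shows that generic pattern; it is not CROP's exact objective.

```python
import numpy as np

def conservative_reward(model_rewards, penalty_coef=1.0):
    """Penalize predicted reward by ensemble disagreement.

    model_rewards: array of reward predictions for one (s, a),
                   one entry per ensemble member
    """
    # High disagreement signals model uncertainty, so subtract it
    return float(np.mean(model_rewards)
                 - penalty_coef * np.std(model_rewards))
```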
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Notably, under the assumption of single-policy coverage and knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption.
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Offline reinforcement learning (RL) presents a promising approach for learning effective policies from offline datasets without the need for costly or unsafe interactions with the environment.