Offline RL
227 papers with code • 2 benchmarks • 6 datasets
Libraries
Use these libraries to find Offline RL models and implementations.
Most implemented papers
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
To make it practical, we further devise an offline RL algorithm to approximately find the solution.
Extreme Q-Learning: MaxEnt RL without Entropy
Using EVT, we derive our Extreme Q-Learning framework and, consequently, online and (for the first time) offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy.
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.
MOReL: Model-Based Offline Reinforcement Learning
In this work, we present MOReL, an algorithmic framework for model-based offline RL.
RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
Offline Meta-Reinforcement Learning with Advantage Weighting
That is, in offline meta-RL, we meta-train on fixed, pre-collected data from several tasks in order to adapt to a new task with a very small amount of data (fewer than 5 trajectories) from the new task.
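The advantage-weighting idea referenced in the title can be sketched as exponentiated advantage weights for a weighted policy regression. This is a minimal illustration, not the paper's implementation; the helper name `awr_weights`, the temperature `beta`, and the clip value are all illustrative choices.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, clip=20.0):
    # Exponentiated advantage weights: transitions with higher advantage
    # get exponentially more weight in the policy regression. The clip
    # guards against a few large advantages dominating the update.
    return np.minimum(np.exp(np.asarray(advantages) / beta), clip)

w = awr_weights([-1.0, 0.0, 2.0])  # monotonically increasing weights
```

A zero-advantage transition receives weight 1, so the scheme reduces to plain behavior cloning when all advantages vanish.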
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.
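The core construction described here, deriving a finite MDP from a static dataset and solving it optimally, can be sketched with empirical transition/reward tables plus value iteration. This is a hedged illustration: treating unseen state-action pairs as zero-value absorbing states is a pessimistic default chosen for the sketch, and the paper's non-parametric construction differs in detail.

```python
import numpy as np

def solve_derived_mdp(transitions, n_states, n_actions, gamma=0.9, iters=200):
    # Build empirical transition probabilities P and mean rewards R from a
    # static dataset of (s, a, r, s') tuples, then run value iteration on
    # the derived finite MDP.
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        rewards[s, a] += r
    n_sa = counts.sum(axis=2)
    seen = n_sa > 0
    P = np.zeros_like(counts)
    P[seen] = counts[seen] / n_sa[seen, None]
    R = np.zeros_like(rewards)
    R[seen] = rewards[seen] / n_sa[seen]
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        Q[~seen] = 0.0  # pessimistic value for state-actions absent from the data
        V = Q.max(axis=1)
    return V, Q

# Tiny dataset: action 0 in state 0 yields reward 1; everything else yields 0.
V, Q = solve_derived_mdp([(0, 0, 1.0, 1), (0, 1, 0.0, 0), (1, 0, 0.0, 1)],
                         n_states=2, n_actions=2)
```

The derived MDP is only as good as the dataset's coverage, which is exactly why the treatment of unseen state-action pairs matters in the offline setting.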
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor-critic algorithm with a penalty measuring the divergence of the policy from the offline data.
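The generic behavior-regularization template mentioned here can be sketched as an actor objective that maximizes the critic's value while penalizing divergence from the dataset policy. This is an illustrative sketch only: the sampled log-ratio KL estimate below is one common penalty choice, whereas the paper's contribution is a different, Fisher-divergence-based critic regularizer.

```python
import numpy as np

def behavior_regularized_actor_loss(q_values, policy_logp, behavior_logp, alpha=0.1):
    # Actor loss: maximize Q (so minimize -Q) plus alpha times a sampled
    # KL estimate log pi(a|s) - log pi_beta(a|s), which pulls the learned
    # policy toward the behavior policy that generated the offline data.
    kl_est = np.asarray(policy_logp) - np.asarray(behavior_logp)
    return np.mean(-np.asarray(q_values) + alpha * kl_est)

q = np.array([1.0, 2.0])
logp = np.array([-1.0, -1.0])
loss = behavior_regularized_actor_loss(q, logp, logp)  # zero penalty when pi == pi_beta
```

With `alpha=0` this reduces to an unconstrained, purely Q-maximizing actor, which is exactly the failure mode behavior regularization is designed to avoid on out-of-distribution actions.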
Online and Offline Reinforcement Learning by Planning with a Learned Model
Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL.
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
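The weighting idea in the title can be sketched by down-weighting Bellman targets where an ensemble of Q-estimates disagrees, a common proxy for epistemic uncertainty. This is an assumption-laden sketch: the paper itself estimates uncertainty with Monte Carlo dropout rather than the explicit ensemble and exponential weighting used below.

```python
import numpy as np

def uncertainty_weighted_targets(q_ensemble, rewards, gamma=0.99, temperature=1.0):
    # q_ensemble: (n_members, batch) next-state Q estimates.
    # Disagreement (std) across members serves as an uncertainty proxy;
    # high-uncertainty transitions get exponentially smaller update weights.
    q_mean = q_ensemble.mean(axis=0)
    q_std = q_ensemble.std(axis=0)
    weights = np.exp(-q_std / temperature)
    targets = rewards + gamma * q_mean
    return targets, weights

qe = np.array([[1.0, 1.0],
               [1.0, 3.0]])  # members agree on sample 0, disagree on sample 1
targets, weights = uncertainty_weighted_targets(qe, np.zeros(2), gamma=1.0)
```

Down-weighting uncertain targets addresses the same static-dataset problem the excerpt describes: without exploration, errors on poorly-covered transitions cannot be corrected, so they should contribute less to the update.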