Offline RL
225 papers with code • 2 benchmarks • 6 datasets
Most implemented papers
Critic Regularized Regression
Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.
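CRR's core idea is weighted behavior cloning: dataset actions whose Q-value exceeds a state-value baseline are up-weighted in the cloning loss. A minimal sketch of such exponential-advantage weights (illustrative only; the function name, baseline, and clipping constant are assumptions, not the paper's exact implementation):

```python
import numpy as np

def crr_weights(q_values, value_baseline, beta=1.0, max_weight=20.0):
    # Exponential-advantage weights for CRR-style weighted behavior
    # cloning: actions with Q above the state-value baseline get
    # up-weighted. `value_baseline` stands in for V(s), e.g. a mean
    # over sampled actions; clipping keeps the weights stable.
    advantage = np.asarray(q_values) - value_baseline
    return np.minimum(np.exp(advantage / beta), max_weight)

# Dataset actions with positive advantage dominate the cloning loss.
w = crr_weights([1.0, 0.0, -1.0], value_baseline=0.0)
```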
COMBO: Conservative Offline Model-Based Policy Optimization
We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model.
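The regularization described above can be sketched as a conservative term in the critic loss: push Q-values down on tuples generated by model rollouts and up on tuples from the offline dataset (a simplified mean-difference form; the paper's sampled objective differs in detail):

```python
import numpy as np

def combo_regularizer(q_rollout, q_dataset, beta=1.0):
    # Conservative term in a COMBO-style critic loss (sketch):
    # deflate Q on model-rollout state-action tuples, inflate it
    # on dataset tuples. Added to the usual Bellman error.
    return beta * (np.mean(q_rollout) - np.mean(q_dataset))

# When rollout tuples are over-valued relative to the data, the
# regularizer is positive and minimizing it deflates those values.
penalty = combo_regularizer([2.0, 3.0], [1.0, 1.0])
```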
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which can themselves be non-trivial problems.
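The diversified-ensemble approach sidesteps both requirements: pessimism comes from taking the minimum over an ensemble of Q estimates, which penalizes exactly the actions where the ensemble disagrees (high epistemic uncertainty, typically OOD actions). A minimal sketch:

```python
import numpy as np

def pessimistic_q(q_ensemble):
    # Ensemble-based pessimism (sketch): the target uses the minimum
    # over independent Q estimates. Where the ensemble disagrees the
    # minimum imposes a strong penalty; where it agrees, the value is
    # barely changed, so no behavior-policy model is needed.
    return np.min(np.asarray(q_ensemble), axis=0)

# Column 0: ensemble agrees (in-distribution). Column 1: disagrees.
q = [[1.0, 1.0],
     [1.0, 0.2],
     [1.0, 1.8]]
target = pessimistic_q(q)
```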
The In-Sample Softmax for Offline Reinforcement Learning
We highlight a simple fact: it is more straightforward to approximate an in-sample \emph{softmax} using only actions in the dataset.
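The in-sample softmax can be sketched as a log-sum-exp value computed only over Q-values of actions that actually appear in the dataset, so no out-of-distribution action is ever evaluated (function name and temperature are illustrative):

```python
import numpy as np

def in_sample_soft_value(q_dataset_actions, tau=1.0):
    # Soft (log-sum-exp) value over in-sample Q-values only.
    # As tau -> 0 this approaches the in-sample max.
    q = np.asarray(q_dataset_actions)
    m = q.max()  # subtract the max for numerical stability
    return m + tau * np.log(np.sum(np.exp((q - m) / tau)))

v = in_sample_soft_value([1.0, 2.0], tau=0.01)  # close to in-sample max
```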
The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints
We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances.
NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
We evaluate existing offline RL algorithms on NeoRL and argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.
Adversarially Trained Actor Critic for Offline Reinforcement Learning
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
Supported Policy Optimization for Offline Reinforcement Learning
Policy constraint methods for offline reinforcement learning (RL) typically use parameterization or regularization to constrain the policy to actions within the support of the behavior policy.
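One way to realize a support constraint is as a density-based penalty: policy actions whose estimated behavior-policy log-density falls below a threshold are penalized, while well-supported actions are left alone. A hedged sketch (the threshold, weight, and input log-densities are hypothetical; in practice the density would come from a learned model):

```python
import numpy as np

def support_penalty(log_behavior_density, threshold=-3.0, weight=1.0):
    # Support-constraint sketch: hinge penalty on the estimated
    # behavior-policy log-density of the policy's actions. In-support
    # actions (above the threshold) pay nothing.
    d = np.asarray(log_behavior_density)
    return weight * np.maximum(threshold - d, 0.0)

# In-support action (log-density -1.0) pays nothing; the OOD action
# (log-density -6.0) is pushed back toward the dataset support.
p = support_penalty([-1.0, -6.0])
```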
cosFormer: Rethinking Softmax in Attention
As one of its core components, softmax attention helps capture long-range dependencies, yet prohibits scaling up due to its quadratic space and time complexity with respect to sequence length.
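The quadratic cost comes from materializing the n x n attention matrix. A sketch of the linearization idea in the spirit of cosFormer (here with a plain ReLU feature map; the paper additionally applies a cosine-based re-weighting, which this sketch omits):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    # With a non-negative feature map phi, reassociating the matmuls
    # as phi(Q) @ (phi(K).T @ V) avoids the n x n attention matrix:
    # cost is O(n * d^2) instead of softmax attention's O(n^2 * d).
    phi = lambda x: np.maximum(x, 0.0)
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v), independent of n
    norm = Qf @ Kf.sum(axis=0) + eps   # per-row normalizer
    return (Qf @ kv) / norm[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
```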
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
In our approach, we learn an action-value function and add a term that maximizes action-values to the training loss of the conditional diffusion model, yielding a loss that seeks optimal actions near the behavior policy.
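The combined objective can be sketched as a weighted sum: the diffusion model's behavior-cloning (denoising) loss keeps generated actions near the behavior policy, while a -Q term steers them toward high value (names and the trade-off coefficient are illustrative, not the paper's exact formulation):

```python
def diffusion_ql_loss(bc_diffusion_loss, q_of_generated_action, alpha=1.0):
    # Sketch of a diffusion-policy Q-learning objective: minimize the
    # denoising/BC loss while maximizing the learned Q of the action
    # the diffusion model generates; alpha balances the two terms.
    return bc_diffusion_loss - alpha * q_of_generated_action

# A higher Q for the generated action lowers the loss, as intended.
loss_low_q = diffusion_ql_loss(0.5, 1.0)
loss_high_q = diffusion_ql_loss(0.5, 2.0)
```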