Offline RL
225 papers with code • 2 benchmarks • 6 datasets
Latest papers
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
Our COCOA seeks both in-distribution anchors and differences by utilizing the learned reverse dynamics model, encouraging conservatism in the compositional input space for the policy or value function.
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner?
Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions
This work offers a hands-on reference for the research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL or IL algorithms.
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective on the offline reinforcement learning (RL) challenge.
Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL
In this paper, we propose SSD (Sub-trajectory Stitching with Diffusion), a model-based offline GCRL method that leverages the conditional diffusion model to address these limitations.
SEABO: A Simple Search-Based Method for Offline Imitation Learning
Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets, eliminating the need to interact with the environment.
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning
We show that such an SDE has a solution that we can use to calculate the log probability of the policy, yielding an entropy regularizer that improves the exploration of offline datasets.
ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods.
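The projection step described here can be sketched in a few lines. This is a minimal, illustrative example assuming plain vector gradients; the function name and setup are hypothetical and not taken from the ODICE codebase.

```python
import numpy as np

def orthogonal_gradient(forward_grad, backward_grad, eps=1e-8):
    """Project backward_grad onto the plane orthogonal to forward_grad.

    Removes the component of the backward gradient that points along
    the forward gradient, keeping only the orthogonal part (the
    "orthogonal-gradient update" idea in the snippet above).
    """
    f = np.asarray(forward_grad, dtype=float)
    b = np.asarray(backward_grad, dtype=float)
    # Scalar coefficient of b's component along f; eps guards against
    # division by zero when the forward gradient vanishes.
    coeff = np.dot(b, f) / (np.dot(f, f) + eps)
    return b - coeff * f

# The result is (numerically) orthogonal to the forward gradient:
g_f = np.array([1.0, 0.0])
g_b = np.array([0.5, 2.0])
g_perp = orthogonal_gradient(g_f, g_b)  # ≈ [0.0, 2.0]
```

In the actual method this projection is applied to parameter gradients of the DICE objective rather than toy 2-D vectors, but the algebra is the same.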
Differentiable Tree Search in Latent State Space
In this work, we introduce Differentiable Tree Search (DTS), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm.
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset.