Offline RL
227 papers with code • 2 benchmarks • 6 datasets
Libraries
Use these libraries to find Offline RL models and implementations.
Latest papers
Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning
Our automatic and human evaluations show that our framework improves both the persona consistency and dialogue quality of a state-of-the-art social chatbot.
Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias
Can we leverage offline RL to recover better policies from online interaction?
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
Constrained policy search (CPS) is a fundamental problem in offline reinforcement learning that is generally solved by advantage-weighted regression (AWR).
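To make the AWR baseline mentioned above concrete, the sketch below fits a linear policy to offline data by weighting a behavior-cloning regression with exponentiated advantages. This is a minimal toy illustration, not the paper's method: the data, the temperature `beta`, and the use of a closed-form linear policy are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy dataset: offline transitions with precomputed advantages.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))  # state features
true_w = np.array([[1.0], [-0.5], [0.2], [0.0]])
actions = states @ true_w + 0.1 * rng.normal(size=(256, 1))
advantages = rng.normal(size=256)   # stand-in for A(s, a) from a learned critic

beta = 1.0  # temperature; larger beta moves AWR toward plain behavior cloning
weights = np.exp(np.clip(advantages / beta, -5.0, 5.0))  # clip for stability

# Advantage-weighted regression: argmin_W sum_i w_i * ||a_i - s_i W||^2,
# which has a closed form (weighted least squares) for a linear policy.
W = np.linalg.solve(
    states.T @ (weights[:, None] * states),
    states.T @ (weights[:, None] * actions),
)
pred = states @ W  # policy mean on the dataset states
```

The exponential weighting is what distinguishes AWR from behavior cloning: transitions with higher advantage contribute more to the regression, so the learned policy is pulled toward the better actions in the dataset while still staying within its support.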
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets
We argue this is due to the assumption, made by current offline RL algorithms, that the policy should stay close to the trajectories in the dataset.
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
We propose to apply the consistency model as an efficient yet expressive policy representation, namely consistency policy, with an actor-critic style algorithm for three typical RL settings: offline, offline-to-online and online.
Zero-Shot Reinforcement Learning from Low Quality Data
Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase.
Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Offline multi-agent reinforcement learning is challenging due to the coupling of the distribution-shift issue common in the offline setting with the high-dimensionality issue common in the multi-agent setting, which makes out-of-distribution (OOD) actions and value overestimation especially severe.
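The standard conservative remedy for the OOD-action overestimation described above is a CQL-style penalty: push Q-values down on all actions (via a log-sum-exp) while pushing them up on the actions actually present in the dataset. The sketch below computes that penalty for a discrete-action toy example; the random Q-table and action indices are assumptions for illustration, and this is the single-agent CQL regularizer, not the counterfactual multi-agent variant proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, num_actions = 128, 5
q = rng.normal(size=(n, num_actions))             # Q(s, a) for every discrete action
data_actions = rng.integers(num_actions, size=n)  # actions actually taken in the dataset

# Numerically stable log-sum-exp over the action dimension.
q_max = q.max(axis=1, keepdims=True)
logsumexp = np.log(np.exp(q - q_max).sum(axis=1)) + q_max[:, 0]

# Conservative penalty: (soft-)maximum over all actions minus Q on dataset actions.
# Adding this to the critic loss drives down values of OOD actions relative to
# in-dataset ones, counteracting overestimation.
penalty = (logsumexp - q[np.arange(n), data_actions]).mean()
```

Because the log-sum-exp upper-bounds the maximum Q-value in each state, the penalty is always non-negative, and minimizing it shrinks exactly the gap that OOD overestimation opens up.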
VAPOR: Legged Robot Navigation in Outdoor Vegetation Using Offline Reinforcement Learning
We present VAPOR, a novel method for autonomous legged robot navigation in unstructured, densely vegetated outdoor environments using offline Reinforcement Learning (RL).
Reasoning with Latent Diffusion in Offline Reinforcement Learning
However, a key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset while avoiding the extrapolation errors that arise from a lack of support in the dataset.