Offline RL
224 papers with code • 2 benchmarks • 6 datasets
Libraries
Use these libraries to find Offline RL models and implementations.
Latest papers with no code
Why Online Reinforcement Learning is Causal
Our main argument is that in online learning, conditional probabilities are causal, and therefore offline RL is the setting where causal learning has the most potential to make a difference.
Offline Fictitious Self-Play for Competitive Games
First, without knowledge of the game structure, it is impossible to interact with opponents and apply self-play, the dominant learning paradigm for competitive games.
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding
The trained policy can subsequently be deployed for further data collection, resulting in an iterative training framework, which we refer to as iterative offline RL.
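The loop described above can be sketched generically: train on the logged data, deploy the policy to collect more data, append, and retrain. The function names (`train_offline`, `deploy_and_collect`) are illustrative placeholders, not the paper's actual API.

```python
# Hypothetical sketch of an iterative offline RL loop.
from typing import Callable, List, Tuple

Transition = Tuple[list, int, float, list]  # (state, action, reward, next_state)

def iterative_offline_rl(
    dataset: List[Transition],
    train_offline: Callable[[List[Transition]], Callable],
    deploy_and_collect: Callable[[Callable], List[Transition]],
    iterations: int = 3,
) -> Callable:
    """Alternate offline training with deployment-time data collection."""
    policy = train_offline(dataset)
    for _ in range(iterations):
        new_data = deploy_and_collect(policy)  # run policy, log trajectories
        dataset = dataset + new_data           # grow the offline dataset
        policy = train_offline(dataset)        # retrain on the enlarged dataset
    return policy
```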
Align Your Intents: Offline Imitation Learning via Optimal Transport
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks and improves the performance of other offline RL algorithms on sparse-reward tasks.
Offline Multi-task Transfer RL with Representational Penalization
We study the problem of representation transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in finding a good policy for a target task.
Goal-Conditioned Offline Reinforcement Learning via Metric Learning
Experimentally, we show how our method consistently outperforms other offline RL baselines in learning from sub-optimal offline datasets.
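One way to read the goal-conditioned idea: score states by a learned metric to the goal, e.g. V(s, g) = -d(phi(s), phi(g)). The embedding `phi` and the Euclidean distance below are illustrative stand-ins, not the paper's model.

```python
import math
from typing import Sequence

def embed(state: Sequence[float]) -> tuple:
    # Placeholder embedding; in practice phi would be a learned network.
    return tuple(state)

def goal_value(state: Sequence[float], goal: Sequence[float]) -> float:
    """Negative Euclidean distance in embedding space as a goal-conditioned value."""
    s, g = embed(state), embed(goal)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(s, g)))
```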
Reward Poisoning Attack Against Offline Reinforcement Learning
To the best of our knowledge, we propose the first black-box reward poisoning attack in the general offline RL setting.
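As a toy illustration of reward poisoning in the offline setting (not the paper's attack, which is black-box and more general): an attacker perturbs each logged reward within a per-transition budget to bias the learner toward a target action.

```python
from typing import List, Tuple

Transition = Tuple[int, int, float]  # (state, action, reward)

def poison_rewards(
    dataset: List[Transition],
    target_action: int,
    budget: float,
) -> List[Transition]:
    """Shift rewards up for the target action and down otherwise,
    each perturbation bounded in magnitude by `budget`."""
    poisoned = []
    for state, action, reward in dataset:
        delta = budget if action == target_action else -budget
        poisoned.append((state, action, reward + delta))
    return poisoned
```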
Measurement Scheduling for ICU Patients with Offline Reinforcement Learning
Scheduling laboratory tests for ICU patients presents a significant challenge.
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributional RL.
Offline Actor-Critic Reinforcement Learning Scales to Large Models
We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning.