Offline RL
227 papers with code • 2 benchmarks • 6 datasets
Latest papers with no code
The Value of Reward Lookahead in Reinforcement Learning
In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.
Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning
Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces.
Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning
Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieved significantly better accuracy, and even human-AI complementarity, compared to those interacting with any other type of AI support.
Why Online Reinforcement Learning is Causal
Our main argument is that in online learning, conditional probabilities are causal, and therefore offline RL is the setting where causal learning has the most potential to make a difference.
Offline Fictitious Self-Play for Competitive Games
First, without knowledge of the game structure, it is impossible to interact with the opponents and conduct self-play, a major learning paradigm for competitive games.
Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding
The trained policy can subsequently be deployed for further data collection, resulting in an iterative training framework, which we refer to as iterative offline RL.
Align Your Intents: Offline Imitation Learning via Optimal Transport
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks and improves the performance of other offline RL algorithms in sparse-reward tasks.
Offline Multi-task Transfer RL with Representational Penalization
We study the problem of representation transfer in offline Reinforcement Learning (RL), where a learner has access to episodic data from a number of source tasks collected a priori, and aims to learn a shared representation to be used in finding a good policy for a target task.
Goal-Conditioned Offline Reinforcement Learning via Metric Learning
Experimentally, we show how our method consistently outperforms other offline RL baselines in learning from sub-optimal offline datasets.
Reward Poisoning Attack Against Offline Reinforcement Learning
To the best of our knowledge, we propose the first black-box reward poisoning attack in the general offline RL setting.