About

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Subtasks

Greatest papers with code

Behavior Regularized Offline Reinforcement Learning

26 Nov 2019 · google-research/google-research

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.

CONTINUOUS CONTROL · OFFLINE RL
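The approach named in this entry (behavior-regularized actor-critic) constrains the learned policy to stay close to the behavior policy that generated the dataset by penalizing a divergence such as KL, either in the critic target or in the actor objective. Below is a minimal sketch of the value-penalty form; the callables policy, behavior_policy, and q_target, the penalty weight alpha, and the dummy shapes are illustrative assumptions, not the authors' code.

    import torch
    import torch.distributions as D

    def penalized_bellman_target(reward, next_state, q_target, policy, behavior_policy,
                                 gamma=0.99, alpha=0.1):
        # Sample the next action from the current policy and subtract a KL(pi || pi_b)
        # penalty from the next-state value (the "value penalty" variant).
        pi = policy(next_state)
        pi_b = behavior_policy(next_state)
        next_action = pi.rsample()
        kl = D.kl_divergence(pi, pi_b).sum(-1)
        next_value = q_target(next_state, next_action).squeeze(-1)
        return reward + gamma * (next_value - alpha * kl)

    # Dummy stand-ins, just to show the shapes involved (batch of 4 states, 2-dim actions).
    policy = lambda s: D.Normal(torch.zeros(s.shape[0], 2), torch.ones(s.shape[0], 2))
    behavior_policy = lambda s: D.Normal(torch.full((s.shape[0], 2), 0.5), torch.ones(s.shape[0], 2))
    q_target = lambda s, a: torch.zeros(s.shape[0], 1)
    target = penalized_bellman_target(torch.zeros(4), torch.zeros(4, 3), q_target, policy, behavior_policy)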

RL Unplugged: Benchmarks for Offline Reinforcement Learning

24 Jun 2020 · deepmind/deepmind-research

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

ATARI GAMES · DQN REPLAY DATASET · MUJOCO GAMES

Acme: A Research Framework for Distributed Reinforcement Learning

1 Jun 2020 · deepmind/acme

Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

DQN REPLAY DATASET

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

ICLR 2021 · maximecb/gym-miniworld

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.

OFFLINE RL
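As a rough illustration of the idea in the entry above (deriving a finite MDP from a static dataset and solving it exactly), the sketch below builds an empirical tabular MDP from logged transitions and runs value iteration on it. The dataset layout, the state and action counts, and the pessimistic handling of unseen state-action pairs are assumptions for illustration, not the paper's actual non-parametric construction.

    import numpy as np

    def empirical_mdp(transitions, n_states, n_actions, unseen_reward=-1.0):
        # Maximum-likelihood transition and reward tables from logged (s, a, r, s') tuples.
        counts = np.zeros((n_states, n_actions, n_states))
        reward_sum = np.zeros((n_states, n_actions))
        for s, a, r, s_next in transitions:
            counts[s, a, s_next] += 1
            reward_sum[s, a] += r
        visits = counts.sum(axis=-1)  # N(s, a)
        P = np.where(visits[..., None] > 0, counts / np.maximum(visits[..., None], 1), 0.0)
        R = np.where(visits > 0, reward_sum / np.maximum(visits, 1), unseen_reward)
        # Pessimistic self-loop for state-action pairs never observed in the data.
        for s in range(n_states):
            for a in range(n_actions):
                if visits[s, a] == 0:
                    P[s, a, s] = 1.0
        return P, R

    def value_iteration(P, R, gamma=0.95, iters=500):
        # Solve the derived finite MDP to numerical tolerance.
        V = np.zeros(P.shape[0])
        for _ in range(iters):
            V = np.max(R + gamma * P @ V, axis=-1)
        return V

    # Toy usage on a 3-state, 2-action dataset.
    data = [(0, 0, 0.0, 1), (1, 1, 1.0, 2), (2, 0, 0.0, 0)]
    P, R = empirical_mdp(data, n_states=3, n_actions=2)
    print(value_iteration(P, R))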

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

15 Apr 2020 · rail-berkeley/d4rl

The offline reinforcement learning (RL) problem, also known as batch RL, refers to the setting where a policy must be learned from a static dataset, without additional online data collection.

OFFLINE RL
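For readers new to the setting described above, the snippet below shows roughly how a static D4RL dataset is loaded. It follows the usage pattern documented in the rail-berkeley/d4rl README, but the exact environment name and available keys depend on the installed version, so treat it as a sketch rather than a guaranteed API.

    import gym
    import d4rl  # registers the offline datasets/environments with gym on import

    # 'halfcheetah-medium-v2' is one of the locomotion datasets; names vary by d4rl version.
    env = gym.make('halfcheetah-medium-v2')

    # A dict of numpy arrays: 'observations', 'actions', 'rewards', 'terminals', ...
    dataset = env.get_dataset()
    print(dataset['observations'].shape, dataset['actions'].shape)

    # Convenience view that also includes 'next_observations', handy for Q-learning-style methods.
    dataset = d4rl.qlearning_dataset(env)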

An Optimistic Perspective on Offline Reinforcement Learning

10 Jul 2019 · google-research/batch_rl

The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.

ATARI GAMES · DQN REPLAY DATASET · Q-LEARNING

Human-centric Dialog Training via Offline Reinforcement Learning

EMNLP 2020 · natashamjaques/neural_chat

We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL).

LANGUAGE MODELLING · OFFLINE RL

Conservative Q-Learning for Offline Reinforcement Learning

NeurIPS 2020 · aviralkumar2907/CQL

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.

CONTINUOUS CONTROL · DQN REPLAY DATASET · Q-LEARNING
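As an illustration of the conservative objective mentioned above, the sketch below adds CQL's discrete-action regularizer (log-sum-exp over all Q-values minus the Q-value of the dataset action) to a standard TD loss. The network definitions, batch layout, and the weight alpha are illustrative assumptions; see the aviralkumar2907/CQL repository for the authors' implementation.

    import torch
    import torch.nn.functional as F

    def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
        # batch is assumed to hold tensors: obs [B, obs_dim], actions [B] (long),
        # rewards [B], next_obs [B, obs_dim], dones [B].
        obs, actions, rewards, next_obs, dones = batch

        q_values = q_net(obs)                                   # [B, num_actions]
        q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)

        with torch.no_grad():
            next_q = target_q_net(next_obs).max(dim=1).values
            td_target = rewards + gamma * (1.0 - dones) * next_q

        td_loss = F.mse_loss(q_taken, td_target)

        # Conservative term: push down Q on all actions (log-sum-exp), push up Q on dataset actions.
        cql_term = (torch.logsumexp(q_values, dim=1) - q_taken).mean()

        return td_loss + alpha * cql_term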

MOPO: Model-based Offline Policy Optimization

NeurIPS 2020 · tianheyu927/mopo

We also characterize the trade-off between the gain and risk of leaving the support of the batch data.

CONTINUOUS CONTROL · OFFLINE RL
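The trade-off described above is handled in MOPO by penalizing model-predicted rewards with an uncertainty estimate, so the policy is discouraged from exploiting model rollouts far from the batch data. The sketch below shows that reward shaping with ensemble disagreement as the uncertainty proxy; the array layout and the penalty weight lam are assumptions, and the paper's own penalty is based on the learned dynamics model's predicted uncertainty rather than this simple standard deviation.

    import numpy as np

    def penalized_reward(ensemble_reward_preds, lam=1.0):
        # ensemble_reward_preds is assumed to have shape [n_models, batch], holding each
        # dynamics model's predicted reward for the same (s, a) pairs. The mean prediction
        # is penalized by ensemble disagreement, a stand-in for the paper's uncertainty term.
        mean_r = ensemble_reward_preds.mean(axis=0)
        uncertainty = ensemble_reward_preds.std(axis=0)
        return mean_r - lam * uncertainty

    # Toy usage: 4 ensemble members, a batch of 3 model-predicted rewards.
    preds = np.array([[1.0, 0.5, 0.2],
                      [1.1, 0.4, 0.9],
                      [0.9, 0.6, 0.1],
                      [1.0, 0.5, 0.8]])
    print(penalized_reward(preds, lam=0.5))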