
# DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

23 Feb 2021

To the best of the authors' knowledge, DeepThermal is the first AI application to solve a real-world, complex, mission-critical control task using an offline RL approach.

# GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

22 Feb 2021

Offline reinforcement learning approaches can generally be divided into proximal and uncertainty-aware methods.

# Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

19 Feb 2021

Instrumental variables (IVs), in the context of RL, are the variables whose influence on the state variables is mediated entirely through the action.

# PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

13 Feb 2021

We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy.

# Q-Value Weighted Regression: Reinforcement Learning with Limited Data

12 Feb 2021

QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces.
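As a rough illustration of the AWR objective that QWR builds on, the policy is fit by regression on the dataset's actions, with each action weighted by the exponentiated advantage. The function below is a minimal NumPy sketch; the names, the temperature `beta`, and the weight clipping are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def awr_policy_loss(log_probs, returns, values, beta=1.0, max_weight=20.0):
    """Advantage-weighted regression sketch: actions with higher advantage
    A(s, a) = R - V(s) receive larger weight exp(A / beta), clipped for
    numerical stability. Minimizing this maximizes the weighted likelihood."""
    advantages = returns - values
    weights = np.minimum(np.exp(advantages / beta), max_weight)
    return float(-np.mean(weights * log_probs))

# Toy batch: one action above the value baseline, one below.
loss = awr_policy_loss(
    log_probs=np.array([-0.5, -1.2]),
    returns=np.array([1.0, 0.0]),
    values=np.array([0.5, 0.5]),
)
```

The key design choice is that the weighting keeps updates close to the behavior data: low-advantage actions are down-weighted rather than pushed toward out-of-distribution alternatives.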

# Representation Matters: Offline Pretraining for Sequential Decision Making

11 Feb 2021

The recent success of supervised learning methods on ever larger offline datasets has spurred interest in the reinforcement learning (RL) field to investigate whether the same paradigms can be translated to RL algorithms.

# Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

2 Feb 2021

Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon stationary transition setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy.

# Addressing Extrapolation Error in Deep Offline Reinforcement Learning

1 Jan 2021

These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded*, thereby crippling learning.

# Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets

1 Jan 2021

As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.
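One common way to ease this distribution shift during fine-tuning is to mix offline and freshly collected online transitions in each training batch. The sketch below shows this mixing idea only; the buffer format and the 50/50 split are assumptions for illustration, not the paper's specific scheme.

```python
import random

def sample_mixed_batch(offline_buffer, online_buffer,
                       batch_size=8, online_fraction=0.5):
    """Draw a training batch that mixes online transitions (matching the
    current policy's distribution) with offline ones (broad coverage)."""
    n_online = min(int(batch_size * online_fraction), len(online_buffer))
    n_offline = batch_size - n_online
    batch = (random.sample(online_buffer, n_online)
             + random.sample(offline_buffer, n_offline))
    random.shuffle(batch)
    return batch

# Placeholder (state, action, is_online) transitions.
offline = [("s", "a", 0.0)] * 100
online = [("s2", "a2", 1.0)] * 20
batch = sample_mixed_batch(offline, online)
```

Keeping some online samples in every batch anchors the Q-function's bootstrap targets to states the agent actually visits, which limits the compounding of out-of-distribution errors.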

# Offline Policy Optimization with Variance Regularization

1 Jan 2021

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.