Offline-Online Reinforcement Learning: Extending Batch and Online RL

29 Sep 2021 · Maryam Hashemzadeh, Wesley Chung, Martha White

Batch RL has seen a surge in popularity and is applicable in many practical scenarios where past data is available. Unfortunately, without strong assumptions on the data-collection process, e.g., sufficient coverage or data generated by a good policy, the performance of batch RL agents is limited in both theory and practice. To enable better performance, we investigate the offline-online setting: the agent has access to a batch of data to train on, but is also allowed to continue learning online during the evaluation phase. This extends batch RL, allowing the agent to adapt to new situations without having to commit to a fixed policy in advance. In our experiments, we find that agents trained in an offline-online manner can outperform agents trained only offline or only online, sometimes by a large margin, across different dataset sizes and data-collection policies. Furthermore, we investigate the use of optimism versus pessimism for value functions in the offline-online setting, given their respective roles in batch and online RL.
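
For concreteness, below is a minimal sketch of the offline-online protocol the abstract describes: train on a fixed batch of logged transitions, then keep updating from fresh experience during evaluation instead of freezing the policy. The tabular Q-learning agent, toy chain environment, and the names ChainEnv, QAgent, collect_batch, and offline_online are illustrative assumptions, not the paper's implementation, and no optimism/pessimism mechanism is included.

```python
import numpy as np

# Toy deterministic chain: states 0..4, actions {0: left, 1: right},
# reward 1 only on reaching the rightmost (terminal) state.
class ChainEnv:
    n_states, n_actions = 5, 2

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.n_states - 1, self.s + 1)
        done = self.s == self.n_states - 1
        return self.s, float(done), done


class QAgent:
    """Tabular epsilon-greedy Q-learning agent."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, s):
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s_next, done):
        target = r + (0.0 if done else self.gamma * self.q[s_next].max())
        self.q[s, a] += self.alpha * (target - self.q[s, a])


def collect_batch(env, n_episodes=10):
    """Log transitions with a uniformly random behaviour policy (the offline dataset)."""
    batch = []
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            a = np.random.randint(env.n_actions)
            s_next, r, done = env.step(a)
            batch.append((s, a, r, s_next, done))
            s = s_next
    return batch


def offline_online(env, batch, offline_epochs=50, online_episodes=20):
    agent = QAgent(env.n_states, env.n_actions)
    # Offline phase: repeated sweeps over the fixed batch of logged transitions.
    for _ in range(offline_epochs):
        for (s, a, r, s_next, done) in batch:
            agent.update(s, a, r, s_next, done)
    # Online phase: the agent acts during evaluation and keeps updating from
    # fresh experience, rather than being frozen as in pure batch RL.
    returns = []
    for _ in range(online_episodes):
        s, done, ep_return = env.reset(), False, 0.0
        while not done:
            a = agent.act(s)
            s_next, r, done = env.step(a)
            agent.update(s, a, r, s_next, done)
            ep_return += r
            s = s_next
        returns.append(ep_return)
    return agent, returns


if __name__ == "__main__":
    env = ChainEnv()
    batch = collect_batch(env)
    agent, returns = offline_online(env, batch)
    print("Online-phase returns:", returns)
```

In this sketch, an optimistic or pessimistic value function could be approximated by initializing the Q-table above or below zero; the agents, environments, and exact mechanisms studied in the paper differ.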
