9 Jun 2023 • Max Weltevrede, Matthijs T. J. Spaan, Wendelin Böhmer
We motivate mathematically and show empirically that generalisation to tasks that are "reachable" during training improves when the diversity of transitions in the replay buffer is increased.