ICLR 2021 • Arthur Argenson, Gabriel Dulac-Arnold
Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learned models of the data.