no code implementations • 9 Oct 2023 • Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
MOREC learns a generalizable dynamics reward function from offline data, which is then used as a transition filter in any offline model-based RL (MBRL) method: when generating transitions, the dynamics model proposes a batch of candidate transitions and keeps the one with the highest dynamics reward value.
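The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `dynamics_reward` and `sample_candidate_next_states` are hypothetical stand-ins for the learned dynamics reward and the (possibly ensembled) dynamics model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics_reward(state, action, next_state):
    # Hypothetical stand-in for the learned dynamics reward, which scores
    # how realistic a generated transition looks; here it simply prefers
    # small state jumps.
    return -float(np.linalg.norm(next_state - state))

def sample_candidate_next_states(state, action, n_candidates=8):
    # Stand-in dynamics model: propose several noisy next states.
    return [state + action + rng.normal(scale=0.1, size=state.shape)
            for _ in range(n_candidates)]

def filtered_transition(state, action, n_candidates=8):
    """Generate a batch of candidate transitions and keep the one with
    the highest dynamics reward (MOREC-style transition filtering)."""
    candidates = sample_candidate_next_states(state, action, n_candidates)
    scores = [dynamics_reward(state, action, ns) for ns in candidates]
    return candidates[int(np.argmax(scores))]

state = np.zeros(3)
action = np.full(3, 0.05)
next_state = filtered_transition(state, action)
```

The argmax selection is the key design choice: instead of trusting a single model rollout, the method samples several candidates and keeps only the transition the dynamics reward deems most plausible.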
no code implementations • 1 Jun 2022 • Fan-Ming Luo, Xingchen Cao, Yang Yu
Empirical comparisons with state-of-the-art adversarial imitation learning (AIL) methods show that DARL learns a reward more consistent with the true reward, and thus achieves higher environment returns.