no code implementations • 9 Oct 2023 • Fan-Ming Luo, Tian Xu, Xingchen Cao, Yang Yu
MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating transitions, the dynamics model produces a batch of candidate transitions and the one with the highest dynamics reward value is selected.
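A minimal sketch of this transition-filtering step, assuming a stochastic learned dynamics model and a learned dynamics reward network; all names, network sizes, and the candidate count below are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class DynamicsReward(nn.Module):
    """Hypothetical stand-in for the learned dynamics reward: scores a
    transition (s, a, s') by how consistent it looks with the real
    environment dynamics (higher is better)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s, a, s_next):
        return self.net(torch.cat([s, a, s_next], dim=-1)).squeeze(-1)

def filtered_transition(dynamics_model, dyn_reward, s, a, n_candidates=16):
    """One model-rollout step with the transition filter: draw a batch of
    candidate next states from the stochastic dynamics model, then keep
    the candidate with the highest dynamics-reward score."""
    s_rep = s.unsqueeze(0).expand(n_candidates, -1)
    a_rep = a.unsqueeze(0).expand(n_candidates, -1)
    candidates = dynamics_model(s_rep, a_rep)      # (n_candidates, state_dim)
    scores = dyn_reward(s_rep, a_rep, candidates)  # (n_candidates,)
    return candidates[torch.argmax(scores)]

# Toy usage with a Gaussian dynamics model (illustrative only).
state_dim, action_dim = 4, 2
mean_net = nn.Linear(state_dim + action_dim, state_dim)
def dynamics_model(s, a):
    mu = mean_net(torch.cat([s, a], dim=-1))
    return mu + 0.1 * torch.randn_like(mu)         # stochastic candidates

dyn_reward = DynamicsReward(state_dim, action_dim)
s, a = torch.randn(state_dim), torch.randn(action_dim)
s_next = filtered_transition(dynamics_model, dyn_reward, s, a)
```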
no code implementations • 19 Aug 2022 • Rong-Jun Qin, Fan-Ming Luo, Hong Qian, Yang Yu
This paper addresses policy learning in non-stationary environments and games with continuous actions.
no code implementations • 19 Jun 2022 • Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, Yang Yu
In this survey, we review MBRL with a focus on recent progress in deep RL.
no code implementations • 1 Jun 2022 • Fan-Ming Luo, Xingchen Cao, Yang Yu
Empirical comparisons with state-of-the-art AIL methods show that DARL learns a reward more consistent with the true reward, thereby achieving higher environment returns.
1 code implementation • NeurIPS 2021 • Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Qin, Wenjie Shang, Jieping Ye
Current offline reinforcement learning methods commonly learn in a policy space constrained to the in-support regions of the offline dataset, in order to ensure the robustness of the resulting policies.
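For illustration, the in-support constraint that this snippet attributes to current offline RL methods is often realized as a behavior-cloning regularizer on the policy objective (in the spirit of TD3+BC); this is a sketch of that general idea, not the paper's method, and all names and the bc_weight value are illustrative:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, action_dim), nn.Tanh())
q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, 1))

def policy_loss(batch_s, batch_a, bc_weight=2.5):
    """Maximize Q while penalizing deviation from dataset actions,
    which keeps the learned policy inside the dataset's support."""
    pi_a = policy(batch_s)
    q = q_net(torch.cat([batch_s, pi_a], dim=-1))
    bc = ((pi_a - batch_a) ** 2).mean()            # in-support regularizer
    return -q.mean() + bc_weight * bc

s = torch.randn(32, state_dim)
a = torch.randn(32, action_dim).clamp(-1, 1)       # "dataset" actions
loss = policy_loss(s, a)
loss.backward()
```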