no code implementations • 8 Sep 2021 • Bo Zhou, Kejiao Li, Hongsheng Zeng, Fan Wang, Hao Tian
Combining off-policy reinforcement learning methods with function approximators such as neural networks has been found to lead to overestimation of the value function and sub-optimal solutions.
no code implementations • 29 Jun 2021 • Bo Zhou, Hongsheng Zeng, Yuecheng Liu, Kejiao Li, Fan Wang, Hao Tian
At the planning stage, the search space is limited to the action set produced by the policy.