3 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.