no code implementations • 15 Nov 2023 • Yixiu Mao, Hongchang Zhang, Chen Chen, Yi Xu, Xiangyang Ji
Offline reinforcement learning suffers from the out-of-distribution (OOD) issue and the resulting extrapolation error.
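A minimal numerical sketch of where extrapolation error comes from (the setup is hypothetical, not the paper's method): a fitted Q-function is only constrained on actions the fixed dataset covers, yet a greedy bootstrap target maximizes over all actions, so unconstrained OOD estimates leak in and can only push the target upward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-state bandit: 10 actions, but the fixed
# offline dataset only covers actions 0-4.
true_q = rng.normal(0.0, 1.0, size=10)
covered = np.arange(5)

# Fitted Q: accurate (small noise) on covered actions,
# pure extrapolation noise on the out-of-distribution ones.
q_hat = true_q + rng.normal(0.0, 0.1, size=10)   # in-distribution error
q_hat[5:] = rng.normal(0.0, 1.0, size=5)          # OOD: unconstrained guesses

# A greedy backup maximizes over ALL actions, so large OOD guesses
# leak into the bootstrap target; restricting the max to dataset
# support (as many offline methods do) removes that source of error.
naive_target = q_hat.max()
safe_target = q_hat[covered].max()
print(naive_target >= safe_target)  # True: the naive target never undershoots
```

The inequality holds by construction (a max over a superset), which is why unconstrained bootstrapping systematically overestimates offline.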
1 code implementation • NeurIPS 2023 • Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji
Offline multi-agent reinforcement learning is challenging due to the coupling of two issues: the distribution shift common in the offline setting and the high dimensionality common in the multi-agent setting, which together make out-of-distribution (OOD) actions and value overestimation excessively severe.
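A back-of-the-envelope illustration of why the two issues compound (the numbers are hypothetical): if a fixed dataset covers a fraction p of each individual agent's actions, the covered fraction of the joint action space shrinks as p to the power of the number of agents, so almost all joint actions quickly become OOD.

```python
# Illustrative only: dataset covers fraction p of each agent's actions;
# independent coverage implies the JOINT action space is covered with
# probability p**n_agents, which decays exponentially.
p = 0.5
coverage = {n: p ** n for n in (1, 2, 4, 8)}
for n, c in coverage.items():
    print(n, c)  # 1 0.5, 2 0.25, 4 0.0625, 8 0.00390625
```

This exponential decay is the sense in which the high-dimension issue of the multi-agent setting amplifies the distribution-shift issue of the offline setting.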
no code implementations • 15 Oct 2021 • Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang Ji
These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
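A minimal sketch of the second use named above, with all function names hypothetical: pre-trained policies serve as primitive options, and a higher-level controller only chooses which option to run while the option itself emits low-level actions for several steps.

```python
import random

# Hypothetical pre-trained skill policies (trained without external reward).
def skill_forward(obs):
    return "forward"

def skill_turn(obs):
    return "turn"

OPTIONS = [skill_forward, skill_turn]

def high_level_policy(obs):
    # The hierarchical controller picks WHICH option to execute;
    # here the choice is random purely for illustration.
    return random.choice(OPTIONS)

obs = {"t": 0}
option = high_level_policy(obs)
actions = [option(obs) for _ in range(3)]  # run the chosen option for 3 steps
print(actions)
```

When an external reward is available, the same option policies can instead be fine-tuned directly, which is the acceleration effect the entry describes.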
no code implementations • 27 Feb 2021 • Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, Xiangyang Ji
In offline reinforcement learning, a policy learns to maximize cumulative rewards from a fixed collection of data, without further interaction with the environment.
no code implementations • 24 Feb 2021 • Jianzhun Shao, Hongchang Zhang, Yuhang Jiang, Shuncheng He, Xiangyang Ji
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning.
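A toy sketch of what reward decomposition means under CTDE (an assumed additive split for illustration, not the paper's method): only the team reward is observed, and centralized training fits per-agent reward estimates whose sum reconstructs it, so that each agent can be credited individually at execution time.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_steps = 3, 200

# Hidden per-agent credit; only the team-level sum is observed.
true_r = rng.normal(size=(n_steps, n_agents))
team_r = true_r.sum(axis=1)

# Per-agent features correlated with each agent's contribution.
feats = true_r + 0.1 * rng.normal(size=true_r.shape)

# Centralized least-squares fit: weights w so that sum_i w_i * feat_i
# approximates the team reward; decentralized execution then uses the
# per-agent terms w_i * feat_i as decomposed rewards.
w, *_ = np.linalg.lstsq(feats, team_r, rcond=None)
decomposed = feats * w
err = np.abs(decomposed.sum(axis=1) - team_r).mean()
print(err < 0.5)  # True: the additive split reconstructs the team reward
```

The additive form here mirrors value-decomposition approaches; the hard part the entry alludes to is assigning credit when agents' contributions are not separable this cleanly.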