1 code implementation • 20 Nov 2022 • Zhizhou Ren, Anji Liu, Yitao Liang, Jian Peng, Jianzhu Ma
To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
1 code implementation • 7 Dec 2021 • Qianlan Yang, Weijun Dong, Zhizhou Ren, Jianhao Wang, Tonghan Wang, Chongjie Zhang
However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values.
1 code implementation • ICLR 2022 • Zhizhou Ren, Ruihan Guo, Yuan Zhou, Jian Peng
Based on this framework, this paper proposes a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
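To show the kind of objective a randomized return decomposition involves, here is a minimal sketch. It assumes a tabular proxy reward over a handful of discrete states. It also assumes the episodic return is regressed onto a rescaled sum of proxy rewards over a uniformly sampled subset of K time steps, scaled by T/K so that the subset sum is an unbiased estimate of the full-episode sum. The function name `rrd_train`, the toy environment, and all hyperparameters are hypothetical illustrations. This is not the authors' implementation.

```python
import random


def rrd_train(true_reward, n_episodes=20000, horizon=10, subset_size=5,
              lr=0.002, seed=0):
    """Hypothetical sketch: fit a tabular proxy reward from episodic returns only."""
    rng = random.Random(seed)
    n_states = len(true_reward)
    proxy = [0.0] * n_states      # learned per-state proxy reward r(s)
    avg = [0.0] * n_states        # running (Polyak) average over the second half
    n_avg = 0
    scale = horizon / subset_size  # T/K keeps the subset sum unbiased for the full sum
    for ep in range(n_episodes):
        # Toy episode: states drawn uniformly at random (an assumption).
        states = [rng.randrange(n_states) for _ in range(horizon)]
        # Only the episodic return is observed, never per-step rewards.
        episodic_return = sum(true_reward[s] for s in states)
        # Randomly sampled time steps I, |I| = K, without replacement.
        subset = rng.sample(range(horizon), subset_size)
        estimate = scale * sum(proxy[states[t]] for t in subset)
        err = episodic_return - estimate
        # SGD step on (R - (T/K) * sum_{t in I} r(s_t))^2.
        for t in subset:
            proxy[states[t]] += lr * 2.0 * err * scale
        # Average iterates in the second half to smooth out SGD noise.
        if ep >= n_episodes // 2:
            n_avg += 1
            for s in range(n_states):
                avg[s] += (proxy[s] - avg[s]) / n_avg
    return avg


learned = rrd_train([0.0, 1.0, -0.5])
print([round(r, 2) for r in learned])
```

In this toy setup, the variance of the subsampled estimate acts like a regularizer. As a result, the learned rewards keep the true ordering and mean but come out compressed toward that mean (roughly by half for these settings) rather than matching the true per-state rewards exactly.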
1 code implementation • NeurIPS 2021 • Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang
Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation.
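The overestimation effect mentioned above can be seen in a toy setting. In the sketch below, every action has true value 0, but each estimator sees independent Gaussian noise. Taking the maximum of one noisy estimate is biased upward. The double-estimator trick selects the argmax with one estimate and evaluates it with the other, which removes that bias here because all true values are equal. The function name and parameters are hypothetical. The sketch illustrates the classical mechanism only, not the analysis in the paper above.

```python
import random


def compare_estimators(n_actions=10, noise=1.0, n_trials=20000, seed=0):
    """Hypothetical toy: single-max vs. double-estimator value of max_a Q(a)."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        # True action values are all 0; estimates A and B have independent noise.
        q_a = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        # Single estimator: take the max of the same noisy estimate.
        single_total += max(q_a)
        # Double estimator: select the action with A, evaluate it with B.
        best = max(range(n_actions), key=q_a.__getitem__)
        double_total += q_b[best]
    return single_total / n_trials, double_total / n_trials


single, double = compare_estimators()
print(f"single max: {single:.3f}, double estimator: {double:.3f}")
```

The single-max average lands well above the true value of 0 (near the expected maximum of ten standard normals, about 1.54), while the double-estimator average stays close to 0.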
no code implementations • 22 Jun 2021 • Beining Han, Zhizhou Ren, Zuofan Wu, Yuan Zhou, Jian Peng
We study deep reinforcement learning (RL) algorithms with delayed rewards.
1 code implementation • 11 Mar 2021 • Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang
Episodic memory-based methods can rapidly latch onto past successful strategies via a non-parametric memory, improving the sample efficiency of traditional reinforcement learning.
no code implementations • 28 Sep 2020 • Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang
Value decomposition is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings.
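The simplest instance of value decomposition is an additive factorization, Q_tot(a_1, ..., a_n) = Σ_i Q_i(a_i), as in VDN. The sketch below uses that swapped-in technique to illustrate why factorization helps with scaling. It is not the method of the papers listed here. Under an additive factorization, the greedy joint action found by enumerating all |A|^n joint actions is the same as letting each agent take its own argmax, which costs only n·|A|. The function names are hypothetical.

```python
import itertools
import random


def joint_greedy_bruteforce(agent_qs):
    """Enumerate every joint action; cost grows as |A| ** n_agents."""
    actions = [range(len(q)) for q in agent_qs]
    return max(itertools.product(*actions),
               key=lambda joint: sum(q[a] for q, a in zip(agent_qs, joint)))


def joint_greedy_decentralized(agent_qs):
    """With an additive Q_tot, each agent takes its own argmax; cost n_agents * |A|."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in agent_qs)


rng = random.Random(0)
# Hypothetical per-agent utilities: 3 agents, 4 actions each.
qs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
print(joint_greedy_bruteforce(qs), joint_greedy_decentralized(qs))
```

Keeping the joint argmax recoverable from per-agent argmaxes is what makes factorized values cheap to act on. More expressive factorizations gain representational capacity but have to work harder to preserve this property.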
5 code implementations • ICLR 2021 • Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang
This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which uses a duplex dueling network architecture to factorize the joint value function.
no code implementations • NeurIPS 2021 • Jianhao Wang, Zhizhou Ren, Beining Han, Jianing Ye, Chongjie Zhang
Value factorization is a popular and promising approach to scaling up multi-agent reinforcement learning in cooperative settings, which balances the learning scalability and the representational capacity of value functions.
1 code implementation • NeurIPS 2019 • Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng
Goal-oriented reinforcement learning has recently become a practical framework for robotic manipulation tasks, in which an agent is required to reach a certain goal defined by a function on the state space.
no code implementations • ICLR 2019 • Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Chongjie Zhang
Object-based approaches for learning action-conditioned dynamics have demonstrated promise for generalization and interpretability.
1 code implementation • 16 Apr 2019 • Guangxiang Zhu, Jianhao Wang, Zhizhou Ren, Zichuan Lin, Chongjie Zhang
We also design a spatial-temporal relational reasoning mechanism for MAOP to support instance-level dynamics learning and handle partial observability.