no code implementations • 30 May 2024 • Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li
In this paper, we propose SAM-E, a novel architecture for robot manipulation that leverages a vision foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning.
1 code implementation • 24 May 2024 • Jiafei Lyu, Chenjia Bai, Jingwen Yang, Zongqing Lu, Xiu Li
We perform representation learning only in the target domain and measure the representation deviations on the transitions from the source domain, which we show can be a signal of dynamics mismatch.
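A minimal sketch of the deviation-as-signal idea in this snippet. The embedding here is a hand-coded concatenation rather than a learned representation, and the dynamics and data shapes are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned transition representation: we embed
# (s, a, s') by concatenation. In the paper the representation is learned
# in the target domain; this sketch only illustrates the *deviation* idea.
def embed(s, a, s_next):
    return np.concatenate([s, a, s_next], axis=-1)

# Target-domain transitions define a reference distribution in
# representation space (per-dimension mean and std).
target_s = rng.normal(size=(500, 3))
target_a = rng.normal(size=(500, 1))
target_s_next = target_s + 0.1 * target_a     # assumed target dynamics

z_target = embed(target_s, target_a, target_s_next)
mu, sigma = z_target.mean(0), z_target.std(0) + 1e-6

def deviation(s, a, s_next):
    """Normalized distance of a transition from the target representation."""
    z = embed(s, a, s_next)
    return np.linalg.norm((z - mu) / sigma, axis=-1)

# Source-domain transitions with mismatched dynamics deviate more.
src_s = rng.normal(size=(500, 3))
src_a = rng.normal(size=(500, 1))
src_mismatch = src_s + 1.0 * src_a            # different dynamics
src_match = src_s + 0.1 * src_a               # same dynamics as target

d_mismatch = deviation(src_s, src_a, src_mismatch).mean()
d_match = deviation(src_s, src_a, src_match).mean()
assert d_mismatch > d_match  # deviation flags the dynamics gap
```

Transitions consistent with the target dynamics land near the reference distribution, while mismatched ones drift away, so the deviation can serve as a dynamics-mismatch signal.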
no code implementations • 23 May 2024 • Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li
In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
no code implementations • 12 May 2024 • Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang
To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
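A tabular sketch of why successor representations help with task generalization: once the SR is computed under a fixed policy, re-evaluating any new reward function is a single matrix-vector product. The 3-state chain and discount below are illustrative assumptions:

```python
import numpy as np

# Tabular successor representation (SR): M[s, s'] is the expected
# discounted visitation of s' when starting from s under a fixed policy.
gamma = 0.9
P = np.array([[0.9, 0.1, 0.0],   # policy-induced transition matrix
              [0.0, 0.5, 0.5],   # (illustrative 3-state chain)
              [0.0, 0.0, 1.0]])

# Closed form: M = (I - gamma * P)^{-1}
M = np.linalg.inv(np.eye(3) - gamma * P)

# Values for any reward vector follow as V = M @ r, which is what makes
# the SR convenient for transfer across tasks (i.e. across rewards).
r_task1 = np.array([0.0, 0.0, 1.0])
r_task2 = np.array([1.0, 0.0, 0.0])
V1 = M @ r_task1
V2 = M @ r_task2

# Sanity check against the Bellman equation V = r + gamma * P @ V.
assert np.allclose(V1, r_task1 + gamma * P @ V1)
assert np.allclose(V2, r_task2 + gamma * P @ V2)
```

The decoupling of dynamics (in M) from reward (in r) is what the offline-to-online extension in the paper builds on; this sketch shows only the basic SR machinery.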
1 code implementation • 10 May 2024 • Xiaoyu Wen, Chenjia Bai, Kang Xu, Xudong Yu, Yang Zhang, Xuelong Li, Zhen Wang
In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains.
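A hedged sketch of using a contrastive (InfoNCE-style) score to measure a domain gap. Instead of a learned encoder, the score below hand-codes plausibility under assumed target dynamics; the drop in the InfoNCE objective on source-domain transitions then acts as a gap proxy. All dynamics and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Score f((s, a), s'): high when s' is a plausible next state in the
# *target* domain (here: assumed dynamics s' = s + 0.1 a).
def score(s, a, s_next):
    predicted = s + 0.1 * a
    return -np.sum((s_next - predicted) ** 2, axis=-1)

def info_nce(s, a, s_next, negatives):
    """Average log-probability of the true s' among candidate next states."""
    pos = score(s, a, s_next)                             # (B,)
    neg = np.stack([score(s, a, n) for n in negatives])   # (K, B)
    logits = np.vstack([pos[None], neg])                  # (K+1, B)
    logits -= logits.max(0)                               # numerical stability
    log_p = logits[0] - np.log(np.exp(logits).sum(0))
    return log_p.mean()

B, K = 256, 8
s = rng.normal(size=(B, 3))
a = rng.normal(size=(B, 1))
negatives = [rng.normal(size=(B, 3)) for _ in range(K)]

target_next = s + 0.1 * a + 0.05 * rng.normal(size=(B, 3))
source_next = s + 1.0 * a + 0.05 * rng.normal(size=(B, 3))  # shifted dynamics

gap = info_nce(s, a, target_next, negatives) - info_nce(s, a, source_next, negatives)
assert gap > 0  # the contrastive score separates the two domains
```

In the paper the score function is learned by sampling transitions from both domains; here it is fixed so the sketch stays self-contained.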
no code implementations • 30 Apr 2024 • Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li
Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either a Nash equilibrium or a coarse correlated equilibrium in a sample-efficient manner.
1 code implementation • 30 Apr 2024 • Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li
We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing.
no code implementations • 9 Apr 2024 • Xudong Yu, Chenjia Bai, Hongyi Guo, Changhong Wang, Zhen Wang
Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions.
no code implementations • 7 Apr 2024 • Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
Sequential decision-making is expected to align with human intents and exhibit versatility across various tasks.
no code implementations • 22 Feb 2024 • Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
In the fine-tuning stage, we harness the imagined future videos to guide the learning of low-level actions from a limited set of robot data.
no code implementations • 19 Dec 2023 • Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun
In reinforcement learning, optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration toward less-explored areas, which are characterized by higher uncertainty.
no code implementations • 29 Sep 2023 • Xiaoyu Wen, Xudong Yu, Rui Yang, Chenjia Bai, Zhen Wang
Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions.
1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, we propose the Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
1 code implementation • 29 May 2023 • Haoran He, Chenjia Bai, Hang Lai, Lingxiao Wang, Weinan Zhang
In this paper, we propose a novel single-stage privileged knowledge distillation method called the Historical Information Bottleneck (HIB) to narrow the sim-to-real gap.
no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
1 code implementation • 8 May 2023 • Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li
Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.
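A hedged sketch of the contrastive flavor of this objective: states produced under the same skill form positive pairs, states from other skills serve as negatives, and an InfoNCE-style score rewards skills whose behaviors are mutually similar yet distinct from others'. The Gaussian "skills" below are illustrative, not the paper's policy:

```python
import numpy as np

rng = np.random.default_rng(5)

def nce_reward(anchor, positive, negatives, temp=0.5):
    """InfoNCE-style log-probability of the positive among all candidates."""
    def sim(x, y):
        return -np.sum((x - y) ** 2, axis=-1) / temp
    logits = np.array([sim(anchor, positive)] + [sim(anchor, n) for n in negatives])
    logits -= logits.max()
    return float(logits[0] - np.log(np.exp(logits).sum()))

# Hypothetical skill-conditioned behavior: skill z pushes the agent
# toward a skill-specific region of a 2-D state space.
def rollout(z):
    center = np.array([np.cos(z), np.sin(z)])
    return center + 0.1 * rng.normal(size=2)

z_a, z_b = 0.0, np.pi
anchor, positive = rollout(z_a), rollout(z_a)    # two behaviors, same skill
negatives = [rollout(z_b) for _ in range(8)]     # behaviors from another skill

r_discriminable = nce_reward(anchor, positive, negatives)

# If both "skills" behave identically, the contrastive reward collapses.
same = [rollout(z_a) for _ in range(8)]
r_collapsed = nce_reward(anchor, positive, same)
assert r_discriminable > r_collapsed
```

Maximizing such a reward encourages skills that are internally consistent but distinguishable from one another, which is the intuition behind the MI bound mentioned in the snippet.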
1 code implementation • 29 Jul 2022 • Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
1 code implementation • 6 Jun 2022 • Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han
Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks.
1 code implementation • ICLR 2022 • Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL.
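A hedged sketch of the ensemble-disagreement idea behind pessimistic bootstrapping: fit several bootstrapped regressors on in-distribution (s, a) pairs, use their standard deviation as an uncertainty quantifier, and subtract it from the value estimate for OOD actions. The quadratic "Q function", data ranges, and penalty scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def true_q(s, a):
    return 1.0 - (a - 0.5 * s) ** 2

# In-distribution data: actions near the behavior policy a ≈ 0.5 s,
# with noisy value labels.
s_data = rng.uniform(-1, 1, size=2000)
a_data = 0.5 * s_data + 0.1 * rng.normal(size=2000)
q_data = true_q(s_data, a_data) + 0.1 * rng.normal(size=2000)

def fit_member(idx):
    """Least-squares fit on polynomial features of one bootstrap sample."""
    X = np.stack([np.ones_like(s_data[idx]), s_data[idx], a_data[idx],
                  s_data[idx] * a_data[idx], s_data[idx] ** 2, a_data[idx] ** 2], 1)
    w, *_ = np.linalg.lstsq(X, q_data[idx], rcond=None)
    return w

ensemble = [fit_member(rng.integers(0, 2000, size=2000)) for _ in range(10)]

def q_ensemble(s, a):
    X = np.array([1.0, s, a, s * a, s ** 2, a ** 2])
    preds = np.array([w @ X for w in ensemble])
    return preds.mean(), preds.std()

# In-distribution action vs far-OOD action at the same state.
_, u_in = q_ensemble(0.4, 0.2)
mean_ood, u_ood = q_ensemble(0.4, 5.0)
assert u_ood > u_in              # disagreement grows off the data support

pessimistic_q = mean_ood - 2.0 * u_ood   # LCB-style pessimistic value
```

Penalizing by the disagreement pushes value estimates for OOD actions down, which is the role the uncertainty quantifier plays in pessimistic offline RL.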
1 code implementation • 24 Oct 2021 • Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems.
1 code implementation • NeurIPS 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang
Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards.
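The two bonus families named here can be sketched in miniature. Both signals shrink as a state (or transition) becomes familiar, which is what directs exploration toward less-visited regions under sparse rewards; the running-mean "dynamics model" is a deliberate simplification:

```python
import numpy as np
from collections import defaultdict

# (1) Count-based (pseudo-count) bonus: r_int(s) = 1 / sqrt(N(s)).
counts = defaultdict(int)
def count_bonus(state):
    counts[state] += 1
    return 1.0 / np.sqrt(counts[state])

# (2) Curiosity bonus: prediction error of a learned dynamics model.
# Here the "model" is a running mean of observed next states per (s, a).
model = {}
def curiosity_bonus(state, action, next_state, lr=0.5):
    key = (state, action)
    pred = model.get(key, 0.0)
    error = (next_state - pred) ** 2
    model[key] = pred + lr * (next_state - pred)  # simple online update
    return error

# Both bonuses decay with repeated visits to the same transition.
first = count_bonus("s0")
later = [count_bonus("s0") for _ in range(8)][-1]
assert first > later

e1 = curiosity_bonus("s0", "a0", 1.0)
e2 = curiosity_bonus("s0", "a0", 1.0)
assert e1 > e2
```

Adding either quantity to the extrinsic reward gives an agent a reason to act even before any sparse reward is found.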
no code implementations • 29 Sep 2021 • Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Junjie Ye, Chenjia Bai, Pengyi Li
Many exploration strategies are built upon the optimism in the face of uncertainty (OFU) principle for reinforcement learning.
no code implementations • 14 Sep 2021 • Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang
In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks.
1 code implementation • 13 May 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang
In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).
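A hedged toy sketch of the two ingredients named in OB2I: optimism via an ensemble upper confidence bound on Q, and backward induction, i.e. replaying an episode's transitions from the last step to the first so value propagates in one pass. The chain environment, ensemble size, and bonus scale are illustrative, and all members regress to the same optimistic target here for simplicity:

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions, gamma, beta = 5, 2, 0.9, 1.0

# Bootstrapped Q ensemble (10 members), randomly initialized.
Q = rng.normal(scale=0.01, size=(10, n_states, n_actions))

def ucb(s):
    """Optimistic action values: ensemble mean + beta * ensemble std."""
    return Q[:, s, :].mean(0) + beta * Q[:, s, :].std(0)

# One episode on a chain: only the final transition is rewarded.
episode = [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 0.0, 3), (3, 1, 1.0, 4)]

# Backward pass: later transitions are updated first, so the terminal
# reward reaches the initial state within a single episode.
for s, a, r, s_next in reversed(episode):
    target = r + gamma * ucb(s_next).max()
    Q[:, s, a] = target

# A forward-ordered single pass could not have propagated the reward
# all the way back to state 0.
assert ucb(0).max() > 0.5
```

The backward ordering is what lets one episode's terminal reward shape the optimistic values along the whole trajectory, instead of needing as many episodes as the chain is long.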
no code implementations • 1 Jan 2021 • Chenjia Bai, Lingxiao Wang, Peng Liu, Zhaoran Wang, Jianye Hao, Yingnan Zhao
However, such an approach is challenging in developing practical exploration algorithms for Deep Reinforcement Learning (DRL).
no code implementations • 17 Oct 2020 • Chenjia Bai, Peng Liu, Kaiyu Liu, Lingxiao Wang, Yingnan Zhao, Lei Han
Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from the environment are sparse or even ignored entirely.