1 code implementation • NeurIPS 2023 • Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen
In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets.
no code implementations • 4 Sep 2023 • Shaohui Peng, Xing Hu, Qi Yi, Rui Zhang, Jiaming Guo, Di Huang, Zikang Tian, Ruizhi Chen, Zidong Du, Qi Guo, Yunji Chen, Ling Li
Large language models (LLMs) show their powerful automatic reasoning and planning capability with a wealth of semantic knowledge about the human world.
no code implementations • 13 Jul 2023 • Jiaming Zhang, Jitao Sang, Qi Yi, Changsheng Xu
Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection to explain why the foundational model is an optimal choice for this role.
1 code implementation • 12 Jun 2023 • Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen
Domain adaptation in reinforcement learning (RL) mainly deals with the changes of observation when transferring the policy to a new environment.
no code implementations • 9 Mar 2023 • Shaohui Peng, Xing Hu, Rui Zhang, Jiaming Guo, Qi Yi, Ruizhi Chen, Zidong Du, Ling Li, Qi Guo, Yunji Chen
Recently, the language-conditioned policy is proposed to facilitate policy transfer through learning the joint representation of observation and text that catches the compact and invariant information across environments.
1 code implementation • CVPR 2023 • Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, YaoWei Wang, Changsheng Xu
Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.
no code implementations • 13 Oct 2022 • Shaohui Peng, Xing Hu, Rui Zhang, Ke Tang, Jiaming Guo, Qi Yi, Ruizhi Chen, Xishan Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen
To address this issue, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, leveraging a causality-driven discovery instead of a randomness-driven exploration to effectively build high-quality hierarchical structures in complicated environments.
no code implementations • 13 Oct 2022 • Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen
Object-oriented reinforcement learning (OORL) is a promising way to improve the sample efficiency and generalization ability over standard RL.
no code implementations • 19 Jun 2022 • Jiaming Zhang, Qi Yi, Dongyuan Lu, Jitao Sang
In light of the growing concerns regarding the unauthorized use of facial recognition systems and its implications on individual privacy, the exploration of adversarial perturbations as a potential countermeasure has gained traction.
1 code implementation • 19 Jun 2022 • Jiaming Zhang, Qi Yi, Jitao Sang
While vision-language pre-training model (VLP) has shown revolutionary improvements on various vision-language (V+L) tasks, the studies regarding its adversarial robustness remain largely unexplored.
no code implementations • 29 Sep 2021 • Qi Yi, Jiaming Guo, Rui Zhang, Shaohui Peng, Xing Hu, Xishan Zhang, Ke Tang, Zidong Du, Qi Guo, Yunji Chen
Deep Reinforcement Learning (deep RL) has been successfully applied to solve various decision-making problems in recent years.
1 code implementation • 26 Jul 2021 • Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages the information from the future to reduce the variance of the gradient estimate for stochastic dynamic environments.
1 code implementation • 21 Jun 2021 • Jiaming Zhang, Jitao Sang, Qi Yi, Yunfan Yang, Huiwen Dong, Jian Yu
ImageNet pre-training has enabled state-of-the-art results on many tasks.