1 code implementation • 25 Feb 2024 • Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang
Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning, particularly when the difference between the ideal policy and the LLM-informed policy is small, which suggests that the initial policy is close to optimal, reducing the need for further exploration.
1 code implementation • 29 Sep 2023 • Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future").
1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.
no code implementations • 25 May 2023 • Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen
With strong capabilities of reasoning and a broad understanding of the world, Large Language Models (LLMs) have demonstrated immense potential in building versatile embodied decision-making agents capable of executing a wide array of tasks.
no code implementations • 16 Sep 2022 • Shenao Zhang
In this work, we propose Conservative Dual Policy Optimization (CDPO) that involves a Referential Update and a Conservative Update.
Model-based Reinforcement Learning reinforcement-learning +1
no code implementations • 30 Aug 2021 • Shenao Zhang, Lei Han, Li Shen
In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.
Multi-agent Reinforcement Learning reinforcement-learning +1
1 code implementation • 12 Jun 2021 • Shenao Zhang, Li Shen, Zhifeng Li, Wei Liu
Capturing contextual dependencies has proven useful to improve the representational power of deep neural networks.