Search Results for author: Zhihan Liu

Found 9 papers, 3 papers with code

Can Large Language Models Play Games? A Case Study of A Self-Play Approach

no code implementations • 8 Mar 2024 • Hongyi Guo, Zhihan Liu, Yufeng Zhang, Zhaoran Wang

Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.

Decision Making • Hallucination

How Can LLM Guide RL? A Value-Based Approach

1 code implementation • 25 Feb 2024 • Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang

Specifically, we develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning. The reduction is most pronounced when the difference between the ideal policy and the LLM-informed policy is small, since the initial policy is then close to optimal and little further exploration is needed.
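
As a rough illustration of that idea (not the paper's actual code), here is a minimal sketch of a tabular Q-learning update with an LLM-guidance term: actions that the LLM-informed policy favors are penalized less in the backup. The function name, the log-probability penalty, and the coefficient eta are all assumptions for illustration.

```python
import numpy as np

def llm_regularized_q_update(Q, s, a, r, s_next, llm_probs,
                             alpha=0.1, gamma=0.99, eta=1.0):
    """One tabular Q-learning step with an LLM-guidance term (illustrative only).

    llm_probs: the LLM-informed policy's action distribution at s_next.
    Adding eta * log(llm_probs) to the backup biases the greedy target toward
    actions the LLM recommends; eta controls the strength of the guidance.
    """
    guided_values = Q[s_next] + eta * np.log(llm_probs + 1e-8)
    target = r + gamma * np.max(guided_values)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```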

Decision Making • Reinforcement Learning (RL)

A Principled Framework for Knowledge-enhanced Large Language Model

no code implementations • 18 Nov 2023 • Saizhuo Wang, Zhihan Liu, Zhaoran Wang, Jian Guo

Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning due to issues like hallucinations, limiting their applicability in critical scenarios.

Language Modelling • Large Language Model

Sample-Efficient Multi-Agent RL: An Optimization Perspective

no code implementations • 10 Oct 2023 • Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang

We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.

Multi-agent Reinforcement Learning

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

1 code implementation • 29 Sep 2023 • Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang

Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future").
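
A minimal sketch of the loop this describes, assuming a generic environment with a reset/step interface and a placeholder llm_plan function that turns the memory buffer into a multi-step plan (both are illustrative assumptions, not the released code): at every step the agent re-plans a long-horizon trajectory ("reason for future") but commits only to its first action ("act for now").

```python
def reason_act_loop(env, llm_plan, horizon=5, max_steps=100):
    """Illustrative 'reason for future, act for now' loop (not the paper's code)."""
    memory = []                      # buffer of (state, action, feedback) tuples
    state = env.reset()
    for _ in range(max_steps):
        plan = llm_plan(memory, state, horizon)  # reason: plan a long-horizon trajectory
        action = plan[0]                         # act: execute only the first step
        state_next, reward, done = env.step(action)
        memory.append((state, action, reward))   # feedback grows the memory buffer
        state = state_next
        if done:
            break
    return memory
```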

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.
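
Schematically, the single objective of the title fuses these components by maximizing, over hypotheses $f$, the value implied by $f$ minus a weighted estimation loss, so that the trade-off itself induces optimistic exploration. A hedged rendering (notation approximate, with $L_{\mathcal{D}}$ an estimation loss such as negative log-likelihood or squared Bellman error on data $\mathcal{D}$):

```latex
\max_{f \in \mathcal{F}} \; \Big\{ V_f(\pi_f) \;-\; \eta \, L_{\mathcal{D}}(f) \Big\}, \qquad \eta > 0,
```

where $\pi_f$ is the optimal policy under hypothesis $f$; solving this one maximization stands in for the separate estimation, planning, and exploration steps.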

Guarded Policy Optimization with Imperfect Online Demonstrations

no code implementations • 3 Mar 2023 • Zhenghai Xue, Zhenghao Peng, Quanyi Li, Zhihan Liu, Bolei Zhou

Assuming the teacher policy is optimal, it has the perfect timing and capability to intervene in the learning process of the student agent, providing safety guarantees and exploration guidance.
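
As a rough sketch of the intervention mechanism described above (the names and the threshold rule are illustrative assumptions, not the paper's implementation), the teacher overrides the student only when the proposed action looks unsafe under the teacher's value estimate:

```python
def guarded_step(env, state, student_act, teacher_act, teacher_value, threshold=0.0):
    """Illustrative guarded policy step: teacher intervenes only on risky actions."""
    action = student_act(state)
    intervened = teacher_value(state, action) < threshold  # teacher deems it unsafe
    if intervened:
        action = teacher_act(state)  # safety guarantee + exploration guidance
    state_next, reward, done = env.step(action)
    return state_next, reward, done, intervened
```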

Continuous Control • Efficient Exploration • +2

SPLID: Self-Imitation Policy Learning through Iterative Distillation

no code implementations • 29 Sep 2021 • Zhihan Liu, Hao Sun, Bolei Zhou

To this end, we propose a novel meta-algorithm, Self-Imitation Policy Learning through Iterative Distillation (SPLID), which relies on the concept of a $\delta$-distilled policy to iteratively level up the quality of the target data; the agent then mimics the relabeled target data.
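
A hedged sketch of the iterative loop this suggests, using a generic top-fraction self-imitation rule as a stand-in for the paper's $\delta$-distilled-policy criterion (all names and the selection rule are illustrative):

```python
def splid_style_loop(env, policy, collect, imitate, rounds=10, n_traj=100, delta=0.1):
    """Illustrative self-imitation-through-distillation loop (not the paper's code).

    collect(env, policy, n) -> list of (trajectory, total_reward) pairs;
    imitate(policy, trajs)  -> policy behavior-cloned on the target trajectories.
    Each round keeps the best delta fraction as relabeled targets, so the
    quality of the target data is leveled up iteratively.
    """
    for _ in range(rounds):
        trajs = collect(env, policy, n_traj)
        trajs.sort(key=lambda pair: pair[1], reverse=True)
        targets = [traj for traj, _ in trajs[: max(1, int(delta * len(trajs)))]]
        policy = imitate(policy, targets)  # mimic the relabeled target data
    return policy
```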

Continuous Control

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

no code implementations • 19 Aug 2021 • Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from that of the expert policy on a certain predefined reward set.
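
This indistinguishability criterion is the standard GAIL minimax formulation; schematically, with expert policy $\pi_E$ and predefined reward class $\mathcal{R}$:

```latex
\min_{\pi} \; \max_{r \in \mathcal{R}} \; \mathbb{E}_{\pi_E}\!\left[ r(s, a) \right] \;-\; \mathbb{E}_{\pi}\!\left[ r(s, a) \right].
```

When the inner gap vanishes for every $r \in \mathcal{R}$, no reward in the set can discriminate the learned policy from the expert.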

Imitation Learning
