Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Code Generation	HumanEval	Language Agent Tree Search (GPT-4)	Pass@1	94.4	#3
Code Generation	HumanEval	Language Agent Tree Search (GPT-3.5)	Pass@1	83.8	#10
Code Generation	MBPP	GPT-3.5 Turbo + Language Agent Tree Search	Accuracy	81.1	#8
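For reading the Pass@1 column, the sketch below shows the widely used unbiased pass@k estimator introduced alongside HumanEval: for each problem, draw n samples, count the c that pass the hidden tests, and estimate the chance that at least one of k samples passes. This page does not state the exact protocol behind the listed scores, and the MBPP row reports "Accuracy" instead, so treat the match to this estimator as an assumption. The helper names `pass_at_k` and `benchmark_pass_at_k` are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the hidden tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int = 1) -> float:
    """Average the per-problem estimates over (n, c) pairs, one pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts: three problems, one submitted solution each, two solved.
    print(benchmark_pass_at_k([(1, 1), (1, 1), (1, 0)]))
```

With a single submission per problem (n = 1), pass@1 reduces to the fraction of problems whose one submitted program passes all tests.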

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

6 Oct 2023 · Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. Crucially, the method uses an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism than existing techniques. Our experimental evaluation across diverse domains, including programming, HotpotQA, and WebShop, shows that LATS applies to both reasoning and acting. In particular, LATS achieves 94.4% on HumanEval programming with GPT-4 and an average score of 75.9 on WebShop web browsing with GPT-3.5, demonstrating the effectiveness and generality of our method.
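The abstract describes LATS only at a high level. The following Python is a minimal sketch of a Monte Carlo tree search loop in that spirit, assuming four hypothetical callables that stand in for the roles named above: `propose` (LLM as agent), `evaluate` (LLM as value function), `reflect` (LLM as optimizer), and `step` (the environment supplying external feedback). It is not the authors' implementation from andyz245/LanguageAgentTreeSearch; the UCT selection rule (mean value plus c·sqrt(ln N_parent / N)), the greedy rollout, the success threshold of reward 1.0, and the toy "spell cab" task are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces standing in for LLM and environment calls (not the LATS repo's API).
Propose = Callable[[str, List[str], int], List[str]]  # LLM as agent: state, reflections, n -> actions
Evaluate = Callable[[str], float]                      # LLM as value function: state -> score in [0, 1]
Step = Callable[[str, str], Tuple[str, float, bool]]   # environment: state, action -> (state, reward, done)
Reflect = Callable[[str], str]                         # LLM as optimizer: failed state -> reflection


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0   # running mean of values backed up through this node
    reward: float = 0.0  # environment reward observed on reaching this node
    done: bool = False   # environment reported a terminal state
    depth: int = 0


def uct(node: Node, c: float = 1.0) -> float:
    """UCT score: mean value plus an exploration bonus for rarely visited nodes."""
    if node.visits == 0:
        return float("inf")
    return node.value + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def lats(root_state: str, propose: Propose, evaluate: Evaluate, step: Step,
         reflect: Reflect, n: int = 3, iterations: int = 30,
         max_depth: int = 5) -> Optional[str]:
    root = Node(root_state)
    memory: List[str] = []  # reflections fed back to the proposer
    for _ in range(iterations):
        # Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # Expansion and simulation: sample n actions, apply them in the environment,
        # then greedily continue from the best-valued child until terminal or depth limit.
        while not node.done and node.depth < max_depth:
            for action in propose(node.state, memory, n):
                new_state, reward, done = step(node.state, action)
                node.children.append(Node(new_state, node, reward=reward,
                                          done=done, depth=node.depth + 1))
            if not node.children:
                break
            node = max(node.children, key=lambda child: evaluate(child.state))
        # Terminal states are scored by environment feedback, others by the value function.
        value = node.reward if node.done else evaluate(node.state)
        if node.done and node.reward >= 1.0:
            return node.state  # success signal from the environment
        if node.done:
            memory.append(reflect(node.state))  # reflection on a failed trajectory
        # Backpropagation: update running means from the leaf up to the root.
        while node is not None:
            node.visits += 1
            node.value += (value - node.value) / node.visits
            node = node.parent
    return None


# Toy task (invented for illustration): spell TARGET one letter at a time.
TARGET = "cab"


def toy_propose(state: str, memory: List[str], n: int) -> List[str]:
    return list("abc")[:n]


def toy_evaluate(state: str) -> float:
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)


def toy_step(state: str, action: str) -> Tuple[str, float, bool]:
    new_state = state + action
    done = len(new_state) == len(TARGET)
    return new_state, float(done and new_state == TARGET), done


def toy_reflect(state: str) -> str:
    return f"'{state}' did not match; try a different prefix."


if __name__ == "__main__":
    print(lats("", toy_propose, toy_evaluate, toy_step, toy_reflect))  # prints cab
```

In a real agent, `propose`, `evaluate`, and `reflect` could be prompted LLM calls and `step` could run generated code against tests or act in an environment such as WebShop; the toy stubs exist only so the loop runs end to end. Passing the stored reflections in `memory` back to `propose` is one simple way to let failed trajectories inform later expansions.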

Code

andyz245/LanguageAgentTreeSearch (official)

Tasks

Code Generation

Decision Making

Model-based Reinforcement Learning

Datasets

HotpotQA

HumanEval

MBPP

Results from the Paper

Ranked #3 on Code Generation on HumanEval

Methods

Absolute Position Encodings • Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Transformer • Weight Decay
