Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration

25 Jan 2024 · Alireza Mohammadshahi, Ali Shaikh, Majid Yazdani

In this paper, we propose an architecture to harness the collective knowledge of multiple trained LLMs to create a new state-of-the-art. At the core of this framework is an LLM-based orchestrator that is adept at picking the right underlying LLM experts for optimal task execution. Inspired by self-play in reinforcement learning, we created a loop of query generation, orchestration, and evaluation to generate training data for the orchestrator. Our evaluation focused on the MMLU benchmark, employing models with 7B, 13B, and 34B parameters available on Hugging Face. The results demonstrate new state-of-the-art open-source models: our Leeroo orchestrator achieves performance on par with the Mixtral model while incurring only two-thirds of its cost. Moreover, when the allowed cost is increased, the orchestrator surpasses Mixtral's accuracy by over 5% at the same cost level, reaching an accuracy of 75.9%. Further enhancements were observed when integrating GPT4 into the underlying model pool. The Leeroo orchestrator nearly matches GPT4's performance at half the cost and even exceeds GPT4's results with a 25% cost reduction. These findings illustrate the potential of our architecture in creating state-of-the-art and cost-effective LLMs by optimizing the synergy between multiple LLMs to achieve superior performance outcomes.
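The abstract describes an orchestrator that picks an underlying expert per query while trading accuracy against cost. The block below is only a toy sketch of that general idea, not the paper's method: the `Expert` type, the `route` function, the budget rule, and the quality scores are all hypothetical. In the real system an LLM-based orchestrator would presumably produce the per-expert predictions; here they are simply passed in as a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    """Hypothetical description of one underlying model."""
    name: str    # illustrative label, not a real model id
    cost: float  # illustrative relative cost per query


def route(predicted_quality: dict[str, float],
          experts: list[Expert],
          budget: float) -> Expert:
    """Return the affordable expert with the highest predicted quality.

    predicted_quality maps expert name -> predicted chance of answering
    the query correctly (assumed to come from some orchestrator model).
    Ties on quality are broken in favor of the cheaper expert.
    """
    affordable = [e for e in experts if e.cost <= budget]
    if not affordable:
        raise ValueError("no expert fits the budget")
    return max(affordable, key=lambda e: (predicted_quality[e.name], -e.cost))
```

Under these assumptions, raising `budget` lets the router reach more expensive experts, which mirrors the accuracy-versus-cost trade-off the abstract reports; the selection rule itself is a stand-in chosen for brevity.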

Code

leeroo-ai/leeroo_orchestrator official

Tasks

Multi-task Language Understanding

Datasets

MMLU

Results from the Paper

Ranked #4 on Multi-task Language Understanding on MMLU

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Multi-task Language Understanding	MMLU	Leeroo (5-shot)	Average (%)	86.64	#4
Multi-task Language Understanding	MMLU	Leeroo (5-shot)	Average (%)	75.9	#16
