Multi-task Language Understanding

32 papers with code • 4 benchmarks • 5 datasets

The MMLU test covers 57 tasks, including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf
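For a concrete sense of the benchmark's structure, the sketch below loads the test split and inspects one question. It assumes the data is mirrored on the Hugging Face Hub under the id "cais/mmlu" with an "all" configuration and the field names shown; adjust these to the copy you actually use.

    # Minimal sketch: load the MMLU test split with the Hugging Face `datasets`
    # library. The dataset id, config name, and field names are assumptions.
    from datasets import load_dataset

    mmlu = load_dataset("cais/mmlu", "all", split="test")

    example = mmlu[0]
    print(example["subject"])   # e.g. "abstract_algebra"
    print(example["question"])  # the question text
    print(example["choices"])   # four answer options
    print(example["answer"])    # index (0-3) of the correct option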

Libraries

Use these libraries to find Multi-task Language Understanding models and implementations

Most implemented papers

RoBERTa: A Robustly Optimized BERT Pretraining Approach

pytorch/fairseq 26 Jul 2019

Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

google-research/ALBERT ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

Language Models are Few-Shot Learners

openai/gpt-3 NeurIPS 2020

By contrast, humans can generally perform a new language task from only a few examples or from simple instructions, something that current NLP systems still largely struggle to do.
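The core idea is few-shot prompting: the task is specified entirely in the input text, through an instruction and a handful of worked examples, with no gradient updates. Below is a minimal, purely illustrative sketch of building such a prompt (the task and examples are made up):

    # Build a few-shot prompt: an instruction plus in-context examples,
    # followed by the query the model should complete.
    few_shot_examples = [
        ("cheese", "fromage"),
        ("house", "maison"),
        ("cat", "chat"),
    ]
    query = "bread"

    prompt = "Translate English to French.\n\n"
    for en, fr in few_shot_examples:
        prompt += f"English: {en}\nFrench: {fr}\n\n"
    prompt += f"English: {query}\nFrench:"
    print(prompt)  # pass this string to any text-completion model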

LLaMA: Open and Efficient Foundation Language Models

facebookresearch/llama arXiv 2023

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.

Language Models are Unsupervised Multitask Learners

openai/gpt-2 Preprint 2019

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.

Llama 2: Open Foundation and Fine-Tuned Chat Models

facebookresearch/llama 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
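As a rough usage sketch, the chat checkpoints can be loaded through Hugging Face Transformers; the model id "meta-llama/Llama-2-7b-chat-hf" used here is an assumption and the weights are gated, so access has to be granted by Meta first.

    # Sketch: load a Llama 2 chat checkpoint and generate a short reply.
    # The model id is an assumption; the weights are gated and require approval.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("What is multi-task language understanding?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))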

Evaluating Large Language Models Trained on Code

openai/human-eval 7 Jul 2021

We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
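The accompanying openai/human-eval repository is the evaluation harness itself. The sketch below follows the workflow its README describes (treat the exact helper names as belonging to that repo; generate_one_completion is a placeholder for whatever model is under test):

    # Sketch: produce completions for the HumanEval problems, then score the
    # resulting file with the repo's evaluate_functional_correctness command.
    from human_eval.data import read_problems, write_jsonl

    def generate_one_completion(prompt: str) -> str:
        # Placeholder: call the model being evaluated and return only the
        # code that completes the given function body.
        return "    return 0\n"

    problems = read_problems()
    samples = [
        dict(task_id=task_id,
             completion=generate_one_completion(problems[task_id]["prompt"]))
        for task_id in problems
    ]
    write_jsonl("samples.jsonl", samples)
    # then: evaluate_functional_correctness samples.jsonl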

Measuring Massive Multitask Language Understanding

hendrycks/test 7 Sep 2020

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.
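Evaluation is typically run few-shot, with a handful of development questions prepended to each test question in a fixed multiple-choice format. The sketch below approximates that prompt layout (the exact wording in hendrycks/test may differ; this is an illustration, not a copy):

    # Approximate few-shot multiple-choice prompt format for the 57-task test.
    LETTERS = ["A", "B", "C", "D"]

    def format_question(question, choices, answer=None):
        text = question + "\n"
        for letter, choice in zip(LETTERS, choices):
            text += f"{letter}. {choice}\n"
        text += "Answer:"
        if answer is not None:
            text += f" {LETTERS[answer]}\n\n"
        return text

    def build_prompt(subject, dev_examples, test_question, test_choices):
        # dev_examples: list of (question, choices, answer_index) tuples
        prompt = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
        for q, choices, ans in dev_examples:
            prompt += format_question(q, choices, ans)
        prompt += format_question(test_question, test_choices)  # model answers A-D
        return prompt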

GLM-130B: An Open Bilingual Pre-trained Model

thudm/glm-130b 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.