Language Modelling

4579 papers with code • 52 benchmarks • 157 datasets

Language Modeling is the task of predicting the next word or character in a document. Models trained on this objective can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.
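As a minimal illustration of "predicting the next word", here is a toy bigram model that counts which word most often follows each word in a (hypothetical) corpus; real language models replace these counts with learned neural probability estimates:

```python
from collections import Counter, defaultdict

# Toy corpus; a hypothetical stand-in for a real training set.
corpus = "the cat sat on the mat. the cat ate."

# Count word bigrams: how often each word follows the previous one.
words = corpus.replace(".", " .").split()
following = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

An N-gram language model generalizes this idea by conditioning on the previous N-1 words instead of just one.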

Historically, language modelling was done with N-gram language models (which still have niche uses). Neural language models took over in the 2010s, and since the 2020s state-of-the-art results have been achieved exclusively with large language models (LLMs).

A model's language modeling capability is measured using cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
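The two metrics are directly related: cross-entropy is the average negative log-likelihood the model assigns to the correct next tokens, and perplexity is its exponential. A small worked example (with made-up per-token probabilities):

```python
import math

# Hypothetical probabilities a model assigned to the correct next token
# at each position of a held-out sequence.
token_probs = [0.25, 0.5, 0.125, 0.25]

# Cross-entropy: average negative log-likelihood, in nats.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: exponential of the cross-entropy. It equals the inverse
# geometric mean of the per-token probabilities.
perplexity = math.exp(cross_entropy)
print(round(perplexity, 3))  # prints 4.0
```

Intuitively, a perplexity of 4.0 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step; lower is better.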

(Image credit: Exploring the Limits of Language Modeling)

Libraries

Use these libraries to find Language Modelling models and implementations
See all 15 libraries.

Latest papers with no code

Federated Reinforcement Learning with Constraint Heterogeneity

no code yet • 6 May 2024

In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents located in $N$ different environments have limited access to the constraint signals and are expected to collaboratively learn a policy satisfying all of them.

ID-centric Pre-training for Recommendation

no code yet • 6 May 2024

Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned from both behavioral and modality information.

TED: Accelerate Model Training by Internal Generalization

no code yet • 6 May 2024

TED uses an optimization objective based on Internal Generalization Distance (IGD), measuring changes in IG before and after pruning to align with true generalization performance and achieve implicit regularization.

VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images

no code yet • 6 May 2024

Our method is based on the Semantic Pointer Architecture (SPA) to encode objects in a hyperdimensional vector space.

High Order Reasoning for Time Critical Recommendation in Evidence-based Medicine

no code yet • 5 May 2024

In the "So-what" scenario, the optimal model provided a detailed analysis of the motivation and significance of treatment plans for ICU patients, with its reasoning achieving a similarity of 55.6% with actual diagnostic information.

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

no code yet • 5 May 2024

The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels.

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

no code yet • 5 May 2024

Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge.

A self-supervised text-vision framework for automated brain abnormality detection

no code yet • 5 May 2024

To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports.

Exploring prompts to elicit memorization in masked language model-based named entity recognition

no code yet • 5 May 2024

Finally, the prompt performance of detecting model memorization is quantified by the percentage of name pairs for which the model has higher confidence for the name from the training set.

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces

no code yet • 5 May 2024

In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on an actor-critic architecture to enhance a pre-trained model with a huge action space (on the order of 10^6 actions) aligned with observations, in the task of unfolding clothes.