Language Modelling

4280 papers with code • 50 benchmarks • 151 datasets

Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering.

Historically, language modelling was done with N-gram language models (which still have niche uses). Neural language models took over in the 2010s, and since the early 2020s state-of-the-art results have come almost exclusively from large language models (LLMs).
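As a minimal illustration of the classic approach (a sketch, not taken from any paper listed here), a maximum-likelihood bigram model just counts which word follows which and predicts the most frequent successor:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count successor frequencies for each word (MLE bigram model)."""
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return bigrams

def predict_next(bigrams, word):
    """Return the most frequent next word after `word`, or None if unseen."""
    counts = bigrams[word]
    return counts.most_common(1)[0][0] if counts else None

corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Real N-gram systems add smoothing (e.g. Kneser-Ney) to handle unseen word pairs, which this unsmoothed sketch omits.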

A model's language modeling capability is typically measured with cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
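The two metrics are directly related: perplexity is the exponential of the average per-token cross-entropy (negative log-likelihood). A small sketch, given the probabilities a hypothetical model assigns to each token of a held-out sequence:

```python
import math

def perplexity(probs):
    """Perplexity = exp of the mean negative log-likelihood that the
    model assigned to each token in the sequence."""
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is as uncertain as a uniform choice over 4 options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ~4.0
```

Lower perplexity means the model finds the text less "surprising"; a perplexity of $k$ can be read as the model being, on average, as uncertain as a uniform choice among $k$ alternatives.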

Check below for all state-of-the-art models.

(Image credit: Exploring the Limits of Language Modeling)

Libraries

Use these libraries to find Language Modelling models and implementations
See all 14 libraries.

Latest papers with no code

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

no code yet • 26 Mar 2024

Attempting to complement this deficiency, we investigate layerwise properties of LoRA on fine-tuning tasks and observe an uncommon skewness of weight norms across different layers.

Towards a Zero-Data, Controllable, Adaptive Dialog System

no code yet • 26 Mar 2024

Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree.

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

no code yet • 26 Mar 2024

In a single GPU-CPU system, we demonstrate that under varying workloads, ALISA improves the throughput of baseline systems such as FlexGen and vLLM by up to 3X and 1.9X, respectively.

Improving Text-to-Image Consistency via Automatic Prompt Optimization

no code yet • 26 Mar 2024

In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

no code yet • 26 Mar 2024

Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.

Graph Language Model (GLM): A new graph-based approach to detect social instabilities

no code yet • 26 Mar 2024

This scientific report presents a novel methodology for the early prediction of important political events using News datasets.

DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

no code yet • 26 Mar 2024

End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks.

"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling

no code yet • 26 Mar 2024

Labeling corpora constitutes a bottleneck to create models for new tasks or domains.

Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs

no code yet • 26 Mar 2024

4) For Transformer-based models, both embeddings and attentions capture grammatical features but show distinct patterns.

Juru: Legal Brazilian Large Language Model from Reputable Sources

no code yet • 26 Mar 2024

This study contributes to the growing body of scientific evidence showing that pretraining data selection may enhance the performance of large language models, enabling the exploration of these models at a lower cost.