Language Modelling
4280 papers with code • 50 benchmarks • 151 datasets
Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering.
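As a toy illustration of next-word prediction, here is a minimal count-based bigram model; the corpus, and the `predict_next` helper, are invented for this sketch and stand in for what a real neural language model learns at scale:

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus (invented for this example)
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams: P(next | current) is proportional to count(current, next)
bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1

def predict_next(word):
    """Return the most frequent next word observed after `word`."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("sat"))  # prints "on": "sat" is always followed by "on" here
```

Neural language models replace these raw counts with a learned conditional distribution over the whole vocabulary, but the task interface is the same: given a context, score candidate next tokens.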
Historically, language modelling was done with N-gram language models (which still have niche uses). Since the 2010s, neural language models have taken over, and from the 2020s onward state-of-the-art results have been achieved almost exclusively with large language models (LLMs).
A model's language modeling capability is measured using cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile.
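The two metrics are directly related: perplexity is the exponential of the average cross-entropy (negative log-likelihood) per token. A minimal sketch, with invented per-token probabilities purely for illustration:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(mean negative log-likelihood of the target tokens)."""
    nll = -sum(log_probs) / len(log_probs)  # average cross-entropy in nats
    return math.exp(nll)

# Hypothetical log-probabilities a model assigned to three target tokens
token_log_probs = [math.log(0.5), math.log(0.25), math.log(0.25)]
print(perplexity(token_log_probs))  # ≈ 3.17 (the cube root of 1 / (0.5 * 0.25 * 0.25))
```

Lower is better: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens at each step.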
Check the leaderboards below for notable state-of-the-art models.
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
(Image credit: Exploring the Limits of Language Modeling)
Libraries
Use these libraries to find Language Modelling models and implementations.

Datasets
Subtasks
Latest papers with no code
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Attempting to complement this deficiency, we investigate layerwise properties of LoRA on fine-tuning tasks and observe an uncommon skewness of weight norms across different layers.
Towards a Zero-Data, Controllable, Adaptive Dialog System
Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree.
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
In a single GPU-CPU system, we demonstrate that under varying workloads, ALISA improves the throughput of baseline systems such as FlexGen and vLLM by up to 3X and 1.9X, respectively.
Improving Text-to-Image Consistency via Automatic Prompt Optimization
In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.
Graph Language Model (GLM): A new graph-based approach to detect social instabilities
This scientific report presents a novel methodology for the early prediction of important political events using News datasets.
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks.
"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling
Labeling corpora constitutes a bottleneck to create models for new tasks or domains.
Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs
For Transformer-based models, both embeddings and attentions capture grammatical features but show distinct patterns.
Juru: Legal Brazilian Large Language Model from Reputable Sources
This study contributes to the growing body of scientific evidence showing that pretraining data selection may enhance the performance of large language models, enabling the exploration of these models at a lower cost.