Language Modelling

4579 papers with code • 52 benchmarks • 157 datasets

Language Modeling is the task of predicting the next word or character in a document. Models trained on this objective can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.
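As a minimal illustration of "predicting the next word", here is a toy bigram model that counts which word most often follows each word in a (hypothetical) corpus; real language models replace these counts with learned neural probability estimates:

```python
from collections import Counter, defaultdict

# Toy corpus; a hypothetical stand-in for a real training set.
corpus = "the cat sat on the mat. the cat ate."

# Count word bigrams: how often each word follows the previous one.
words = corpus.replace(".", " .").split()
following = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

An N-gram language model generalizes this idea by conditioning on the previous N-1 words instead of just one.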

Historically, language modelling was done with N-gram language models (which still have niche uses). Neural language models took over in the 2010s, and since the 2020s state-of-the-art results have been achieved exclusively with large language models (LLMs).

A model's language modeling capability is measured using cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
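The two metrics are directly related: cross-entropy is the average negative log-likelihood the model assigns to the correct next tokens, and perplexity is its exponential. A small worked example (with made-up per-token probabilities):

```python
import math

# Hypothetical probabilities a model assigned to the correct next token
# at each position of a held-out sequence.
token_probs = [0.25, 0.5, 0.125, 0.25]

# Cross-entropy: average negative log-likelihood, in nats.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: exponential of the cross-entropy. It equals the inverse
# geometric mean of the per-token probabilities.
perplexity = math.exp(cross_entropy)
print(round(perplexity, 3))  # prints 4.0
```

Intuitively, a perplexity of 4.0 means the model is, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step; lower is better.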

(Image credit: Exploring the Limits of Language Modeling)

Libraries

Use these libraries to find Language Modelling models and implementations
See all 15 libraries.

Latest papers with no code

Federated Reinforcement Learning with Constraint Heterogeneity

no code yet • 6 May 2024

In our setting, we aim to solve a reinforcement learning problem with multiple constraints, where $N$ training agents located in $N$ different environments have limited access to the constraint signals and are expected to collaboratively learn a policy satisfying all of them.

ID-centric Pre-training for Recommendation

no code yet • 6 May 2024

Specifically, in the pre-training stage, besides the ID-based sequential model for recommendation, we also build a Cross-domain ID-matcher (CDIM) learned from both behavioral and modality information.

TED: Accelerate Model Training by Internal Generalization

no code yet • 6 May 2024

TED uses an optimization objective based on Internal Generalization Distance (IGD), measuring changes in IG before and after pruning to align with true generalization performance and achieve implicit regularization.

VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images

no code yet • 6 May 2024

Our method is based on the Semantic Pointer Architecture (SPA) to encode objects in a hyperdimensional vector space.

High Order Reasoning for Time Critical Recommendation in Evidence-based Medicine

no code yet • 5 May 2024

In the "So-what" scenario, the optimal model provided a detailed analysis of the motivation and significance of treatment plans for ICU patients, with its reasoning achieving a similarity of 55.6% with actual diagnostic information.

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

no code yet • 5 May 2024

The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels.

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

no code yet • 5 May 2024

Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge.

A self-supervised text-vision framework for automated brain abnormality detection

no code yet • 5 May 2024

To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports.

Exploring prompts to elicit memorization in masked language model-based named entity recognition

no code yet • 5 May 2024

Finally, the prompt performance of detecting model memorization is quantified by the percentage of name pairs for which the model has higher confidence for the name from the training set.

ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces

no code yet • 5 May 2024

In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on an actor-critic architecture to enhance a pre-trained model with a huge action space (on the order of 10^6 actions) aligned with observations, in the task of unfolding clothes.