Language Modelling

4494 papers with code • 51 benchmarks • 157 datasets

Language Modeling is the task of predicting the next word or character in a document. Models trained on this objective can then be applied to a wide range of natural language tasks such as text generation, text classification, and question answering.
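
As a toy sketch of the prediction objective (the corpus and all values below are invented for illustration), even a simple bigram count model captures the "predict the next word" framing; real language models are neural, but the task is the same:

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction with a bigram count model.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Most likely next word and its estimated probability."""
    counts = bigrams[word]
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(predict_next("on"))   # ('the', 1.0): "on" is always followed by "the"
print(predict_next("the"))  # ('cat', 0.25): four equally likely continuations
```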

Historically, language modelling was done with N-gram language models (which still have niche uses), but neural language models have dominated since the 2010s, and since the 2020s state-of-the-art results have come almost exclusively from large language models (LLMs).

A model's language modeling capability is typically measured with cross-entropy and perplexity. Common evaluation datasets include WikiText-103, One Billion Word, Text8, C4, and The Pile, among others.
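
As a minimal sketch of how the two metrics relate (the per-token probabilities below are invented; in practice they come from the model's softmax at each position), perplexity is just the exponential of the average per-token cross-entropy:

```python
import math

token_probs = [0.25, 0.10, 0.50, 0.05]  # P(true next token) at each step

# Cross-entropy: mean negative log-likelihood (in nats) of the true tokens.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: exp of the cross-entropy; roughly, the effective number
# of choices the model is "hesitating" between per token.
perplexity = math.exp(cross_entropy)

print(f"cross-entropy = {cross_entropy:.3f} nats, perplexity = {perplexity:.2f}")
```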

Check below for all state-of-the-art models.

(Image credit: Exploring the Limits of Language Modeling)

Libraries

Use these libraries to find Language Modelling models and implementations
See all 15 libraries.

Latest papers with no code

CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning

no code yet • 23 Apr 2024

Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications, primarily due to limited access to external knowledge.

Contrastive Quantization based Semantic Code for Generative Recommendation

no code yet • 23 Apr 2024

Finally, we train and test semantic code with generative retrieval on a sequential recommendation model.

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

no code yet • 23 Apr 2024

Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

no code yet • 23 Apr 2024

Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable.
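
For context on why caching matters here, below is a generic key/value caching sketch (the standard transformer decoding trick, not the paper's XC-Cache approach; all names and values are illustrative). Without a cache, each new token re-attends over the entire prefix, so total decoding work grows quadratically with sequence length; caching past keys/values makes each step linear in the prefix:

```python
import numpy as np

d = 8  # head dimension (arbitrary for illustration)

def attend(q, K, V):
    """Softmax attention for a single query over cached keys/values."""
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
for step in range(4):  # stand-in for a decoding loop
    q, k, v = np.random.randn(3, d)
    K_cache = np.vstack([K_cache, k])  # append the new key/value once...
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)  # ...instead of recomputing the prefix
```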

PARAMANU-GANITA: Language Model with Mathematical Capabilities

no code yet • 22 Apr 2024

In the end, we want to point out that we have trained Paramanu-Ganita on only a part of our entire mathematical corpus and have yet to explore the full potential of our model.

A Multimodal Automated Interpretability Agent

no code yet • 22 Apr 2024

Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code yet • 22 Apr 2024

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

From LLM to NMT: Advancing Low-Resource Machine Translation with Claude

no code yet • 22 Apr 2024

We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs.

Boter: Bootstrapping Knowledge Selection and Question Answering for Knowledge-based VQA

no code yet • 22 Apr 2024

Knowledge-based Visual Question Answering (VQA) requires models to incorporate external knowledge to respond to questions about visual content.

Understanding the role of FFNs in driving multilingual behaviour in LLMs

no code yet • 22 Apr 2024

In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models, examining their architecture, activation patterns, and processing mechanisms across languages.