Language Modelling
4494 papers with code • 51 benchmarks • 157 datasets
Language Modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering.
Historically, language modelling was done with N-gram language models (which still have niche uses), but since the 2010s neural language models have taken over, and from the 2020s onward state-of-the-art results have been achieved almost exclusively with large language models (LLMs).
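To make the N-gram idea concrete, here is a minimal sketch of a bigram (2-gram) language model that predicts the next word from counts over a toy corpus. The function names and the corpus are illustrative, not from any specific library:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Count, for each word, how often each following word appears.
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return bigrams

def predict_next(bigrams, word):
    # Predict the most frequently observed continuation of `word`.
    counts = bigrams.get(word)
    return counts.most_common(1)[0][0] if counts else None

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```

Real N-gram models add smoothing and backoff to handle unseen histories; neural models replace the count table with a learned distribution.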
A model's language modeling capability is measured using cross-entropy and perplexity. Some datasets to evaluate language modeling are WikiText-103, One Billion Word, Text8, C4, The Pile, among others.
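The two metrics above are directly related: perplexity is the exponential of the cross-entropy. A minimal sketch (assuming cross-entropy is measured in nats, i.e. with a natural logarithm):

```python
import math

def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity = exp(cross-entropy in nats per token)."""
    return math.exp(cross_entropy_nats)

# A model with cross-entropy of 1.0 nat/token has perplexity e ≈ 2.718.
# Lower perplexity means the model assigns higher probability to the text.
print(round(perplexity(1.0), 3))  # 2.718
```

When cross-entropy is reported in bits (base-2 logarithm), use `2 ** cross_entropy_bits` instead.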
Here are some additional readings to go deeper on the task:
- Language Modeling - Lena Voita
( Image credit: Exploring the Limits of Language Modeling )
Latest papers with no code
CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning
Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications, primarily due to limited access to external knowledge.
Contrastive Quantization based Semantic Code for Generative Recommendation
Finally, we train and test semantic code with generative retrieval on a sequential recommendation model.
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification.
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable.
PARAMANU-GANITA: Language Model with Mathematical Capabilities
In the end, we want to point out that we have trained Paramanu-Ganita on only a part of our entire mathematical corpus and have yet to explore the full potential of our model.
A Multimodal Automated Interpretability Agent
Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior.
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
From LLM to NMT: Advancing Low-Resource Machine Translation with Claude
We show that Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024, exhibits stronger machine translation competence than other LLMs.
Boter: Bootstrapping Knowledge Selection and Question Answering for Knowledge-based VQA
Knowledge-based Visual Question Answering (VQA) requires models to incorporate external knowledge to respond to questions about visual content.
Understanding the role of FFNs in driving multilingual behaviour in LLMs
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models, examining their architecture, activation patterns, and processing mechanisms across languages.