Pythia is a suite of decoder-only autoregressive language models, ranging in size from 70M to 12B parameters, all trained on the same public data seen in the exact same order. The model architecture and hyperparameters largely follow GPT-3, with a few notable deviations based on recent advances in best practices for large-scale language modeling.
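The defining property of a decoder-only autoregressive model is the causal attention mask: position t may attend only to positions ≤ t, so each next-token prediction depends only on the prefix. The sketch below illustrates this with single-head scaled dot-product attention in NumPy; it is a minimal illustration of the mechanism, not Pythia's actual implementation (which uses GPT-NeoX-style multi-head attention with rotary embeddings).

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask,
    so position t can only attend to positions <= t -- the property
    that makes a decoder-only model autoregressive."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (T, T) pairwise scores
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores[mask] = -np.inf                      # hide all future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
out, w = causal_attention(x, x, x)
# The strict upper triangle of the attention weights is exactly zero:
assert np.allclose(np.triu(w, 1), 0.0)
```

Because every position's output is computed from earlier positions only, the same forward pass can score all next-token predictions for a training sequence in parallel.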
Source: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 9 | 29.03% |
| Memorization | 3 | 9.68% |
| Question Answering | 2 | 6.45% |
| Common Sense Reasoning | 2 | 6.45% |
| In-Context Learning | 1 | 3.23% |
| Benchmarking | 1 | 3.23% |
| Interpretability Techniques for Deep Learning | 1 | 3.23% |
| Model Editing | 1 | 3.23% |
| Large Language Model | 1 | 3.23% |