Pythia is a suite of decoder-only autoregressive language models, ranging in size from 70M to 12B parameters, all trained on the same public data seen in the exact same order. The model architecture and hyperparameters largely follow GPT-3, with a few notable deviations based on recent advances in best practices for large-scale language modeling.
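The defining property of a decoder-only autoregressive model is the causal attention mask: position t may attend only to positions ≤ t, so each next-token prediction depends only on the prefix. The sketch below illustrates this with single-head scaled dot-product attention in NumPy; it is a minimal illustration of the mechanism, not Pythia's actual implementation (which uses GPT-NeoX-style multi-head attention with rotary embeddings).

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask,
    so position t can only attend to positions <= t -- the property
    that makes a decoder-only model autoregressive."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)               # (T, T) pairwise scores
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores[mask] = -np.inf                      # hide all future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
out, w = causal_attention(x, x, x)
# The strict upper triangle of the attention weights is exactly zero:
assert np.allclose(np.triu(w, 1), 0.0)
```

Because every position's output is computed from earlier positions only, the same forward pass can score all next-token predictions for a training sequence in parallel.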
Source: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 9 | 29.03% |
| Memorization | 3 | 9.68% |
| Question Answering | 2 | 6.45% |
| Common Sense Reasoning | 2 | 6.45% |
| In-Context Learning | 1 | 3.23% |
| Benchmarking | 1 | 3.23% |
| Interpretability Techniques for Deep Learning | 1 | 3.23% |
| Model Editing | 1 | 3.23% |
| Large Language Model | 1 | 3.23% |