Language Models

GPT-NeoX

Introduced by Black et al. in GPT-NeoX-20B: An Open-Source Autoregressive Language Model

GPT-NeoX is an autoregressive transformer decoder model whose architecture largely follows that of GPT-3, with a few notable deviations. The model has 20 billion parameters across 44 layers, with a hidden dimension of 6144 and 64 attention heads. The main differences from GPT-3 are a change of tokenizer, the addition of Rotary Positional Embeddings, the parallel computation of the attention and feed-forward layers, and a different initialization scheme and set of hyperparameters (see the sketch below).

Source: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
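To make the two architectural deviations concrete, here is a minimal, self-contained PyTorch sketch of a parallel-residual layer with rotary embeddings. It is an illustration only, not EleutherAI's implementation (the actual model is built on Megatron/DeepSpeed with model parallelism); names such as ParallelBlock and rotary_embedding are invented for the example, and the RoPE helper rotates all head dimensions using the interleaved-pair convention, whereas GPT-NeoX-20B rotates only the first 25% of each head's dimensions.

```python
import math
import torch
import torch.nn as nn

def rotary_embedding(x, base=10000):
    """Rotary positional embeddings (RoPE), interleaved-pair convention.

    x: (batch, heads, seq_len, head_dim) with an even head_dim. Simplified:
    this rotates every head dimension, while GPT-NeoX-20B rotates only the
    first 25% of them.
    """
    *_, seq_len, dim = x.shape
    # Per-pair rotation frequency: theta_i = base^(-2i / dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=x.device).float() / dim))
    pos = torch.arange(seq_len, device=x.device).float()
    angles = torch.einsum("s,d->sd", pos, inv_freq)   # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]               # consecutive pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin              # rotate each pair by
    out[..., 1::2] = x1 * sin + x2 * cos              # its position angle
    return out

class ParallelBlock(nn.Module):
    """One transformer layer with parallel attention and feed-forward.

    GPT-3 (sequential):  x = x + attn(ln1(x)); x = x + ff(ln2(x))
    GPT-NeoX (parallel): x = x + attn(ln1(x)) + ff(ln2(x))
    """
    def __init__(self, hidden=6144, heads=64):        # the 20B model's sizes
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.ln1, self.ln2 = nn.LayerNorm(hidden), nn.LayerNorm(hidden)
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.ff = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )

    def attn(self, x):
        b, s, h = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        q, k = rotary_embedding(q), rotary_embedding(k)   # RoPE on q/k only
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        causal = torch.triu(torch.ones(s, s, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))  # autoregressive mask
        out = (scores.softmax(-1) @ v).transpose(1, 2).reshape(b, s, h)
        return self.proj(out)

    def forward(self, x):
        # Both sublayers read the same residual stream and are summed at once
        return x + self.attn(self.ln1(x)) + self.ff(self.ln2(x))

block = ParallelBlock(hidden=256, heads=8)    # small sizes for a quick demo
print(block(torch.randn(2, 16, 256)).shape)   # torch.Size([2, 16, 256])
```

The parallel formulation is an efficiency choice: because the attention and feed-forward sublayers both read the same residual stream, their matrix multiplications can be scheduled together rather than serialized.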

Tasks

Task Papers Share
Language Modelling 3 37.50%
Quantization 1 12.50%
Linguistic Acceptability 1 12.50%
Text Generation 1 12.50%
Text Detection 1 12.50%
Multi-task Language Understanding 1 12.50%
