Transformer

Introduced by Vaswani et al. in Attention Is All You Need

A Transformer is a model architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. Before Transformers, the dominant sequence transduction models were based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The Transformer also uses an encoder and a decoder, but replacing recurrence with attention means positions no longer have to be processed one step at a time, which allows significantly more parallelization than RNN- and CNN-based models.
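
The attention mechanism described above lets every position look at every other position in a single step. Below is a minimal sketch of the scaled dot-product attention listed among the components, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. It assumes a single head, no masking, and plain Python lists in place of tensors. The function names are illustrative and do not come from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head, unmasked attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q_row in q:
        # Score this query against every key, so any position can attend to any other.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # The output is the attention-weighted sum of the value vectors.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Because each query row is computed independently of the others, the outer loop could run in parallel, which illustrates the parallelization point above. Multi-head attention repeats this computation with several learned projections of Q, K and V and concatenates the results.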

Source: Attention Is All You Need

Tasks

Task	Papers	Share of papers
Language Modelling	46	6.53%
Semantic Segmentation	27	3.84%
Large Language Model	20	2.84%
Question Answering	18	2.56%
Object Detection	18	2.56%
In-Context Learning	15	2.13%
Image Classification	12	1.70%
Denoising	12	1.70%
Retrieval	12	1.70%

Components

Component	Type
Absolute Position Encodings	Position Embeddings
Adam	Stochastic Optimization
BPE	Subword Segmentation
Dense Connections	Feedforward Networks
Dropout	Regularization
GELU	Activation Functions	(optional)
Label Smoothing	Regularization
Layer Normalization	Normalization
Multi-Head Attention	Attention Modules
Position-Wise Feed-Forward Layer	Feedforward Networks
ReLU	Activation Functions	(optional)
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
Softmax	Output Functions
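
Several of these components combine in each encoder sublayer: the position-wise feed-forward layer (two dense layers with a ReLU between them), a residual connection, and layer normalization, arranged as LayerNorm(x + FFN(x)) as in the original paper. The sketch below illustrates that arrangement. It assumes plain Python lists, omits dropout and the learned layer-norm gain and bias, and uses ReLU rather than the optional GELU. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, eps: float = 1e-6) -> Vector:
    # Normalize one position's features to zero mean and unit variance
    # (learned gain and bias omitted for brevity).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # Dense layer: y_j = sum_i x_i * w[i][j] + b_j
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def position_wise_ffn(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    # FFN(x) = max(0, x W1 + b1) W2 + b2, applied to one position at a time.
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def ffn_sublayer(seq: Matrix, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Matrix:
    # Residual connection followed by layer normalization: LayerNorm(x + FFN(x)).
    # The same weights are shared across all positions in the sequence.
    out: Matrix = []
    for x in seq:
        f = position_wise_ffn(x, w1, b1, w2, b2)
        out.append(layer_norm([a + c for a, c in zip(x, f)]))
    return out
```

Sharing one set of FFN weights across positions is what makes the layer "position-wise": two positions with identical inputs produce identical outputs, and all positions can be processed in parallel.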

Categories

Transformers

Autoregressive Transformers