
Gated Transformer-XL

Introduced by Parisotto et al. in Stabilizing Transformers for Reinforcement Learning

Gated Transformer-XL, or GTrXL, is a Transformer-based architecture for reinforcement learning. It introduces architectural modifications that improve stability and learning speed over the original Transformer and the Transformer-XL variant. Changes include:

  • Placing the layer normalization on only the input stream of the submodules. A key benefit of this reordering is that it enables an identity map from the input of the transformer at the first layer to the output of the transformer after the last layer. This contrasts with the canonical transformer, where a series of layer normalization operations non-linearly transform the state encoding. (A sketch of this reordering follows the list.)
  • Replacing residual connections with gating layers. The authors' experiments found that GRUs were the most effective form of gating (see the second sketch below).
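
The reordered block computes x + f(LayerNorm(x)) rather than LayerNorm(x + f(x)), so the residual path is never normalized. Below is a minimal PyTorch sketch of such a block; the module choices, dimensions, and the `PreLNBlock` name are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Transformer block with layer normalization on the input stream only.

    The residual path skips the normalization, so the block computes
    x + f(LayerNorm(x)). If the submodules output zero, the block is an
    identity map, letting the first layer's input pass unchanged to the
    last layer's output.
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)                                    # normalize the input stream only
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual path stays unnormalized
        x = x + self.mlp(self.ln2(x))
        return x
```

The paper additionally applies a ReLU to each submodule's output before the residual sum; that detail is omitted here for brevity.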
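
The gating change replaces each residual sum x + y with a learned gate g(x, y), where x is the stream input and y is the submodule output. The sketch below implements the GRU-style gating equations from the paper; the positive initial gate bias (2.0 here) pushes the gate toward the identity at the start of training, though the exact value is an assumption.

```python
import torch
import torch.nn as nn

class GRUGate(nn.Module):
    """GRU-style gated connection used in place of the residual sum x + y.

    r = sigmoid(Wr y + Ur x)         # reset gate
    z = sigmoid(Wz y + Uz x - bg)    # update gate, biased toward identity
    h = tanh(Wg y + Ug (r * x))      # candidate activation
    g = (1 - z) * x + z * h
    """
    def __init__(self, d_model: int, bias_init: float = 2.0):
        super().__init__()
        self.Wr = nn.Linear(d_model, d_model, bias=False)
        self.Ur = nn.Linear(d_model, d_model, bias=False)
        self.Wz = nn.Linear(d_model, d_model, bias=False)
        self.Uz = nn.Linear(d_model, d_model, bias=False)
        self.Wg = nn.Linear(d_model, d_model, bias=False)
        self.Ug = nn.Linear(d_model, d_model, bias=False)
        # Positive bias keeps z near 0 early in training, so g(x, y) ~ x.
        self.bg = nn.Parameter(torch.full((d_model,), bias_init))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        r = torch.sigmoid(self.Wr(y) + self.Ur(x))
        z = torch.sigmoid(self.Wz(y) + self.Uz(x) - self.bg)
        h = torch.tanh(self.Wg(y) + self.Ug(r * x))
        return (1.0 - z) * x + z * h
```

In a full GTrXL layer, both residual sums in the first sketch would become gated connections, e.g. x = gate(x, relu(attn_out)).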


Tasks


Task                          Papers   Share
Reinforcement Learning (RL)        2   50.00%
Language Modelling                 1   25.00%
Machine Translation                1   25.00%
