
Gated Transformer-XL

Introduced by Parisotto et al. in Stabilizing Transformers for Reinforcement Learning

Gated Transformer-XL, or GTrXL, is a Transformer-based architecture for reinforcement learning. It introduces architectural modifications that improve stability and learning speed over the original Transformer and the Transformer-XL variant. Changes include:

  • Placing the layer normalization on only the input stream of the submodules. A key benefit of this reordering is that it enables an identity map from the input of the transformer at the first layer to the output of the transformer after the last layer. This contrasts with the canonical transformer, where a series of layer normalization operations non-linearly transform the state encoding. (A sketch of this reordering follows the list.)
  • Replacing residual connections with gating layers. The authors' experiments found that GRUs were the most effective form of gating (see the second sketch below).
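
The reordered block computes x + f(LayerNorm(x)) rather than LayerNorm(x + f(x)), so the residual path is never normalized. Below is a minimal PyTorch sketch of such a block; the module choices, dimensions, and the `PreLNBlock` name are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Transformer block with layer normalization on the input stream only.

    The residual path skips the normalization, so the block computes
    x + f(LayerNorm(x)). If the submodules output zero, the block is an
    identity map, letting the first layer's input pass unchanged to the
    last layer's output.
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)                                    # normalize the input stream only
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual path stays unnormalized
        x = x + self.mlp(self.ln2(x))
        return x
```

The paper additionally applies a ReLU to each submodule's output before the residual sum; that detail is omitted here for brevity.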
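
The gating change replaces each residual sum x + y with a learned gate g(x, y), where x is the stream input and y is the submodule output. The sketch below implements the GRU-style gating equations from the paper; the positive initial gate bias (2.0 here) pushes the gate toward the identity at the start of training, though the exact value is an assumption.

```python
import torch
import torch.nn as nn

class GRUGate(nn.Module):
    """GRU-style gated connection used in place of the residual sum x + y.

    r = sigmoid(Wr y + Ur x)         # reset gate
    z = sigmoid(Wz y + Uz x - bg)    # update gate, biased toward identity
    h = tanh(Wg y + Ug (r * x))      # candidate activation
    g = (1 - z) * x + z * h
    """
    def __init__(self, d_model: int, bias_init: float = 2.0):
        super().__init__()
        self.Wr = nn.Linear(d_model, d_model, bias=False)
        self.Ur = nn.Linear(d_model, d_model, bias=False)
        self.Wz = nn.Linear(d_model, d_model, bias=False)
        self.Uz = nn.Linear(d_model, d_model, bias=False)
        self.Wg = nn.Linear(d_model, d_model, bias=False)
        self.Ug = nn.Linear(d_model, d_model, bias=False)
        # Positive bias keeps z near 0 early in training, so g(x, y) ~ x.
        self.bg = nn.Parameter(torch.full((d_model,), bias_init))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        r = torch.sigmoid(self.Wr(y) + self.Ur(x))
        z = torch.sigmoid(self.Wz(y) + self.Uz(x) - self.bg)
        h = torch.tanh(self.Wg(y) + self.Ug(r * x))
        return (1.0 - z) * x + z * h
```

In a full GTrXL layer, both residual sums in the first sketch would become gated connections, e.g. x = gate(x, relu(attn_out)).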


Tasks


Task                          Papers   Share
Reinforcement Learning (RL)        2   50.00%
Language Modelling                 1   25.00%
Machine Translation                1   25.00%
