Do Transformers Need Deep Long-Range Memory?

ACL 2020 · Jack Rae, Ali Razavi

Deep attention models have advanced the modelling of sequential data across many domains. For language modelling in particular, the Transformer-XL, a Transformer augmented with a long-range memory of past activations, has been shown to be state-of-the-art across a variety of well-studied benchmarks...


Methods used in the Paper