Search Results for author: Piotr Piękos

Found 4 papers, 2 papers with code

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

1 code implementation • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber

The costly self-attention layers in modern Transformers require memory and compute quadratic in sequence length.
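A minimal sketch (not the SwitchHead implementation, and with illustrative shapes and names chosen here) of why standard self-attention scales quadratically with sequence length: the score matrix has shape (seq_len, seq_len), so doubling the sequence length quadruples its memory and compute.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (seq_len, seq_len) -> O(n^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # O(n^2 * d_head) compute

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 128, 64, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # (128, 16); the (128, 128) score matrix is the quadratic cost
```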

Language Modelling
