no code implementations • 5 Oct 2021 • Narsimha Chilkuri, Eric Hunsberger, Aaron Voelker, Gurshaant Malik, Chris Eliasmith
Across model sizes spanning three orders of magnitude, we show that our new architecture attains the same accuracy as transformers with 10x fewer tokens.
2 code implementations • NeurIPS 2019 • Aaron Voelker, Ivana Kajić, Chris Eliasmith
Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales.
Ranked #12 on Sequential Image Classification (Sequential MNIST)
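The abstract snippet above describes a continuous-time memory whose ODE is integrated with a (learnable) time-step. A minimal sketch of that idea, assuming the Legendre Memory Unit's linear memory system from Voelker et al. (NeurIPS 2019) with a simple Euler step; the function names and the specific discretization are illustrative, not the paper's exact implementation:

```python
import numpy as np

def lmu_matrices(order, theta):
    """Build the (A, B) matrices of the LMU's linear memory system.

    The system dm/dt = A m + B u orthogonalizes the input u across a
    sliding window of length theta using Legendre polynomials.
    """
    q = np.arange(order)
    i, j = np.meshgrid(q, q, indexing="ij")
    # a_ij = (2i+1)/theta * (-1 if i<j else (-1)^(i-j+1))
    A = ((2 * i + 1) / theta) * np.where(i < j, -1.0, (-1.0) ** (i - j + 1))
    # b_i = (2i+1)(-1)^i / theta
    B = (2 * q + 1) * ((-1.0) ** q) / theta
    return A, B

def euler_step(m, u, A, B, dt):
    # One explicit Euler step; dt is the internal time-step that, in the
    # paper's framing, each layer can adapt via backpropagation.
    return m + dt * (A @ m + B * u)

# Run the memory on a toy scalar signal.
A, B = lmu_matrices(order=6, theta=10.0)
m = np.zeros(6)
for t in range(200):
    m = euler_step(m, np.sin(0.3 * t), A, B, dt=0.1)
```

Because `dt` enters the update differentiably, gradients can flow through the solver step, which is the mechanism the snippet attributes to learning task-relevant time-scales.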