Search Results for author: Valerie Morris

Attention-Only Transformers and Implementing MLPs with Attention Heads

The transformer architecture is widely used in machine learning models and consists of two alternating sublayers: attention heads and MLPs.
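The alternating structure described above can be illustrated with a minimal sketch. It is not the paper's construction for implementing MLPs with attention heads. It assumes a single attention head whose dimension equals the model dimension, a ReLU MLP, residual connections, and no layer normalization. All function names (`attention_head`, `mlp`, `transformer_layer`, `make_params`) are hypothetical. The sketch uses plain Python lists so that it needs no third-party libraries.

```python
import math
import random

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(xs, Wq, Wk, Wv):
    # Single scaled dot-product attention head (assumed: head dim == model dim, no mask)
    d = len(xs[0])
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    out = []
    for q in qs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks])
        out.append([sum(w * v[j] for w, v in zip(weights, vs)) for j in range(d)])
    return out

def mlp(x, W1, W2):
    # Two-layer ReLU MLP applied to one position
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

def transformer_layer(xs, params):
    # Attention sublayer mixes information across positions, added residually
    attn = attention_head(xs, *params["attn"])
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attn)]
    # MLP sublayer acts on each position independently, added residually
    return [[a + b for a, b in zip(x, mlp(x, *params["mlp"]))] for x in xs]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_params(d, hidden, rng):
    return {
        "attn": tuple(random_matrix(d, d, rng) for _ in range(3)),
        "mlp": (random_matrix(hidden, d, rng), random_matrix(d, hidden, rng)),
    }

rng = random.Random(0)
d, hidden, seq_len, n_layers = 4, 8, 3, 2
layers = [make_params(d, hidden, rng) for _ in range(n_layers)]
xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
# Stack layers: attention and MLP sublayers alternate through the network
for params in layers:
    xs = transformer_layer(xs, params)
print(len(xs), len(xs[0]))
```

Stacking `transformer_layer` calls produces the alternation attention, MLP, attention, MLP, and so on. Only the attention sublayer moves information between positions.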
