3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.
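To make the cost concrete, here is a minimal sketch of the three operations using their standard textbook definitions. It is not taken from the paper; the function names and the `eps` default are illustrative assumptions, and the learned scale and shift of layer normalization are omitted. Each function relies on a transcendental function, a square root, or a division, which is what makes these operations expensive compared with the matrix multiplications around them.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF (needs erf)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs):
    """Numerically stable softmax over a list of floats (needs exp and division)."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(xs, eps=1e-5):
    """Layer normalization without learned scale/shift (needs sqrt and division).

    eps is an assumed small constant for numerical stability.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]
```

For example, `softmax([1.0, 2.0, 3.0])` returns three probabilities that sum to 1, and `layer_norm` returns values with zero mean and approximately unit variance.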