20 Mar 2022 • Zuzana Jelčicová, Marian Verhelst
Moreover, the number of operations can be reduced by ~87-94% at the cost of only a 1-4% drop in accuracy, speeding up multi-head self-attention inference by a factor of ~7.5-16.
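
The reported operation reduction and speedup can be related with simple arithmetic. The sketch below assumes, purely for illustration, that inference time scales linearly with the number of remaining operations; the abstract does not state how the speedup was obtained, and the function names `ideal_speedup` and `implied_reduction` are hypothetical helpers, not part of the authors' work.

```python
def ideal_speedup(reduction_fraction: float) -> float:
    """Idealized speedup when runtime is proportional to the remaining operations.

    Assumption (not from the source): no overhead, so removing a fraction r
    of the operations leaves (1 - r) of the original runtime.
    """
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


def implied_reduction(speedup: float) -> float:
    """Fraction of operations removed that an idealized speedup would imply."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Reduction range quoted in the abstract: ~87-94% fewer operations.
    for r in (0.87, 0.94):
        print(f"{r:.0%} fewer operations -> ~{ideal_speedup(r):.1f}x idealized speedup")

    # Speedup range quoted in the abstract: ~7.5-16x.
    for s in (7.5, 16.0):
        print(f"{s}x speedup -> ~{implied_reduction(s):.1%} fewer operations")
```

Under this linear-scaling assumption, the two quoted ranges are roughly consistent with each other: a 7.5x to 16x speedup corresponds to removing about 86.7% to 93.75% of the operations.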