no code implementations • 11 Apr 2023 • Venkat Srinivasan, Darshan Gandhi, Urmish Thakker, Raghu Prabhakar
We show that we can successfully train GPT 13B to the same quality as the dense GPT 13B model, while achieving an end-to-end speedup of 4.5x over a dense A100 baseline.