1 code implementation • 14 Mar 2024 • Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath
This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy.