Search Results for author: Akhil Arunkumar

Found 1 paper, 1 paper with code

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference

1 code implementation • 14 Mar 2024 • Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath

This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy.
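
The snippet does not say how key tokens are chosen. The sketch below is a rough illustration of token-selective KV cache pruning in general. It assumes each cached token already has an accumulated attention score. Under a fixed budget, it keeps a window of the most recent tokens plus the highest-scoring older tokens, and drops the rest from the key and value caches. This is a generic heuristic assumed for illustration, not Keyformer's actual scoring method. The names select_kv_indices and prune_kv_cache, and the parameters budget and recent, are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_kv_indices(scores: Sequence[float], budget: int, recent: int) -> List[int]:
    """Return sorted indices of cached tokens to keep (hypothetical heuristic).

    Always keeps the `recent` most recent tokens. The remaining budget is
    filled with the older tokens that have the highest accumulated scores.
    """
    if budget < 0 or recent < 0:
        raise ValueError("budget and recent must be non-negative")
    n = len(scores)
    if budget >= n:
        # Everything fits in the budget, so nothing is evicted.
        return list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    older = range(0, n - recent)
    k = budget - recent
    # Rank older tokens by score (highest first) and keep the top k.
    top_older = sorted(older, key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top_older + recent_idx)


def prune_kv_cache(
    keys: Sequence[T], values: Sequence[T], scores: Sequence[float], budget: int, recent: int
) -> Tuple[List[T], List[T], List[int]]:
    """Drop evicted tokens from both the key and value caches."""
    keep = select_kv_indices(scores, budget, recent)
    return [keys[i] for i in keep], [values[i] for i in keep], keep


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.05, 0.3, 0.2]
    k, v, keep = prune_kv_cache(list("abcdef"), list("ABCDEF"), scores, budget=4, recent=2)
    print(keep, k, v)
```

With a budget of 4 and a recent window of 2, the six-token cache above shrinks to four entries. Both key and value caches are indexed by the same positions, so memory per layer falls proportionally.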

Text Generation