Search Results for author: Seung-seob Lee

Found 1 paper, 0 papers with code

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

no code implementations • 7 Nov 2023 • In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, Lin Zhong

We present Prompt Cache, an approach for accelerating inference for large language models (LLMs) by reusing attention states across different LLM prompts.

Question Answering
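
The core idea, precomputing and reusing attention (key/value) states for prompt text shared across requests, can be sketched with ordinary prefix caching in Hugging Face Transformers. This is a simplification, not the paper's implementation: Prompt Cache reuses states for modular prompt segments even when they appear at different positions, which plain prefix caching cannot. The model name "gpt2" and the helper functions below are illustrative assumptions.

    # A minimal sketch of attention-state reuse via prefix caching,
    # assuming a recent version of the transformers library.
    import copy
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    kv_cache = {}  # reusable prompt segment -> (token ids, attention states)

    def attention_states(segment: str):
        # Compute the key/value (attention) states for a segment once,
        # then serve them from the cache on later requests.
        if segment not in kv_cache:
            ids = tokenizer(segment, return_tensors="pt").input_ids
            with torch.no_grad():
                out = model(ids, use_cache=True)
            kv_cache[segment] = (ids, out.past_key_values)
        return kv_cache[segment]

    def generate_with_reuse(shared_prefix: str, user_suffix: str):
        # Reuse cached states instead of re-encoding the shared prefix;
        # only the user-specific suffix is prefilled from scratch.
        prefix_ids, past = attention_states(shared_prefix)
        suffix_ids = tokenizer(user_suffix, return_tensors="pt").input_ids
        input_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
        with torch.no_grad():
            return model.generate(
                input_ids,
                past_key_values=copy.deepcopy(past),  # avoid mutating the cache
                max_new_tokens=20,
            )

Across requests that share the prefix, the prefill cost of the shared text is paid once; generate processes only the uncached suffix tokens against the reused states.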
