Search Results for author: Ryan Ehrlich

Found 1 paper, 0 papers with code

Hydragen: High-Throughput LLM Inference with Shared Prefixes

no code implementations • 7 Feb 2024 • Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.
