Search Results for author: Ryan Ehrlich

Found 1 paper, 0 papers with code

Hydragen: High-Throughput LLM Inference with Shared Prefixes

no code implementations • 7 Feb 2024 • Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.
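A minimal sketch (not from the paper; batch size, sequence length, and head dimension are illustrative assumptions) of why large-batch decoding attention is memory-bound: each sequence multiplies a single query vector against its own key-value cache, so the work reduces to a batch of matrix-vector products over caches streamed from memory.

```python
import torch

batch, seq_len, d = 64, 4096, 128          # many sequences, long cached prefixes (assumed values)

q = torch.randn(batch, 1, d)               # one new query token per sequence
k_cache = torch.randn(batch, seq_len, d)   # per-sequence key cache read from memory
v_cache = torch.randn(batch, seq_len, d)   # per-sequence value cache read from memory

# Decoding attention: for every sequence, a (1 x d) times (d x seq_len)
# matrix-vector product, then another against the value cache.
scores = q @ k_cache.transpose(1, 2) / d ** 0.5   # (batch, 1, seq_len)
probs = torch.softmax(scores, dim=-1)
out = probs @ v_cache                              # (batch, 1, d)

print(out.shape)  # torch.Size([64, 1, 128])
```

Because only one query token is processed per sequence, the arithmetic performed per byte of KV cache read is small, so throughput is limited by memory bandwidth rather than compute, which is the bottleneck the abstract describes.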

