Search Results for author: Nitin Kedia

Found 1 paper, 0 papers with code

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

no code implementations • 4 Mar 2024 • Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

However, batching multiple requests leads to an interleaving of prefill and decode iterations, which makes it challenging to achieve both high throughput and low latency.

Scheduling

