1 code implementation • 29 Feb 2024 • Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia
This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests.
no code implementations • 23 Dec 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.
3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1. 5-2. 8x for distributed LLM inference and by 2. 6-3. 5x for offloading-based LLM inference, while preserving the same generative performance.