Search Results for author: Xinhao Cheng

Found 3 papers, 2 papers with code

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

1 code implementation • 29 Feb 2024 • Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests.

1,529

Paper
Code

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

no code implementations • 23 Dec 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.

Language Modelling Large Language Model

Paper
Add Code

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1. 5-2. 8x for distributed LLM inference and by 2. 6-3. 5x for offloading-based LLM inference, while preserving the same generative performance.

Decoder Language Modelling +1

1,529

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.