Search Results for author: Bingyang Wu

Found 3 papers, 1 paper with code

LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

no code implementations · 15 Apr 2024 · Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request.

A Survey of Resource-efficient LLM and Multimodal Foundation Models

1 code implementation · 16 Jan 2024 · Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, QiPeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu

Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment.

Fast Distributed Inference Serving for Large Language Models

no code implementations · 10 May 2023 · Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin

Based on this new semi-information-agnostic setting of LLM inference, the scheduler uses input length information to assign each arriving job an appropriate initial queue to join.
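The abstract describes length-aware initial queue assignment in a multi-level feedback queue: longer prompts start in lower-priority queues so they do not immediately exhaust the quantum of a high-priority queue. A minimal sketch of that idea, with all names, quanta, and the cost model being illustrative assumptions rather than the paper's actual implementation:

```python
# Illustrative time quanta for each priority level (highest priority first).
QUEUE_QUANTA = [1, 2, 4, 8]

def initial_queue(input_len: int, tokens_per_quantum: int = 2) -> int:
    """Hypothetical sketch: pick the first queue whose quantum covers the
    job's estimated first-iteration cost, skipping queues the job would
    immediately outgrow."""
    est_cost = input_len / tokens_per_quantum
    for level, quantum in enumerate(QUEUE_QUANTA):
        if est_cost <= quantum:
            return level
    # Very long prompts start in the lowest-priority queue.
    return len(QUEUE_QUANTA) - 1
```

In this sketch a short prompt enters the highest-priority queue, while a long prompt skips directly to a lower queue, avoiding wasted demotions.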

