Search Results for author: Xiaoxuan Liu

Found 8 papers, 2 papers with code

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

1 code implementation • 22 Apr 2024 • Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

Within this space, we show that there is not a linear relationship between GPU cost and performance, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost effective: model request size, request rate, and latency service-level objective (SLO).

Language Modelling Large Language Model

Paper
Code

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

no code implementations • 17 Jan 2024 • Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo

In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures.

Paper
Add Code

Learned Best-Effort LLM Serving

no code implementations • 15 Jan 2024 • Siddharth Jha, Coleman Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer

Many applications must provide low-latency LLM service to users or risk unacceptable user experience.

Paper
Add Code

Online Speculative Decoding

no code implementations • 11 Oct 2023 • Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang

We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs.

Knowledge Distillation

Paper
Add Code

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

no code implementations • 11 Oct 2023 • Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer

Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.

Quantization

Paper
Add Code

An Evaluation of Memory Optimization Methods for Training Neural Networks

no code implementations • 26 Mar 2023 • Xiaoxuan Liu, Siddharth Jha, Alvin Cheung

To address the challenge, this paper summarizes the scenarios in which MOMs prove advantageous for model training.

Quantization

Paper
Add Code

GACT: Activation Compressed Training for Generic Network Architectures

1 code implementation • 22 Jun 2022 • Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.

Paper
Code

Long-run User Value Optimization in Recommender Systems through Content Creation Modeling

no code implementations • 25 Apr 2022 • Akos Lada, Xiaoxuan Liu, Jens Rischbieth, Yi Wang, Yuwen Zhang

Content recommender systems are generally adept at maximizing immediate user satisfaction but to optimize for the \textit{long-run} user value, we need more statistically sophisticated solutions than off-the-shelf simple recommender algorithms.

BIG-bench Machine Learning Recommendation Systems

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.