Search Results for author: Zhuoming Chen

Found 6 papers, 4 papers with code

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

1 code implementation • 18 Apr 2024 • Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length.

109

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.

202

GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism

no code implementations • 19 Aug 2023 • Jingji Chen, Zhuoming Chen, Xuehai Qian

Communication is a key bottleneck for distributed graph neural network (GNN) training.

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Language Modelling • Large Language Model

1,521

Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks

no code implementations • 2 Oct 2022 • Zhihao Zhang, Zhuoming Chen, Heyang Huang, Zhihao Jia

To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework that optimizes quantum ML models using quantum optimization.

Edge Detection

Quantized Training of Gradient Boosting Decision Trees

2 code implementations • 20 Jul 2022 • Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu

Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications.

Quantization

16,072
