Search Results for author: Zhuoming Chen

Found 6 papers, 4 papers with code

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

1 code implementation • 18 Apr 2024 • Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length.

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.

GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism

no code implementations • 19 Aug 2023 • Jingji Chen, Zhuoming Chen, Xuehai Qian

Communication is a key bottleneck for distributed graph neural network (GNN) training.

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Language Modelling, Large Language Model

Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks

no code implementations • 2 Oct 2022 • Zhihao Zhang, Zhuoming Chen, Heyang Huang, Zhihao Jia

To address the limitations of existing quantum ML methods, we introduce Quark, a gradient-free quantum learning framework that optimizes quantum ML models using quantum optimization.

Edge Detection

Quantized Training of Gradient Boosting Decision Trees

2 code implementations • 20 Jul 2022 • Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu

Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications.

Quantization
