Search Results for author: Yunfei Cheng

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

In this paper, we introduce an improved approach to speculative decoding aimed at enhancing the efficiency of serving large language models.

