14 Mar 2024 • Aonan Zhang, Chong Wang, Yi Wang, Xuanyu Zhang, Yunfei Cheng
In this paper, we introduce an improved approach to speculative decoding aimed at making large language model serving more efficient.