3 Feb 2024 • Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You
Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs.
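To make the draft-then-verify idea concrete, here is a minimal sketch of the generic greedy variant of speculative decoding. It is not this paper's method. The toy `target_next` and `draft_next` functions are hypothetical stand-ins for a large model and a small draft model, and `speculative_decode` and `greedy_decode` are illustrative names. The draft proposes `k` tokens, the target checks them, and the longest agreeing prefix is kept along with one token from the target.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token.
NextToken = Callable[[List[int]], int]


def target_next(prefix: List[int]) -> int:
    """Hypothetical toy stand-in for the large, slow target model."""
    return (prefix[-1] + 1) % 10


def draft_next(prefix: List[int]) -> int:
    """Hypothetical toy draft model: agrees with the target on most inputs."""
    if prefix[-1] % 5 == 4:
        return 0  # deliberately disagrees with the target when the last token is 4
    return (prefix[-1] + 1) % 10


def greedy_decode(prompt: List[int], model: NextToken, max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    prompt: List[int],
    target: NextToken,
    draft: NextToken,
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < limit:
        # 1) The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2) The target verifies the proposal. A real system scores all k
        #    positions in one batched forward pass; this loop simulates that.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:limit], rounds
```

Under these assumptions the output matches plain greedy decoding with the target model exactly. Any latency benefit comes from needing fewer target verification rounds than generated tokens, which depends on how often the draft agrees with the target.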