Search Results for author: Hanchen Li

Found 3 papers, 1 paper with code

Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

no code implementations · 23 Jan 2024 · Hanchen Li, YuHan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

To render each generated token in real time, the LLM server produces response tokens one by one and streams each token (or small group of tokens) over the network to the user as soon as it is generated, a process we refer to as LLM token streaming.

Chatbot
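The streaming loop the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate_next_token` and `send_to_client` are hypothetical stand-ins for a real model's decode step and a real network transport.

```python
# Minimal sketch of LLM token streaming: generate a token, send it
# immediately, repeat. All names here are illustrative placeholders.

def stream_response(prompt, generate_next_token, send_to_client, max_tokens=512):
    """Generate tokens one by one and push each to the user immediately."""
    context = prompt
    for _ in range(max_tokens):
        token = generate_next_token(context)  # one decode step of the LLM
        if token is None:                     # end-of-sequence
            break
        send_to_client(token)                 # stream right after generation
        context += token

# Toy usage: "generate" by popping from a canned reply, "send" by printing.
reply = iter(["Hello", ",", " world", "!"])
stream_response("Hi", lambda ctx: next(reply, None), print)
```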

CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving

1 code implementation · 11 Oct 2023 · YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

Compared to recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3.2x, with negligible impact on LLM response quality in accuracy or perplexity.

Language Modelling · Quantization
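To make the compression claim concrete, here is a sketch of the general idea of shrinking KV cache tensors before streaming them. This is simple uniform 8-bit quantization under assumed shapes, not CacheGen's actual codec, which the paper describes.

```python
import numpy as np

# Illustrative only (NOT CacheGen's codec): uniform 8-bit quantization
# of a toy fp32 KV tensor, showing how the cache shrinks before transfer.

def quantize_kv(kv: np.ndarray, bits: int = 8):
    lo, hi = kv.min(), kv.max()
    scale = (hi - lo) / (2**bits - 1) or 1.0   # avoid divide-by-zero
    q = np.round((kv - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_kv(q, lo, scale):
    return q.astype(np.float32) * scale + lo

kv = np.random.randn(2, 32, 128).astype(np.float32)  # assumed toy KV shape
q, lo, scale = quantize_kv(kv)
restored = dequantize_kv(q, lo, scale)
print(q.nbytes / kv.nbytes)  # 0.25: a quarter of the fp32 size
```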

GRACE: Loss-Resilient Real-Time Video through Neural Codecs

no code implementations · 21 May 2023 · Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, YuHan Liu, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang

In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements.
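A back-of-envelope calculation shows why retransmission breaks the latency budget. The numbers below are assumed for illustration, not taken from the paper: one retransmission costs at least one extra round trip.

```python
# Assumed numbers (illustrative only): a high-latency path and a
# typical real-time video frame deadline.
rtt_ms = 200                 # assumed round-trip time on a high-latency path
one_way_ms = rtt_ms / 2
frame_deadline_ms = 150      # assumed end-to-end budget per frame

first_try = one_way_ms                  # packet arrives if not lost
after_retransmit = one_way_ms + rtt_ms  # detect loss (~1 RTT), then resend

print(first_try <= frame_deadline_ms)        # True: 100 <= 150
print(after_retransmit <= frame_deadline_ms) # False: 300 > 150, frame is late
```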
