2 code implementations • 16 Dec 2023 • Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine for a personal computer (PC) equipped with a single consumer-grade GPU.
Language Modelling • Large Language Model