no code implementations • 15 Mar 2024 • Hyungjun Oh, Kihong Kim, JaeMin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo
This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference.
Scheduling