1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
This paper presents MoE-Infinity, a cost-efficient mixture-of-experts (MoE) serving system that realizes activation-aware expert offloading.