Search Results for author: Mahesh Marina

Found 1 paper, 1 paper with code

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina

This paper presents MoE-Infinity, a cost-efficient mixture-of-experts (MoE) serving system that realizes activation-aware expert offloading.
