Search Results for author: Insu Jang

Found 2 papers, 2 papers with code

Perseus: Removing Energy Bloat from Large Model Training

2 code implementations • 12 Dec 2023 • Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

Training large AI models on numerous GPUs consumes a massive amount of energy.

129

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

1 code implementation • 15 Sep 2023 • Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, Mosharaf Chowdhury

Oobleck enables resilient distributed training of large DNN models with guaranteed fault tolerance.

65
