Search Results for author: Insu Jang

Found 2 papers, 2 papers with code

Perseus: Removing Energy Bloat from Large Model Training

2 code implementations • 12 Dec 2023 • Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

Training large AI models on numerous GPUs consumes a massive amount of energy.

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

1 code implementation • 15 Sep 2023 • Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, Mosharaf Chowdhury

Oobleck enables resilient distributed training of large DNN models with guaranteed fault tolerance.
