Search Results for author: Shenggui Li

Found 11 papers, 7 papers with code

GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

no code implementations • 3 Feb 2024 • Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You

Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs.
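Below is a minimal, self-contained sketch of the generic speculative decoding loop that this line of work builds on. It is not the paper's GliDe/CaPE method, and the two toy "models" (`draft_next`, `target_next`) are hypothetical stand-ins for real draft and target LLMs.

```python
# Minimal sketch of generic speculative decoding (not the paper's GliDe/CaPE
# method): a cheap draft model proposes a block of tokens and the target model
# verifies them, keeping the longest agreeing prefix. The two toy "models"
# below are hypothetical stand-ins for real LLMs.

def draft_next(tokens):       # hypothetical small draft model (greedy)
    return (tokens[-1] + 1) % 50

def target_next(tokens):      # hypothetical large target model (greedy)
    return 0 if tokens[-1] == 7 else (tokens[-1] + 1) % 50

def speculative_decode(prompt, steps=20, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # 1) The draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target model checks the proposed positions; in a real system
        #    this is a single batched forward pass, which is where the latency
        #    reduction comes from.
        accepted, ctx = [], list(tokens)
        for t in proposal:
            expected = target_next(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # take the target's token and stop
                break
        tokens.extend(accepted)
    return tokens[: len(prompt) + steps]

print(speculative_decode([3], steps=10, k=4))
```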

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

1 code implementation • 6 Feb 2023 • Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.

Scheduling
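As background for the checkpointing half of the plan Colossal-Auto searches over, here is a minimal sketch of plain activation (gradient) checkpointing in PyTorch, the memory-for-recompute trade-off whose placement the system optimizes jointly with parallelization. This is standard `torch.utils.checkpoint` usage, not the paper's planner; the `Block` module is a hypothetical example.

```python
# Plain activation (gradient) checkpointing in PyTorch: activations inside each
# checkpointed block are discarded in the forward pass and recomputed during
# backward, trading compute for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):                      # hypothetical residual MLP block
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)

class CheckpointedStack(nn.Module):
    def __init__(self, depth=8, dim=256):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])

    def forward(self, x):
        for blk in self.blocks:
            # Do not store blk's intermediate activations; recompute them
            # when the backward pass reaches this block.
            x = checkpoint(blk, x, use_reentrant=False)
        return x

model = CheckpointedStack()
x = torch.randn(4, 256, requires_grad=True)
model(x).sum().backward()
```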

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

no code implementations • 6 Sep 2022 • Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

Although the AI community has scaled models to the trillion-parameter level, the practical deployment of 10-100 billion parameter models remains uncertain due to latency, throughput, and memory constraints.

Blocking

A Frequency-aware Software Cache for Large Recommendation System Embeddings

1 code implementation • 8 Aug 2022 • Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies.

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

1 code implementation • 24 Feb 2022 • Jie Zhu, Shenggui Li, Yang You

In this paper, we propose Sky Computing, a load-balanced model parallelism framework that adaptively allocates model weights to devices.

Distributed Computing • Federated Learning
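A minimal sketch of the load-balancing idea follows: layers are assigned to devices so that each device's share of work is roughly proportional to its benchmarked speed. This is a simple greedy illustration, not the paper's allocation algorithm, and the layer costs and device speeds are hypothetical numbers.

```python
# Load-balanced model-parallel allocation sketch (not Sky Computing's exact
# algorithm): assign contiguous layer groups so per-device load is roughly
# proportional to device speed. Costs and speeds below are hypothetical.

def allocate_layers(layer_costs, device_speeds):
    """Greedily walk the layers, moving to the next device once its target
    share of the total cost has been filled."""
    total_cost = sum(layer_costs)
    total_speed = sum(device_speeds)
    targets = [total_cost * s / total_speed for s in device_speeds]

    assignment = [[] for _ in device_speeds]
    dev, load = 0, 0.0
    for i, cost in enumerate(layer_costs):
        if load >= targets[dev] and dev < len(device_speeds) - 1:
            dev, load = dev + 1, 0.0
        assignment[dev].append(i)
        load += cost
    return assignment

layer_costs = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 2.0, 1.0]   # e.g. relative FLOPs
device_speeds = [1.0, 3.0, 2.0]                           # relative throughput
print(allocate_layers(layer_costs, device_speeds))        # [[0, 1, 2], [3, 4, 5], [6, 7]]
```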

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1 code implementation • 28 Oct 2021 • Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

The success of Transformer models has pushed the deep learning model scale to billions of parameters.

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters

no code implementations • 8 Aug 2021 • Zhengda Bian, Shenggui Li, Wei Wang, Yang You

ONES automatically manages the elasticity of each job based on the training batch size, so as to maximize GPU utilization and improve scheduling efficiency.

Scheduling
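The sketch below illustrates the elasticity idea in toy form: per-job GPU shares are perturbed under a cluster-wide budget, changes that raise estimated utilization are kept, and each job's batch size tracks its GPU share. It is a simple random local search, not the paper's ONES evolutionary algorithm, and the job parameters and utility model are hypothetical.

```python
# Toy elastic batch-size scheduling sketch (not ONES's evolutionary search):
# perturb per-job GPU allocations under a fixed budget, keep improvements in a
# hypothetical utilization model, and scale batch size with the GPU share.
import random

def evolve_allocations(jobs, total_gpus, rounds=50, seed=0):
    rng = random.Random(seed)
    alloc = {j: 1 for j in jobs}                     # start with 1 GPU per job

    def utilization(a):
        # Hypothetical model: utility grows sub-linearly with GPUs per job.
        return sum(jobs[j]["scaling"] * a[j] ** 0.8 for j in jobs)

    best = utilization(alloc)
    for _ in range(rounds):
        j = rng.choice(list(jobs))
        candidate = dict(alloc)
        candidate[j] += rng.choice([-1, 1])
        if candidate[j] < 1 or sum(candidate.values()) > total_gpus:
            continue                                  # infeasible perturbation
        if (u := utilization(candidate)) > best:
            alloc, best = candidate, u
    return {j: {"gpus": g, "batch_size": jobs[j]["base_batch"] * g}
            for j, g in alloc.items()}

jobs = {"jobA": {"base_batch": 32, "scaling": 1.0},   # hypothetical jobs
        "jobB": {"base_batch": 64, "scaling": 0.6}}
print(evolve_allocations(jobs, total_gpus=8))
```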

Sequence Parallelism: Long Sequence Training from System Perspective

no code implementations • 26 May 2021 • Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You

That is, with sparse attention, our sequence parallelism enables us to train Transformers with infinitely long sequences.
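Here is a single-process sketch of the partitioning idea: the sequence dimension is split across "devices", each holding its own query chunk. It is not the paper's distributed implementation; in a real run the key/value chunks would circulate via ring communication instead of being concatenated locally, so this shows the partitioning but not the memory savings.

```python
# Single-process sketch of sequence parallelism (not the paper's distributed
# implementation): split the sequence dimension across "devices", each owning
# one query chunk and visiting the key/value chunks of the others.
import torch

def sequence_parallel_attention(q, k, v, num_devices=4):
    seq_len, dim = q.shape
    q_chunks = q.chunk(num_devices)           # each device owns one query chunk
    k_chunks = k.chunk(num_devices)
    v_chunks = v.chunk(num_devices)

    outputs = []
    for qi in q_chunks:
        # In a distributed run the K/V chunks arrive one at a time over a ring;
        # here they are simply concatenated, so only the partitioning is shown.
        scores = torch.cat([qi @ kj.T for kj in k_chunks], dim=-1) / dim ** 0.5
        attn = scores.softmax(dim=-1)
        outputs.append(attn @ torch.cat(list(v_chunks)))
    return torch.cat(outputs)

q = k = v = torch.randn(128, 64)
print(sequence_parallel_attention(q, k, v).shape)   # torch.Size([128, 64])
```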

An Efficient 2D Method for Training Super-Large Deep Learning Models

1 code implementation • 12 Apr 2021 • Qifan Xu, Shenggui Li, Chaoyu Gong, Yang You

However, due to memory constraints, model parallelism must be utilized to host large models that would otherwise not fit into the memory of a single device.
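As background for the 2D approach, the sketch below shows SUMMA-style block partitioning of a matrix multiply over a p x p grid, where each "device" only ever holds one block of each operand. It is an illustrative single-process loop, not the paper's distributed 2D method, and the matrix sizes are hypothetical.

```python
# SUMMA-style 2D blocking sketch (background for 2D tensor parallelism, not the
# paper's distributed implementation): A and B are split over a p x p grid and
# each output block is accumulated from p block products.
import torch

def summa_matmul(A, B, p=2):
    n, m, k = A.shape[0] // p, B.shape[1] // p, A.shape[1] // p
    C = torch.zeros(A.shape[0], B.shape[1])
    for i in range(p):            # grid row
        for j in range(p):        # grid column
            # Device (i, j) accumulates its output block over p broadcast steps.
            for t in range(p):
                A_block = A[i * n:(i + 1) * n, t * k:(t + 1) * k]
                B_block = B[t * k:(t + 1) * k, j * m:(j + 1) * m]
                C[i * n:(i + 1) * n, j * m:(j + 1) * m] += A_block @ B_block
    return C

A, B = torch.randn(8, 6), torch.randn(6, 4)
assert torch.allclose(summa_matmul(A, B, p=2), A @ B, atol=1e-5)
```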
