Search Results for author: Shenggui Li

Found 11 papers, 7 papers with code

GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

no code implementations • 3 Feb 2024 • Cunxiao Du, Jing Jiang, Xu Yuanchen, Jiawei Wu, Sicheng Yu, Yongqi Li, Shenggui Li, Kai Xu, Liqiang Nie, Zhaopeng Tu, Yang You

Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs.
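Below is a minimal, self-contained sketch of the generic speculative decoding loop that this line of work builds on. It is not the paper's GliDe/CaPE method, and the two toy "models" (`draft_next`, `target_next`) are hypothetical stand-ins for real draft and target LLMs.

```python
# Minimal sketch of generic speculative decoding (not the paper's GliDe/CaPE
# method): a cheap draft model proposes a block of tokens and the target model
# verifies them, keeping the longest agreeing prefix. The two toy "models"
# below are hypothetical stand-ins for real LLMs.

def draft_next(tokens):       # hypothetical small draft model (greedy)
    return (tokens[-1] + 1) % 50

def target_next(tokens):      # hypothetical large target model (greedy)
    return 0 if tokens[-1] == 7 else (tokens[-1] + 1) % 50

def speculative_decode(prompt, steps=20, k=4):
    tokens = list(prompt)
    while len(tokens) < len(prompt) + steps:
        # 1) The draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target model checks the proposed positions; in a real system
        #    this is a single batched forward pass, which is where the latency
        #    reduction comes from.
        accepted, ctx = [], list(tokens)
        for t in proposal:
            expected = target_next(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # take the target's token and stop
                break
        tokens.extend(accepted)
    return tokens[: len(prompt) + steps]

print(speculative_decode([3], steps=10, k=4))
```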

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

1 code implementation • 6 Feb 2023 • Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.

Scheduling
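As background for the checkpointing half of the plan Colossal-Auto searches over, here is a minimal sketch of plain activation (gradient) checkpointing in PyTorch, the memory-for-recompute trade-off whose placement the system optimizes jointly with parallelization. This is standard `torch.utils.checkpoint` usage, not the paper's planner; the `Block` module is a hypothetical example.

```python
# Plain activation (gradient) checkpointing in PyTorch: activations inside each
# checkpointed block are discarded in the forward pass and recomputed during
# backward, trading compute for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):                      # hypothetical residual MLP block
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)

class CheckpointedStack(nn.Module):
    def __init__(self, depth=8, dim=256):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])

    def forward(self, x):
        for blk in self.blocks:
            # Do not store blk's intermediate activations; recompute them
            # when the backward pass reaches this block.
            x = checkpoint(blk, x, use_reentrant=False)
        return x

model = CheckpointedStack()
x = torch.randn(4, 256, requires_grad=True)
model(x).sum().backward()
```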

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

no code implementations • 6 Sep 2022 • Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

Although the AI community has scaled models to the trillion-parameter level, the practical deployment of 10-100 billion parameter models remains uncertain due to latency, throughput, and memory constraints.

Blocking

A Frequency-aware Software Cache for Large Recommendation System Embeddings

1 code implementation • 8 Aug 2022 • Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies.

Sky Computing: Accelerating Geo-distributed Computing in Federated Learning

1 code implementation • 24 Feb 2022 • Jie Zhu, Shenggui Li, Yang You

In this paper, we propose Sky Computing, a load-balanced model parallelism framework that adaptively allocates model weights to devices.

Distributed Computing • Federated Learning
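A minimal sketch of the load-balancing idea follows: layers are assigned to devices so that each device's share of work is roughly proportional to its benchmarked speed. This is a simple greedy illustration, not the paper's allocation algorithm, and the layer costs and device speeds are hypothetical numbers.

```python
# Load-balanced model-parallel allocation sketch (not Sky Computing's exact
# algorithm): assign contiguous layer groups so per-device load is roughly
# proportional to device speed. Costs and speeds below are hypothetical.

def allocate_layers(layer_costs, device_speeds):
    """Greedily walk the layers, moving to the next device once its target
    share of the total cost has been filled."""
    total_cost = sum(layer_costs)
    total_speed = sum(device_speeds)
    targets = [total_cost * s / total_speed for s in device_speeds]

    assignment = [[] for _ in device_speeds]
    dev, load = 0, 0.0
    for i, cost in enumerate(layer_costs):
        if load >= targets[dev] and dev < len(device_speeds) - 1:
            dev, load = dev + 1, 0.0
        assignment[dev].append(i)
        load += cost
    return assignment

layer_costs = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 2.0, 1.0]   # e.g. relative FLOPs
device_speeds = [1.0, 3.0, 2.0]                           # relative throughput
print(allocate_layers(layer_costs, device_speeds))        # [[0, 1, 2], [3, 4, 5], [6, 7]]
```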

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1 code implementation • 28 Oct 2021 • Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

The success of Transformer models has pushed the deep learning model scale to billions of parameters.

Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters

no code implementations • 8 Aug 2021 • Zhengda Bian, Shenggui Li, Wei Wang, Yang You

ONES automatically manages the elasticity of each job based on the training batch size, so as to maximize GPU utilization and improve scheduling efficiency.

Scheduling
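The sketch below illustrates the elasticity idea in toy form: per-job GPU shares are perturbed under a cluster-wide budget, changes that raise estimated utilization are kept, and each job's batch size tracks its GPU share. It is a simple random local search, not the paper's ONES evolutionary algorithm, and the job parameters and utility model are hypothetical.

```python
# Toy elastic batch-size scheduling sketch (not ONES's evolutionary search):
# perturb per-job GPU allocations under a fixed budget, keep improvements in a
# hypothetical utilization model, and scale batch size with the GPU share.
import random

def evolve_allocations(jobs, total_gpus, rounds=50, seed=0):
    rng = random.Random(seed)
    alloc = {j: 1 for j in jobs}                     # start with 1 GPU per job

    def utilization(a):
        # Hypothetical model: utility grows sub-linearly with GPUs per job.
        return sum(jobs[j]["scaling"] * a[j] ** 0.8 for j in jobs)

    best = utilization(alloc)
    for _ in range(rounds):
        j = rng.choice(list(jobs))
        candidate = dict(alloc)
        candidate[j] += rng.choice([-1, 1])
        if candidate[j] < 1 or sum(candidate.values()) > total_gpus:
            continue                                  # infeasible perturbation
        if (u := utilization(candidate)) > best:
            alloc, best = candidate, u
    return {j: {"gpus": g, "batch_size": jobs[j]["base_batch"] * g}
            for j, g in alloc.items()}

jobs = {"jobA": {"base_batch": 32, "scaling": 1.0},   # hypothetical jobs
        "jobB": {"base_batch": 64, "scaling": 0.6}}
print(evolve_allocations(jobs, total_gpus=8))
```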

Sequence Parallelism: Long Sequence Training from System Perspective

no code implementations • 26 May 2021 • Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You

That is, with sparse attention, our sequence parallelism enables us to train Transformers with infinitely long sequences.
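Here is a single-process sketch of the partitioning idea: the sequence dimension is split across "devices", each holding its own query chunk. It is not the paper's distributed implementation; in a real run the key/value chunks would circulate via ring communication instead of being concatenated locally, so this shows the partitioning but not the memory savings.

```python
# Single-process sketch of sequence parallelism (not the paper's distributed
# implementation): split the sequence dimension across "devices", each owning
# one query chunk and visiting the key/value chunks of the others.
import torch

def sequence_parallel_attention(q, k, v, num_devices=4):
    seq_len, dim = q.shape
    q_chunks = q.chunk(num_devices)           # each device owns one query chunk
    k_chunks = k.chunk(num_devices)
    v_chunks = v.chunk(num_devices)

    outputs = []
    for qi in q_chunks:
        # In a distributed run the K/V chunks arrive one at a time over a ring;
        # here they are simply concatenated, so only the partitioning is shown.
        scores = torch.cat([qi @ kj.T for kj in k_chunks], dim=-1) / dim ** 0.5
        attn = scores.softmax(dim=-1)
        outputs.append(attn @ torch.cat(list(v_chunks)))
    return torch.cat(outputs)

q = k = v = torch.randn(128, 64)
print(sequence_parallel_attention(q, k, v).shape)   # torch.Size([128, 64])
```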

An Efficient 2D Method for Training Super-Large Deep Learning Models

1 code implementation • 12 Apr 2021 • Qifan Xu, Shenggui Li, Chaoyu Gong, Yang You

However, due to memory constraints, model parallelism must be utilized to host large models that would otherwise not fit into the memory of a single device.
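As background for the 2D approach, the sketch below shows SUMMA-style block partitioning of a matrix multiply over a p x p grid, where each "device" only ever holds one block of each operand. It is an illustrative single-process loop, not the paper's distributed 2D method, and the matrix sizes are hypothetical.

```python
# SUMMA-style 2D blocking sketch (background for 2D tensor parallelism, not the
# paper's distributed implementation): A and B are split over a p x p grid and
# each output block is accumulated from p block products.
import torch

def summa_matmul(A, B, p=2):
    n, m, k = A.shape[0] // p, B.shape[1] // p, A.shape[1] // p
    C = torch.zeros(A.shape[0], B.shape[1])
    for i in range(p):            # grid row
        for j in range(p):        # grid column
            # Device (i, j) accumulates its output block over p broadcast steps.
            for t in range(p):
                A_block = A[i * n:(i + 1) * n, t * k:(t + 1) * k]
                B_block = B[t * k:(t + 1) * k, j * m:(j + 1) * m]
                C[i * n:(i + 1) * n, j * m:(j + 1) * m] += A_block @ B_block
    return C

A, B = torch.randn(8, 6), torch.randn(6, 4)
assert torch.allclose(summa_matmul(A, B, p=2), A @ B, atol=1e-5)
```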
