Search Results for author: Shigang Li

Found 16 papers, 10 papers with code

TRANSOM: An Efficient Fault-Tolerant System for Training LLMs

1 code implementation16 Oct 2023 Baodong Wu, Lei Xia, Qingping Li, Kangyu Li, Xu Chen, Yongqiang Guo, Tieyao Xiang, YuHeng Chen, Shigang Li

As a result, a substantial amount of training time is devoted to checkpoint saving and loading, task rescheduling and restarts, and manual anomaly checks, which greatly harms overall training efficiency.

Anomaly Detection
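The checkpoint save/restore cycle the abstract alludes to can be sketched in a few lines. The snippet below is a minimal, generic illustration using a hypothetical `checkpoint.pkl` path and plain pickle; it is not TRANSOM's actual interface or storage layout:

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # hypothetical path, not TRANSOM's layout

def save_checkpoint(step, model_state):
    # Write atomically: dump to a temp file, then rename, so a crash
    # mid-write cannot corrupt the latest valid checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "model_state": model_state}, f)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint():
    # Resume from the last completed checkpoint, or start fresh.
    if not os.path.exists(CKPT_PATH):
        return 0, {}
    with open(CKPT_PATH, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["model_state"]

step, state = load_checkpoint()
while step < 1000:
    state["loss"] = 1.0 / (step + 1)  # stand-in for a real training step
    step += 1
    if step % 100 == 0:  # checkpoint interval trades I/O cost vs. lost work
        save_checkpoint(step, state)
```

The checkpoint interval is the key tuning knob: frequent saves bound the work lost on failure but increase I/O time, which is exactly the tension the paper targets.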

Co-design Hardware and Algorithm for Vector Search

1 code implementation19 Jun 2023 Wenqi Jiang, Shigang Li, Yu Zhu, Johannes De Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents.

Information Retrieval, Retrieval
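For readers unfamiliar with the core operation, a brute-force version of query-document similarity search fits in a few lines of NumPy. This sketch shows only the similarity computation and top-k selection on synthetic vectors; the paper's contribution is hardware/algorithm co-design for far larger, approximate indexes:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 128)).astype(np.float32)  # encoded documents
query = rng.normal(size=(128,)).astype(np.float32)        # encoded query

# Normalize so the inner product equals cosine similarity.
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = docs @ query                    # one similarity per document
k = 5
topk = np.argpartition(-scores, k)[:k]   # unsorted top-k candidates
topk = topk[np.argsort(-scores[topk])]   # sort the k winners
print(topk, scores[topk])
```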

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

2 code implementations8 May 2023 Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler

Gradient preconditioning is a key technique for integrating second-order information into gradients to improve and extend gradient-based learning algorithms.
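To make "integrating second-order information into gradients" concrete, here is a minimal diagonal preconditioner in NumPy, an Adagrad-style empirical-Fisher diagonal chosen for brevity. It illustrates the idea only and is not ASDL's interface:

```python
import numpy as np

# Scale each gradient coordinate by the inverse root of its accumulated
# squared magnitude (an empirical-Fisher diagonal), so flat directions
# take larger steps.
class DiagonalPreconditioner:
    def __init__(self, dim, damping=1e-8):
        self.fisher_diag = np.zeros(dim)
        self.damping = damping

    def update_curvature(self, grad):
        self.fisher_diag += grad ** 2  # accumulate squared gradients

    def precondition(self, grad):
        return grad / (np.sqrt(self.fisher_diag) + self.damping)

w = np.array([1.0, 1.0])
pre = DiagonalPreconditioner(dim=2)
for _ in range(100):
    grad = np.array([20.0 * w[0], 0.2 * w[1]])  # badly scaled quadratic loss
    pre.update_curvature(grad)
    w -= 0.1 * pre.precondition(grad)           # preconditioned SGD step
print(w)
```

Richer preconditioners (K-FAC, Shampoo, full Fisher blocks) follow the same update-curvature/precondition split, which is the structure a unified interface like ASDL abstracts over.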

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

1 code implementation25 Nov 2022 Kazuki Osawa, Shigang Li, Torsten Hoefler

Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters.
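A toy schedule makes the pipeline mechanics visible. The sketch below prints a plain GPipe-style forward schedule; the idle `--` slots are the "bubbles" that PipeFisher fills with Fisher-matrix work (the curvature computation itself is omitted here, so this is baseline pipelining, not PipeFisher):

```python
# Synchronous pipeline parallelism: each of S stages processes M
# micro-batches; stage s can run micro-batch m's forward at time s + m.
STAGES, MICRO_BATCHES = 4, 6
steps = STAGES + MICRO_BATCHES - 1  # total time steps for the forward pass

for t in range(steps):
    row = []
    for s in range(STAGES):
        m = t - s
        row.append(f"F{m}" if 0 <= m < MICRO_BATCHES else "--")  # -- = bubble
    print(f"t={t}: " + " ".join(row))
```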

Efficient Quantized Sparse Matrix Operations on Tensor Cores

1 code implementation14 Sep 2022 Shigang Li, Kazuki Osawa, Torsten Hoefler

We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores.

Quantization
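The numerics behind such a library can be illustrated on the CPU: store values as low-precision integers plus a scale, accumulate in int32, and dequantize at the end. This NumPy sketch covers only the arithmetic; Magicube's Tensor-core kernels and sparse storage formats are not represented:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8)) * (rng.random((4, 8)) < 0.3)  # sparse fp matrix
B = rng.normal(size=(8, 3))                                # dense fp matrix

scale_a = np.abs(A).max() / 127.0
scale_b = np.abs(B).max() / 127.0
A_q = np.round(A / scale_a).astype(np.int8)                # quantized values
B_q = np.round(B / scale_b).astype(np.int8)

acc = A_q.astype(np.int32) @ B_q.astype(np.int32)          # int32 accumulate
C = acc * (scale_a * scale_b)                              # dequantize
print(np.max(np.abs(C - A @ B)))                           # quantization error
```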

HammingMesh: A Network Topology for Large-Scale Deep Learning

no code implementations3 Sep 2022 Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott

Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution.

Scheduling

Near-Optimal Sparse Allreduce for Distributed Deep Learning

1 code implementation19 Jan 2022 Shigang Li, Torsten Hoefler

However, it is very challenging to obtain real performance improvement because of (1) the difficulty of designing a scalable and efficient sparse allreduce algorithm and (2) the sparsification overhead.
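Point (1) can be made concrete with a naive baseline: top-k sparsified gradients from different workers select different indices, so summing them densely, as below, forfeits much of the compression benefit. This is a NumPy sketch of the problem setup, not the paper's near-optimal algorithm:

```python
import numpy as np

def topk_sparsify(grad, k):
    # Keep only the k largest-magnitude entries; return (indices, values).
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sparse_allreduce(sparse_grads, dim):
    # Naive dense reduction of sparse contributions: each worker may select
    # different indices, which is exactly what makes an efficient sparse
    # allreduce hard. An efficient algorithm avoids densifying like this.
    total = np.zeros(dim)
    for idx, vals in sparse_grads:
        total[idx] += vals
    return total / len(sparse_grads)

rng = np.random.default_rng(0)
workers = [rng.normal(size=1000) for _ in range(8)]  # per-worker gradients
sparse = [topk_sparsify(g, k=50) for g in workers]   # 95% compression each
avg = sparse_allreduce(sparse, dim=1000)
print(np.count_nonzero(avg))  # the reduced result is much denser than k
```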

A Data-Centric Optimization Framework for Machine Learning

1 code implementation20 Oct 2021 Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler

Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a dramatically growing demand for compute.

BIG-bench Machine Learning

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

1 code implementation14 Jul 2021 Shigang Li, Torsten Hoefler

For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.

Scheduling

Deep Learning for Post-Processing Ensemble Weather Forecasts

1 code implementation18 May 2020 Peter Grönquist, Chengyuan Yao, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Shigang Li, Torsten Hoefler

Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%.

Weather Forecasting
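CRPS, the skill score quoted above, has a standard ensemble estimator: CRPS = E|X - y| - ½ E|X - X'|, where X, X' are ensemble members and y the observation (lower is better). A self-contained NumPy illustration on synthetic ensembles, with toy numbers rather than the paper's data:

```python
import numpy as np

def crps_ensemble(members, obs):
    # Ensemble estimator of the Continuous Ranked Probability Score:
    # CRPS = E|X - y| - 0.5 * E|X - X'|.
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
raw = rng.normal(loc=1.0, scale=2.0, size=50)   # biased, overspread ensemble
post = rng.normal(loc=0.2, scale=1.0, size=50)  # post-processed ensemble
obs = 0.0                                        # verifying observation
print(crps_ensemble(raw, obs), crps_ensemble(post, obs))  # lower = better
```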

Predicting Weather Uncertainty with Deep Convnets

no code implementations2 Nov 2019 Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler

Modern weather forecast models perform uncertainty quantification using ensemble prediction systems, which collect nonparametric statistics based on multiple perturbed simulations.

Uncertainty Quantification, Weather Forecasting
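The ensemble mechanism itself is easy to sketch: perturb the initial condition, rerun the same forecast model on each member, and take nonparametric statistics over the members. In this toy version a chaotic logistic map stands in for the weather model:

```python
import numpy as np

def forecast(x0, steps=20, r=3.7):
    # Toy chaotic dynamics: small initial differences grow quickly,
    # which is why the ensemble spread carries uncertainty information.
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

rng = np.random.default_rng(0)
analysis = 0.4                                   # best initial estimate
members = analysis + 1e-3 * rng.normal(size=50)  # perturbed initial states
ens = np.array([forecast(x0) for x0 in members])
print(ens.mean(), ens.std(), np.quantile(ens, [0.1, 0.9]))
```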

Asynchronous Decentralized SGD with Quantized and Local Updates

no code implementations NeurIPS 2021 Giorgi Nadiradze, Amirmojtaba Sabour, Peter Davies, Shigang Li, Dan Alistarh

Perhaps surprisingly, we show that a variant of SGD called SwarmSGD still converges in this setting, even if non-blocking communication, quantization, and local steps are all applied in conjunction, and even if the node data distributions and underlying graph topology are both heterogeneous.

Blocking, Distributed Optimization
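A toy simulation conveys the flavor of the algorithm: nodes take local SGD steps on heterogeneous data, then random pairs exchange quantized models and average. This is a sketch of the mechanism under simplifying assumptions (quadratic local losses, a crude uniform quantizer), not the paper's exact protocol or analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
N, DIM = 8, 4
targets = rng.normal(size=(N, DIM))  # heterogeneous node data (local optima)
models = rng.normal(size=(N, DIM))

def quantize(x, levels=64):
    # Crude uniform quantizer for the model exchange.
    scale = np.abs(x).max() / levels + 1e-12
    return np.round(x / scale) * scale

for _ in range(2000):
    i, j = rng.choice(N, size=2, replace=False)  # random interacting pair
    for node in (i, j):
        for _ in range(4):                        # local SGD steps
            grad = models[node] - targets[node]   # quadratic local loss
            models[node] -= 0.05 * grad
    avg = 0.5 * (quantize(models[i]) + quantize(models[j]))
    models[i] = models[j] = avg                   # pairwise model averaging

print(models.mean(axis=0), targets.mean(axis=0))  # consensus near mean optimum
```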

Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations

no code implementations12 Aug 2019 Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler

Load imbalance is pervasive in distributed deep learning training systems, caused either by inherent imbalance in the learned tasks or by the system itself.
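The remedy the title refers to, partial collective operations, can be approximated as: reduce over whichever workers are ready by a deadline, reusing stale contributions from stragglers instead of blocking on them. This is a NumPy sketch of the idea, not the paper's actual partial-allreduce implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, DIM = 8, 4
stale = np.zeros((N, DIM))  # last contribution seen from each worker

def partial_allreduce(grads, ready):
    # Fresh gradients from ready workers, cached ones from stragglers,
    # so fast workers never wait on slow ones.
    for i in ready:
        stale[i] = grads[i]
    return stale.mean(axis=0)

grads = rng.normal(size=(N, DIM))          # this step's gradients
ready = np.where(rng.random(N) < 0.75)[0]  # stragglers miss the deadline
avg = partial_allreduce(grads, ready)
print(len(ready), "of", N, "workers contributed fresh gradients")
```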
