Search Results for author: Luo Mai

Found 13 papers, 10 papers with code

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina

This paper presents MoE-Infinity, a cost-efficient mixture-of-experts (MoE) serving system that realizes activation-aware expert offloading.

Paper
Code

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).

Paper

Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections

no code implementations • 8 Dec 2023 • Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai, Peter Pietzuch

After a GPU change, Scalai uses the parallelizable tensor collection (PTC) to transform the job state: the PTC repartitions the dataset state under data parallelism and exposes it to DL workers through a virtual file system, and it obtains the model state as partitioned checkpoints and transforms them to reflect the new parallelization configuration.

Paper

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

1 code implementation • 8 Oct 2023 • Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers).

Reinforcement Learning (RL)

Paper
Code

Large Sequence Models for Sequential Decision-Making: A Survey

no code implementations • 24 Jun 2023 • Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang

Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer.

Decision Making

Paper

Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness

1 code implementation • 18 May 2023 • Zeyuan Tan, Xiulong Yuan, Congjie He, Man-Kit Sit, Guo Li, Xiaoze Liu, Baole Ai, Kai Zeng, Peter Pietzuch, Luo Mai

Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling.

Graph Sampling

283 GitHub stars

Paper
Code

TorchOpt: An Efficient Library for Differentiable Optimization

1 code implementation • 13 Nov 2022 • Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

TorchOpt further provides a high-performance distributed execution runtime.

503 GitHub stars

Paper
Code

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

1 code implementation • 31 Dec 2021 • Xidong Feng, Bo Liu, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang

Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptation.
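The two-level structure in this excerpt can be illustrated with a MAML-style toy. This is a minimal sketch, not the paper's method: it swaps the reinforcement learner for deterministic quadratic task losses L_c(theta) = 0.5 * (theta - c)**2, uses a single inner gradient step, and computes the meta-gradient exactly by the chain rule, so it does not model the gradient-estimation bias the paper analyses. The function names `inner_adapt`, `meta_gradient`, and `meta_train` are hypothetical.

```python
from typing import List


def inner_adapt(theta: float, c: float, alpha: float) -> float:
    # Inner loop (hypothetical toy): one gradient step on the task loss
    # L_c(theta) = 0.5 * (theta - c)**2, whose gradient is (theta - c).
    return theta - alpha * (theta - c)


def meta_gradient(theta: float, tasks: List[float], alpha: float) -> float:
    # Outer loop: exact gradient of the mean post-adaptation loss w.r.t. theta.
    # Chain rule: d/dtheta L_c(theta') = (theta' - c) * dtheta'/dtheta,
    # and dtheta'/dtheta = (1 - alpha) for a single inner step.
    total = 0.0
    for c in tasks:
        adapted = inner_adapt(theta, c, alpha)
        total += (adapted - c) * (1.0 - alpha)
    return total / len(tasks)


def meta_train(theta: float, tasks: List[float], alpha: float,
               beta: float, steps: int) -> float:
    # The meta-learner updates the shared initialisation theta so that
    # one inner step adapts well across all tasks.
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, tasks, alpha)
    return theta
```

Here the meta-gradient simplifies to (1 - alpha)**2 * (theta - mean(tasks)), so `meta_train(0.0, [1.0, 2.0, 6.0], alpha=0.5, beta=1.0, steps=100)` converges toward 3.0, the initialisation from which one inner step adapts best on average.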

Atari Games, Meta Reinforcement Learning (+3 more)

Paper
Code

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

1 code implementation • 2 Dec 2021 • Jie Ren, Wenteng Liang, Ran Yan, Luo Mai, Shiwen Liu, Xiao Liu

Large-scale Bundle Adjustment (BA) requires massive memory and computation resources that existing BA libraries struggle to provide.

432 GitHub stars

Paper
Code

Fast and Flexible Human Pose Estimation with HyperPose

1 code implementation • 26 Aug 2021 • Yixiao Guo, Jiawei Liu, Guo Li, Luo Mai, Hao Dong

When it comes to customising these algorithms for real-world applications, none of the existing libraries offers both the flexibility to develop custom pose estimation algorithms and the high performance needed to execute them on commodity devices.

Pose Estimation

1,242 GitHub stars

Paper
Code

Efficient Reinforcement Learning Development with RLzoo

1 code implementation • 18 Sep 2020 • Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong

RLzoo provides developers with (i) high-level yet flexible APIs for prototyping DRL agents and further customising them for best performance, (ii) a model zoo from which users can import a wide range of DRL agents and easily compare their performance, and (iii) an algorithm that can automatically construct DRL agents with custom components (which are critical to improving an agent's performance in custom applications).

Reinforcement Learning (RL)

617 GitHub stars

Paper
Code

CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers

1 code implementation • 8 Jan 2019 • Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter Pietzuch

Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model.
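The synchronous data-parallel scheme this excerpt describes (the baseline used by TensorFlow and Caffe2, not CROSSBOW's own method) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the "GPUs" are simulated as list shards, the model is a single weight `w` fit by mean-squared error, and the helpers `partial_gradient`, `split_batch`, and `sync_sgd_step` are hypothetical. Averaging the shard gradients stands in for the all-reduce a real system would perform.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the toy model y_hat = w * x


def partial_gradient(w: float, shard: List[Sample]) -> float:
    # Mean-squared-error gradient dL/dw computed on one shard ("one GPU").
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split_batch(batch: List[Sample], num_gpus: int) -> List[List[Sample]]:
    # Partition the batch into equal, contiguous shards (assumes divisibility).
    if len(batch) % num_gpus != 0:
        raise ValueError("batch size must be divisible by num_gpus")
    size = len(batch) // num_gpus
    return [batch[i * size:(i + 1) * size] for i in range(num_gpus)]


def sync_sgd_step(w: float, batch: List[Sample], num_gpus: int, lr: float) -> float:
    shards = split_batch(batch, num_gpus)
    # Each simulated GPU computes a partial gradient on its own shard.
    grads = [partial_gradient(w, shard) for shard in shards]
    # Stand-in for all-reduce: average the partial gradients.
    avg_grad = sum(grads) / num_gpus
    # Every replica applies the same update, keeping one global model.
    return w - lr * avg_grad
```

With equal-sized shards, the averaged partial gradients equal the full-batch gradient, which is why every replica holds the same global model after each step. For example, repeatedly calling `sync_sgd_step(w, batch, 4, 0.01)` on points drawn from y = 3x drives `w` toward 3.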

Paper
Code

TensorLayer: A Versatile Library for Efficient Deep Learning Development

2 code implementations • 26 Jul 2017 • Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo

Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others.

Management

7,298 GitHub stars

Paper
Code
