Search Results for author: Danyang Zhuo

Found 24 papers, 5 papers with code

Adaptive Skeleton Graph Decoding

no code implementations19 Feb 2024 Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs.

Curator: Efficient Indexing for Multi-Tenant Vector Databases

no code implementations13 Jan 2024 Yicheng Jin, Yongji Wu, WenJun Hu, Bruce M. Maggs, Xiao Zhang, Danyang Zhuo

Vector databases have emerged as key enablers for bridging intelligent applications with unstructured data, providing generic search and management support for embedding vectors extracted from the raw unstructured data.

Clustering

Fairness in Serving Large Language Models

1 code implementation31 Dec 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Fairness Scheduling
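The core question in serving fairness is which client's pending request to run next when clients consume very different token counts. As a minimal illustration (a generic token-weighted fair-sharing rule with hypothetical names, not the paper's actual algorithm), one can always serve the client that has consumed the fewest tokens so far:

```python
def fair_dispatch(pending, served_tokens):
    # Serve the client that has consumed the fewest tokens so far,
    # so that long conversations cannot starve short chats.
    return min(pending, key=lambda c: served_tokens.get(c, 0))

served = {"alice": 1200, "bob": 30}
print(fair_dispatch(["alice", "bob"], served))  # prints "bob"
```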

Punica: Multi-Tenant LoRA Serving

1 code implementation28 Oct 2023 Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
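A numpy sketch of why multi-tenant LoRA requests can share one pass: the base-weight matmul is computed once for the whole batch, while each request adds only its own low-rank delta. Shapes, names, and the einsum formulation here are illustrative; Punica's contribution is doing this efficiently in custom GPU kernels with a scheduler on top.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_adapters = 16, 8, 4, 3

W = rng.normal(size=(d_in, d_out))             # shared base weight
A = rng.normal(size=(n_adapters, d_in, rank))  # per-tenant LoRA down-projections
B = rng.normal(size=(n_adapters, rank, d_out)) # per-tenant LoRA up-projections

def lora_batch(X, adapter_ids):
    # One batched pass: the base matmul is shared by every request,
    # and each request adds its own low-rank (rank << d) delta.
    base = X @ W
    delta = np.einsum('bi,bir,bro->bo', X, A[adapter_ids], B[adapter_ids])
    return base + delta

X = rng.normal(size=(5, d_in))
ids = np.array([0, 2, 1, 0, 2])    # each request picks its tenant's adapter
out = lora_batch(X, ids)

# Reference: serving each request separately gives the same result.
ref = np.stack([x @ W + x @ A[i] @ B[i] for x, i in zip(X, ids)])
assert np.allclose(out, ref)
```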

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling

no code implementations14 Aug 2023 Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy

However, existing model serving systems cannot achieve adequate batch sizes while meeting latency objectives as these systems eagerly dispatch requests to accelerators to minimize the accelerator idle time.

Scheduling
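The deferral idea can be illustrated with a toy grouping rule (hypothetical names; Symphony's real scheduler reasons about latency objectives and accelerator state): instead of dispatching each request eagerly, hold it briefly so that requests arriving close together form one larger batch.

```python
def deferred_batches(arrival_times, slack):
    # Group requests that arrive within `slack` of the batch's first arrival;
    # eager dispatch would be the special case slack = 0.
    batches, current, start = [], [], 0.0
    for t in arrival_times:
        if current and t - start > slack:
            batches.append(current)
            current = []
        if not current:
            start = t
        current.append(t)
    if current:
        batches.append(current)
    return batches

print(deferred_batches([0.0, 0.2, 0.3, 1.5, 1.6], slack=0.5))
# prints [[0.0, 0.2, 0.3], [1.5, 1.6]]
```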

Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations

no code implementations21 Dec 2022 Lianke Qin, Aravind Reddy, Zhao Song, Zhaozhuo Xu, Danyang Zhuo

In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data structure for fast pairwise summation estimation.
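For context, the pairwise summation problem is: given points $x_1, \dots, x_n$ and a query $q$, estimate $\sum_j f(q, x_j)$ without touching all $n$ points. The sketch below shows only the problem and a uniform-sampling baseline (kernel choice, bandwidth, and names are illustrative); Adam-Hash replaces this with a hashing-based structure with stronger guarantees under dynamic updates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 8
X = rng.normal(size=(n, d))

def kernel(q, x):
    # Gaussian kernel with an illustrative bandwidth of 2d.
    return np.exp(-np.sum((q - x) ** 2, axis=-1) / (2 * d))

q = rng.normal(size=d)
s_exact = kernel(q, X).sum()             # exact pairwise summation: O(nd)

m = 1_000                                # uniform-sampling baseline: O(md)
idx = rng.integers(0, n, size=m)
s_approx = n * kernel(q, X[idx]).mean()  # unbiased estimate of s_exact
```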

A Faster $k$-means++ Algorithm

no code implementations28 Nov 2022 Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo

We propose a new algorithm, FastKmeans++, that runs in $\widetilde{O}(nd + nk^2)$ total time.

Clustering
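For reference, standard $k$-means++ seeding, which the paper accelerates, takes $O(ndk)$ time because every new center triggers a distance pass over all points. A minimal sketch (illustrative names and data):

```python
import numpy as np

def kmeanspp_seed(X, k, rng):
    # D^2 sampling: each new center is drawn with probability proportional
    # to the squared distance to the nearest center chosen so far.
    n = len(X)
    centers = [X[rng.integers(n)]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        c = X[rng.choice(n, p=d2 / d2.sum())]
        centers.append(c)
        d2 = np.minimum(d2, np.sum((X - c) ** 2, axis=1))
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(100, 2))
                    for c in ([0, 0], [5, 5], [0, 5])])
centers = kmeanspp_seed(X, 3, rng)  # one seed near each true cluster, w.h.p.
```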

Training Overparametrized Neural Networks in Sublinear Time

no code implementations9 Aug 2022 Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI).

Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

no code implementations8 Aug 2022 Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries.

Density Estimation
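The semantics being maintained can be stated as a naive baseline (illustrative 1-D sketch): O(1) insert/delete with an exact O(n) query. The paper's data structures answer queries in sublinear time while staying robust to adversarially chosen queries.

```python
import math

class DynamicKDE:
    # Naive dynamic KDE baseline: exact but linear-time queries.
    def __init__(self, bandwidth=1.0):
        self.h = bandwidth
        self.points = []

    def insert(self, x):
        self.points.append(x)

    def delete(self, x):
        self.points.remove(x)

    def query(self, q):
        # Average Gaussian kernel value over the current point set.
        if not self.points:
            return 0.0
        return sum(math.exp(-(q - p) ** 2 / (2 * self.h ** 2))
                   for p in self.points) / len(self.points)
```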

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

no code implementations10 May 2022 Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network.

AutoML, BIG-bench Machine Learning +5

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation28 Jan 2022 Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

Fast Graph Neural Tangent Kernel via Kronecker Sketching

no code implementations4 Dec 2021 Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo

Given a kernel matrix over $n$ graphs, sketching can reduce the running time of solving kernel regression to $o(n^3)$.

regression

Sample Complexity of Deep Active Learning

no code implementations29 Sep 2021 Zhao Song, Baocheng Sun, Danyang Zhuo

In this paper, we present the first deep active learning algorithm which has a provable sample complexity.

Active Learning, BIG-bench Machine Learning +2

InstaHide’s Sample Complexity When Mixing Two Private Images

no code implementations29 Sep 2021 Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo

Inspired by the InstaHide challenge [Huang, Song, Li and Arora '20], [Chen, Song and Zhuo '20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.


TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

1 code implementation16 Feb 2021 Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
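Under an idealized cost model (uniform per-token cost and perfectly balanced stages; in reality per-token cost is uneven, which is what TeraPipe's dynamic programming optimizes), splitting a sequence into token-level chunks shrinks the pipeline bubble:

```python
def pipeline_makespan(num_stages, num_chunks, total_time_per_stage):
    # Ideal pipelined makespan: (S + T - 1) chunk-steps, where each chunk
    # costs total_time_per_stage / num_chunks per stage.
    t = total_time_per_stage / num_chunks
    return (num_stages + num_chunks - 1) * t

print(pipeline_makespan(4, 1, 8.0))  # prints 32.0: whole sequence, no overlap
print(pipeline_makespan(4, 8, 8.0))  # prints 11.0: 8 token chunks overlap stages
```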

Graph Neural Network Acceleration via Matrix Dimension Reduction

no code implementations1 Jan 2021 Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo

Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) we use a novel matrix decoupling method to reduce matrix dimensions during kernel solving.

Dimensionality Reduction

What Can Phase Retrieval Tell Us About Private Distributed Learning?

no code implementations ICLR 2021 Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.

Retrieval

InstaHide's Sample Complexity When Mixing Two Private Images

no code implementations24 Nov 2020 Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity.
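Schematically, the two-image InstaHide encoding mixes two private images and then applies an independent random sign flip to every pixel; weights, sizes, and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=64), rng.normal(size=64)  # two "private images" as vectors

w = 0.5                                   # illustrative mixing weight
mixed = w * x1 + (1 - w) * x2             # mix the two private images
sigma = rng.choice([-1.0, 1.0], size=64)  # random pixel-wise sign flips
encrypted = sigma * mixed

# Sign flips destroy signs but preserve magnitudes, which is what
# phase-retrieval-style attack formulations exploit.
```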


On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

no code implementations23 Nov 2020 Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.

Retrieval

Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

1 code implementation13 Feb 2020 Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica

Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving.

Distributed Computing, Reinforcement Learning +1
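The collective pattern at the heart of such workloads can be illustrated with a toy log-depth tree reduction (names illustrative; Hoplite's contribution is performing such collectives efficiently and fault-tolerantly across dynamically spawned tasks):

```python
def tree_reduce(values):
    # Pairwise tree reduction: log-depth combine, as in a collective reduce.
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

print(tree_reduce([1, 2, 3, 4, 5]))  # prints 15
```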
