Search Results for author: Danyang Zhuo

Found 24 papers, 5 papers with code

Adaptive Skeleton Graph Decoding

no code implementations19 Feb 2024 Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs.

Curator: Efficient Indexing for Multi-Tenant Vector Databases

no code implementations13 Jan 2024 Yicheng Jin, Yongji Wu, WenJun Hu, Bruce M. Maggs, Xiao Zhang, Danyang Zhuo

Vector databases have emerged as key enablers for bridging intelligent applications with unstructured data, providing generic search and management support for embedding vectors extracted from the raw unstructured data.

Clustering

Fairness in Serving Large Language Models

1 code implementation31 Dec 2023 Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading.

Fairness Scheduling
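The core question in serving fairness is which client's pending request to run next when clients consume very different token counts. As a minimal illustration (a generic token-weighted fair-sharing rule with hypothetical names, not the paper's actual algorithm), one can always serve the client that has consumed the fewest tokens so far:

```python
def fair_dispatch(pending, served_tokens):
    # Serve the client that has consumed the fewest tokens so far,
    # so that long conversations cannot starve short chats.
    return min(pending, key=lambda c: served_tokens.get(c, 0))

served = {"alice": 1200, "bob": 30}
print(fair_dispatch(["alice", "bob"], served))  # prints "bob"
```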

Punica: Multi-Tenant LoRA Serving

1 code implementation28 Oct 2023 Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
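A numpy sketch of why multi-tenant LoRA requests can share one pass: the base-weight matmul is computed once for the whole batch, while each request adds only its own low-rank delta. Shapes, names, and the einsum formulation here are illustrative; Punica's contribution is doing this efficiently in custom GPU kernels with a scheduler on top.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_adapters = 16, 8, 4, 3

W = rng.normal(size=(d_in, d_out))             # shared base weight
A = rng.normal(size=(n_adapters, d_in, rank))  # per-tenant LoRA down-projections
B = rng.normal(size=(n_adapters, rank, d_out)) # per-tenant LoRA up-projections

def lora_batch(X, adapter_ids):
    # One batched pass: the base matmul is shared by every request,
    # and each request adds its own low-rank (rank << d) delta.
    base = X @ W
    delta = np.einsum('bi,bir,bro->bo', X, A[adapter_ids], B[adapter_ids])
    return base + delta

X = rng.normal(size=(5, d_in))
ids = np.array([0, 2, 1, 0, 2])    # each request picks its tenant's adapter
out = lora_batch(X, ids)

# Reference: serving each request separately gives the same result.
ref = np.stack([x @ W + x @ A[i] @ B[i] for x, i in zip(X, ids)])
assert np.allclose(out, ref)
```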

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling

no code implementations14 Aug 2023 Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy

However, existing model serving systems cannot achieve adequate batch sizes while meeting latency objectives as these systems eagerly dispatch requests to accelerators to minimize the accelerator idle time.

Scheduling
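The deferral idea can be illustrated with a toy grouping rule (hypothetical names; Symphony's real scheduler reasons about latency objectives and accelerator state): instead of dispatching each request eagerly, hold it briefly so that requests arriving close together form one larger batch.

```python
def deferred_batches(arrival_times, slack):
    # Group requests that arrive within `slack` of the batch's first arrival;
    # eager dispatch would be the special case slack = 0.
    batches, current, start = [], [], 0.0
    for t in arrival_times:
        if current and t - start > slack:
            batches.append(current)
            current = []
        if not current:
            start = t
        current.append(t)
    if current:
        batches.append(current)
    return batches

print(deferred_batches([0.0, 0.2, 0.3, 1.5, 1.6], slack=0.5))
# prints [[0.0, 0.2, 0.3], [1.5, 1.6]]
```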

Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations

no code implementations21 Dec 2022 Lianke Qin, Aravind Reddy, Zhao Song, Zhaozhuo Xu, Danyang Zhuo

In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data structure for fast pairwise summation estimation.
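For context, the pairwise summation problem is: given points $x_1, \dots, x_n$ and a query $q$, estimate $\sum_j f(q, x_j)$ without touching all $n$ points. The sketch below shows only the problem and a uniform-sampling baseline (kernel choice, bandwidth, and names are illustrative); Adam-Hash replaces this with a hashing-based structure with stronger guarantees under dynamic updates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 8
X = rng.normal(size=(n, d))

def kernel(q, x):
    # Gaussian kernel with an illustrative bandwidth of 2d.
    return np.exp(-np.sum((q - x) ** 2, axis=-1) / (2 * d))

q = rng.normal(size=d)
s_exact = kernel(q, X).sum()             # exact pairwise summation: O(nd)

m = 1_000                                # uniform-sampling baseline: O(md)
idx = rng.integers(0, n, size=m)
s_approx = n * kernel(q, X[idx]).mean()  # unbiased estimate of s_exact
```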

A Faster $k$-means++ Algorithm

no code implementations28 Nov 2022 Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo

We propose a new algorithm, FastKmeans++, that runs in $\widetilde{O}(nd + nk^2)$ total time.

Clustering
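For reference, standard $k$-means++ seeding, which the paper accelerates, takes $O(ndk)$ time because every new center triggers a distance pass over all points. A minimal sketch (illustrative names and data):

```python
import numpy as np

def kmeanspp_seed(X, k, rng):
    # D^2 sampling: each new center is drawn with probability proportional
    # to the squared distance to the nearest center chosen so far.
    n = len(X)
    centers = [X[rng.integers(n)]]
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        c = X[rng.choice(n, p=d2 / d2.sum())]
        centers.append(c)
        d2 = np.minimum(d2, np.sum((X - c) ** 2, axis=1))
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(100, 2))
                    for c in ([0, 0], [5, 5], [0, 5])])
centers = kmeanspp_seed(X, 3, rng)  # one seed near each true cluster, w.h.p.
```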

Training Overparametrized Neural Networks in Sublinear Time

no code implementations9 Aug 2022 Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI).

Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

no code implementations8 Aug 2022 Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries.

Density Estimation
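The semantics being maintained can be stated as a naive baseline (illustrative 1-D sketch): O(1) insert/delete with an exact O(n) query. The paper's data structures answer queries in sublinear time while staying robust to adversarially chosen queries.

```python
import math

class DynamicKDE:
    # Naive dynamic KDE baseline: exact but linear-time queries.
    def __init__(self, bandwidth=1.0):
        self.h = bandwidth
        self.points = []

    def insert(self, x):
        self.points.append(x)

    def delete(self, x):
        self.points.remove(x)

    def query(self, q):
        # Average Gaussian kernel value over the current point set.
        if not self.points:
            return 0.0
        return sum(math.exp(-(q - p) ** 2 / (2 * self.h ** 2))
                   for p in self.points) / len(self.points)
```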

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

no code implementations10 May 2022 Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network.

AutoML, BIG-bench Machine Learning +5

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation28 Jan 2022 Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

Fast Graph Neural Tangent Kernel via Kronecker Sketching

no code implementations4 Dec 2021 Shunhua Jiang, Yunze Man, Zhao Song, Zheng Yu, Danyang Zhuo

Given a kernel matrix over $n$ graphs, sketching can reduce the running time of solving kernel regression to $o(n^3)$.

regression

Sample Complexity of Deep Active Learning

no code implementations29 Sep 2021 Zhao Song, Baocheng Sun, Danyang Zhuo

In this paper, we present the first deep active learning algorithm which has a provable sample complexity.

Active Learning, BIG-bench Machine Learning +2

InstaHide’s Sample Complexity When Mixing Two Private Images

no code implementations29 Sep 2021 Baihe Huang, Zhao Song, Runzhou Tao, Ruizhe Zhang, Danyang Zhuo

Inspired by the InstaHide challenge [Huang, Song, Li and Arora '20], [Chen, Song and Zhuo '20] recently provided a mathematical formulation of the InstaHide attack problem under a Gaussian image distribution.


TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

1 code implementation16 Feb 2021 Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.
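Under an idealized cost model (uniform per-token cost and perfectly balanced stages; in reality per-token cost is uneven, which is what TeraPipe's dynamic programming optimizes), splitting a sequence into token-level chunks shrinks the pipeline bubble:

```python
def pipeline_makespan(num_stages, num_chunks, total_time_per_stage):
    # Ideal pipelined makespan: (S + T - 1) chunk-steps, where each chunk
    # costs total_time_per_stage / num_chunks per stage.
    t = total_time_per_stage / num_chunks
    return (num_stages + num_chunks - 1) * t

print(pipeline_makespan(4, 1, 8.0))  # prints 32.0: whole sequence, no overlap
print(pipeline_makespan(4, 8, 8.0))  # prints 11.0: 8 token chunks overlap stages
```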

Graph Neural Network Acceleration via Matrix Dimension Reduction

no code implementations1 Jan 2021 Shunhua Jiang, Yunze Man, Zhao Song, Danyang Zhuo

Theoretically, we present two techniques to speed up GNTK training while preserving the generalization error: (1) we use a novel matrix decoupling method to reduce matrix dimensions during kernel solving.

Dimensionality Reduction

What Can Phase Retrieval Tell Us About Private Distributed Learning?

no code implementations ICLR 2021 Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.

Retrieval

InstaHide's Sample Complexity When Mixing Two Private Images

no code implementations24 Nov 2020 Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity.
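Schematically, the two-image InstaHide encoding mixes two private images and then applies an independent random sign flip to every pixel; weights, sizes, and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
x1, x2 = rng.normal(size=64), rng.normal(size=64)  # two "private images" as vectors

w = 0.5                                   # illustrative mixing weight
mixed = w * x1 + (1 - w) * x2             # mix the two private images
sigma = rng.choice([-1.0, 1.0], size=64)  # random pixel-wise sign flips
encrypted = sigma * mixed

# Sign flips destroy signs but preserve magnitudes, which is what
# phase-retrieval-style attack formulations exploit.
```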


On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

no code implementations23 Nov 2020 Sitan Chen, Xiaoxiao Li, Zhao Song, Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning.

Retrieval

Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

1 code implementation13 Feb 2020 Siyuan Zhuang, Zhuohan Li, Danyang Zhuo, Stephanie Wang, Eric Liang, Robert Nishihara, Philipp Moritz, Ion Stoica

Task-based distributed frameworks (e.g., Ray, Dask, Hydro) have become increasingly popular for distributed applications that contain asynchronous and dynamic workloads, including asynchronous gradient descent, reinforcement learning, and model serving.

Distributed Computing, Reinforcement Learning +1
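The collective pattern at the heart of such workloads can be illustrated with a toy log-depth tree reduction (names illustrative; Hoplite's contribution is performing such collectives efficiently and fault-tolerantly across dynamically spawned tasks):

```python
def tree_reduce(values):
    # Pairwise tree reduction: log-depth combine, as in a collective reduce.
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

print(tree_reduce([1, 2, 3, 4, 5]))  # prints 15
```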
