Search Results for author: Zihao Ye

Found 12 papers, 8 papers with code

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

no code implementations • 1 Nov 2023 • Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen

Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models.

Paper
Add Code

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.

Quantization Sentiment Analysis

174

Paper
Code

Punica: Multi-Tenant LoRA Serving

1 code implementation • 28 Oct 2023 • Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy

Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.

820

Paper
Code

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

2 code implementations • 11 Jul 2022 • Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads.

123

Paper
Code

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

2 code implementations • 9 Jul 2022 • Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.

BIG-bench Machine Learning

11,241

Paper
Code

Kokoyi: Executable LaTeX for End-to-end Deep Learning

no code implementations • 29 Sep 2021 • Minjie Wang, Haoming Lu, Yu Gai, Lesheng Jin, Zihao Ye, Zheng Zhang

Despite substantial efforts from the deep learning system community to relieve researchers and practitioners from the burden of implementing models with ever-growing complexity, a considerable lingual gap remains between developing models in the language of mathematics and implementing them in the languages of computer.

Math Translation

Paper
Add Code

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems

no code implementations • 26 Aug 2020 • Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang

FeatGraph provides a flexible programming interface to express diverse GNN models by composing coarse-grained sparse templates with fine-grained user-defined functions (UDFs) on each vertex/edge.

Paper
Add Code

DGL-KE: Training Knowledge Graph Embeddings at Scale

1 code implementation • 18 Apr 2020 • Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, George Karypis

Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine.

Distributed, Parallel, and Cluster Computing

1,240

Paper
Code