no code implementations • 1 Nov 2023 • Ruihang Lai, Junru Shao, Siyuan Feng, Steven S. Lyubomirsky, Bohan Hou, Wuwei Lin, Zihao Ye, Hongyi Jin, Yuchen Jin, Jiawei Liu, Lesheng Jin, Yaxing Cai, Ziheng Jiang, Yong Wu, Sunghyun Park, Prakalp Srivastava, Jared G. Roesch, Todd C. Mowry, Tianqi Chen
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models.
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.
1 code implementation • 28 Oct 2023 • Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
2 code implementations • 11 Jul 2022 • Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze
We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads.
2 code implementations • 9 Jul 2022 • Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen
We build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.
no code implementations • 29 Sep 2021 • Minjie Wang, Haoming Lu, Yu Gai, Lesheng Jin, Zihao Ye, Zheng Zhang
Despite substantial efforts from the deep learning system community to relieve researchers and practitioners of the burden of implementing models with ever-growing complexity, a considerable linguistic gap remains between developing models in the language of mathematics and implementing them in the languages of computers.
no code implementations • 26 Aug 2020 • Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang
FeatGraph provides a flexible programming interface to express diverse GNN models by composing coarse-grained sparse templates with fine-grained user-defined functions (UDFs) on each vertex/edge.
1 code implementation • 18 Apr 2020 • Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, George Karypis
Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine.
Distributed, Parallel, and Cluster Computing
1 code implementation • 14 Feb 2020 • Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola
The Transformer has been widely used thanks to its ability to capture sequence information efficiently.
2 code implementations • 11 Nov 2019 • Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
The Transformer model is widely successful on many natural language processing tasks.
Ranked #1 on Machine Translation on IWSLT2015 Chinese-English
7 code implementations • 3 Sep 2019 • Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, Zheng Zhang
Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs.
Ranked #35 on Node Classification on Cora
no code implementations • EACL 2017 • Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou, Kai Yu
On-line dialogue policy learning is key to building evolvable conversational agents in real-world scenarios.