Search Results for author: Juntao Zhao

Found 3 papers, 3 papers with code

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

1 code implementation • 2 Mar 2024 • Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu

The immense sizes of LLMs have led to very high resource demand and cost for running the models.

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

1 code implementation • 16 Nov 2023 • Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices.

Domain Adaptation

Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training

1 code implementation • 2 Jun 2023 • Borui Wan, Juntao Zhao, Chuan Wu

Distributed full-graph training of Graph Neural Networks (GNNs) over large graphs is bandwidth-demanding and time-consuming.

Quantization
