Search Results for author: Chenhao Xue

Found 4 papers, 3 papers with code

LLM Inference Unveiled: Survey and Roofline Model Insights

2 code implementations · 26 Feb 2024 · Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model for systematic analysis of LLM inference techniques.

Knowledge Distillation, Language Modelling +3

Latency-aware Spatial-wise Dynamic Networks

2 code implementations · 12 Oct 2022 · Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties.

Image Classification, Instance Segmentation +4

PTQ4ViT: Post-Training Quantization Framework for Vision Transformers with Twin Uniform Quantization

1 code implementation · 24 Nov 2021 · Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

We observe that the distributions of activation values after the softmax and GELU functions are quite different from the Gaussian distribution.

Quantization
