Search Results for author: Qingyi Gu

Found 14 papers, 7 papers with code

LLM Inference Unveiled: Survey and Roofline Model Insights

2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Our survey stands out from traditional literature reviews not only by summarizing the current state of research but also by introducing a framework based on the roofline model for the systematic analysis of LLM inference techniques.
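
A minimal illustration of the roofline idea the survey builds on (not the paper's exact framework): attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. The hardware numbers below are placeholder assumptions.

```python
# Illustrative roofline estimate (not the survey's exact framework).
# Attainable throughput is bounded by compute, or by memory bandwidth
# times arithmetic intensity (FLOPs per byte moved).

def roofline_tflops(arithmetic_intensity, peak_tflops=312.0, bandwidth_tbps=2.0):
    """Return attainable TFLOP/s for a kernel with the given arithmetic
    intensity (FLOPs per byte). Hardware numbers are placeholders roughly
    in the range of an A100-class GPU."""
    memory_bound = bandwidth_tbps * arithmetic_intensity  # TB/s * FLOPs/byte = TFLOP/s
    return min(peak_tflops, memory_bound)

# Decode-phase LLM inference is typically memory-bound: each generated
# token reads all weights once, so arithmetic intensity is low.
print(roofline_tflops(arithmetic_intensity=1.0))    # bandwidth-limited
print(roofline_tflops(arithmetic_intensity=500.0))  # compute-limited
```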

Knowledge Distillation • Language Modelling • +3

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

no code implementations • 8 Feb 2024 • Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu

In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.
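
A rough sketch of what a learnable per-channel dual clipping scheme can look like: each channel gets trainable lower and upper bounds that suppress activation outliers before quantization. Names and initial values are illustrative assumptions, not RepQuant's implementation.

```python
import torch
import torch.nn as nn

class PerChannelDualClip(nn.Module):
    """Learnable per-channel lower/upper clipping bounds for activations.
    A generic sketch of the idea, not RepQuant's exact scheme."""

    def __init__(self, num_channels, init_lo=-6.0, init_hi=6.0):
        super().__init__()
        self.lo = nn.Parameter(torch.full((num_channels,), init_lo))
        self.hi = nn.Parameter(torch.full((num_channels,), init_hi))

    def forward(self, x):
        # x: (..., num_channels); clamp each channel to its learned range.
        return torch.minimum(torch.maximum(x, self.lo), self.hi)

clip = PerChannelDualClip(num_channels=4)
print(clip(torch.randn(2, 4) * 10))
```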

Quantization

Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models

1 code implementation • 9 Jan 2024 • Xuewen Liu, Zhikai Li, Junrui Xiao, Qingyi Gu

Specifically, at the calibration sample level, we select calibration samples based on their density and diversity in the latent space, thus facilitating the alignment of their distribution with that of the overall samples; and at the reconstruction output level, we propose Fine-grained Block Reconstruction, which aligns the outputs of the quantized model and the full-precision model at different network granularities.
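
A toy sketch of density-and-diversity-driven calibration sample selection under loose assumptions: score each candidate latent by local density, then pick greedily while penalizing proximity to already-chosen samples. The scoring rule is a stand-in, not the paper's exact criterion.

```python
import numpy as np

def select_calibration_samples(latents, k=32):
    """Toy calibration-sample selection in latent space: density (inverse
    mean distance to nearest neighbours) times diversity (distance to the
    closest already-selected sample). A rough sketch only."""
    dists = np.linalg.norm(latents[:, None] - latents[None, :], axis=-1)
    density = 1.0 / (np.sort(dists, axis=1)[:, 1:6].mean(axis=1) + 1e-8)

    chosen = [int(np.argmax(density))]
    for _ in range(k - 1):
        diversity = dists[:, chosen].min(axis=1)
        score = density * diversity
        score[chosen] = -np.inf          # never re-pick a selected sample
        chosen.append(int(np.argmax(score)))
    return chosen

latents = np.random.randn(256, 8)
print(select_calibration_samples(latents, k=8))
```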

Denoising • Image Generation • +2

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

no code implementations • 11 Oct 2023 • Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer

Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks.

Quantization

BinaryViT: Towards Efficient and Accurate Binary Vision Transformers

no code implementations • 24 May 2023 • Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu

In this paper, we first argue empirically that the severe performance degradation is mainly caused by weight oscillation during binarization training and information distortion in the activations of ViTs.
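
To make the oscillation problem concrete, here is a generic sign-binarization layer with a straight-through estimator: latent weights near zero can flip sign between consecutive updates, which is the kind of instability the paper points to. This is a standard BNN building block, not BinaryViT's specific remedy.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through gradient estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients through only where |w| <= 1 (clipped STE);
        # tiny latent weights keep receiving gradients and may flip sign.
        return grad_out * (w.abs() <= 1).float()

w = torch.tensor([0.01, -0.02, 0.8], requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
print(w.grad)
```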

Binarization • Quantization

Patch-wise Mixed-Precision Quantization of Vision Transformer

no code implementations • 11 May 2023 • Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu

As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks.

Quantization

RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

1 code implementation • ICCV 2023 • Zhikai Li, Junrui Xiao, Lianwei Yang, Qingyi Gu

Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique.
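
For readers unfamiliar with PTQ, a minimal calibration step looks like the sketch below: pass a tiny calibration batch through the float model, record the activation range, and derive a scale and zero-point. This is the generic recipe, not RepQ-ViT's scale reparameterization itself.

```python
import numpy as np

def calibrate_minmax(activations, num_bits=8):
    """Minimal PTQ calibration: asymmetric scale and zero-point from the
    observed range of a small calibration batch."""
    lo, hi = float(activations.min()), float(activations.max())
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin) or 1e-8
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def fake_quantize(x, scale, zero_point, num_bits=8):
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** num_bits - 1)
    return (q - zero_point) * scale   # de-quantized values

calib = np.random.randn(128, 16).astype(np.float32)
s, z = calibrate_minmax(calib)
print(np.abs(fake_quantize(calib, s, z) - calib).max())  # calibration error
```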

Model Compression • Quantization

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

1 code implementation • 13 Sep 2022 • Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu

In this paper, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT.

Data Free Quantization • Image Classification • +4

I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

1 code implementation • ICCV 2023 • Zhikai Li, Qingyi Gu

In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, which enables the entire computational graph of inference to be performed with integer arithmetic and bit-shifting, without any floating-point arithmetic.
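
The core trick behind integer-only inference is dyadic requantization: a float rescaling factor is approximated offline as an integer multiplier plus a bit-shift, so the runtime needs only integer multiplies and shifts. The sketch below shows the generic idea, not I-ViT's exact kernels; the multiplier/shift values are illustrative.

```python
def requantize_int(acc, multiplier, shift):
    """Integer-only requantization: rescale an int32 accumulator with an
    integer multiply followed by a right bit-shift (round-to-nearest).
    A generic dyadic-scaling sketch, not I-ViT's exact implementation."""
    rounding = 1 << (shift - 1)
    return (acc * multiplier + rounding) >> shift

# Offline, a float scale (say 0.0478) is approximated as multiplier / 2**shift.
scale = 0.0478
shift = 16
multiplier = round(scale * (1 << shift))        # 3133

acc = 12345                                     # int32 accumulator from an int8 GEMM
print(requantize_int(acc, multiplier, shift))   # integer-only result
print(round(acc * scale))                       # floating-point reference
```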

Quantization

Patch Similarity Aware Data-Free Quantization for Vision Transformers

1 code implementation • 4 Mar 2022 • Zhikai Li, Liping Ma, Mengjuan Chen, Junrui Xiao, Qingyi Gu

The above insights guide us to design a relative value metric that optimizes the Gaussian noise to approximate real images, which are then utilized to calibrate the quantization parameters.
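
A toy sketch of the overall data-free calibration loop under loose assumptions: start from Gaussian noise, optimize it against a model-derived objective, then feed the synthetic images through the quantized model to collect activation ranges. The objective below is only a placeholder; the paper's metric is patch-similarity based.

```python
import torch
import torch.nn as nn

# Toy data-free calibration sketch (placeholder objective, not PSAQ-ViT's metric).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
model.eval()

images = torch.randn(4, 3, 32, 32, requires_grad=True)   # synthetic "calibration set"
opt = torch.optim.Adam([images], lr=0.05)

for _ in range(50):
    opt.zero_grad()
    feats = model(images).flatten(1)
    loss = -feats.var(dim=0).mean()   # placeholder: encourage diverse responses
    loss.backward()
    opt.step()

# images.detach() would now be used to collect activation statistics
# and set the quantization scales/zero-points.
```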

Data Free Quantization

Angle-based Search Space Shrinking for Neural Architecture Search

1 code implementation • ECCV 2020 • Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun

Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.

Neural Architecture Search

Multi-loss-aware Channel Pruning of Deep Networks

no code implementations • 27 Feb 2019 • Yiming Hu, Siyang Sun, Jianquan Li, Jiagang Zhu, Xingang Wang, Qingyi Gu

In particular, we introduce an additional loss that encodes the differences in the feature and semantic distributions of the feature maps between the baseline model and the pruned one.
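
One plausible reading of such a multi-loss objective, sketched with placeholder weights: a task loss combined with an MSE term on intermediate feature maps and a KL term on class distributions between the baseline and pruned networks. The weighting and exact terms are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def multi_loss(pruned_logits, base_logits, pruned_feat, base_feat,
               task_loss, alpha=1.0, beta=1.0):
    """Task loss plus feature-map MSE and logit-distribution KL between the
    baseline and pruned models. Illustrative sketch with placeholder weights."""
    feat_loss = F.mse_loss(pruned_feat, base_feat)
    sem_loss = F.kl_div(F.log_softmax(pruned_logits, dim=1),
                        F.softmax(base_logits, dim=1),
                        reduction="batchmean")
    return task_loss + alpha * feat_loss + beta * sem_loss

task = torch.tensor(0.7)
print(multi_loss(torch.randn(8, 10), torch.randn(8, 10),
                 torch.randn(8, 64), torch.randn(8, 64), task))
```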

General Classification

Cluster Regularized Quantization for Deep Networks Compression

no code implementations • 27 Feb 2019 • Yiming Hu, Jianquan Li, Xianlei Long, Shenhua Hu, Jiagang Zhu, Xingang Wang, Qingyi Gu

Deep neural networks (DNNs) have achieved great success in a wide range of computer vision areas, but their application to mobile devices is limited due to their high storage and computational cost.

Quantization

A novel channel pruning method for deep neural network compression

no code implementations • 29 May 2018 • Yiming Hu, Siyang Sun, Jianquan Li, Xingang Wang, Qingyi Gu

In order to accelerate the selection process, the proposed method formulates it as a search problem, which can be solved efficiently by a genetic algorithm.
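
A toy genetic algorithm for channel selection along these lines, under stated assumptions: individuals are binary keep/prune masks, and fitness trades a placeholder importance score against a channel budget. The scores are random stand-ins for a real accuracy evaluation, and the operators are generic, not the paper's configuration.

```python
import random

def genetic_channel_search(num_channels=16, pop_size=20, generations=30, keep_ratio=0.5):
    """Toy GA over binary channel masks: selection, one-point crossover,
    single-bit mutation. Fitness uses a random importance proxy."""
    importance = [random.random() for _ in range(num_channels)]

    def fitness(mask):
        acc_proxy = sum(i * m for i, m in zip(importance, mask))
        budget_penalty = abs(sum(mask) - keep_ratio * num_channels)
        return acc_proxy - 0.5 * budget_penalty

    pop = [[random.randint(0, 1) for _ in range(num_channels)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_channels)
            child = a[:cut] + b[cut:]           # one-point crossover
            child[random.randrange(num_channels)] ^= 1  # flip one bit
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(genetic_channel_search())
```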

Combinatorial Optimization • Knowledge Distillation • +1
