Search Results for author: Mengzhao Chen

Found 12 papers, 10 papers with code

Adapting LLaMA Decoder to Vision Transformer

no code implementations • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong Liu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue, causing network training to fail.

Computational Efficiency, Decoder +2

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

2 code implementations • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question answering.

Question Answering, Text Summarization

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

1 code implementation • 16 Nov 2023 • Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji

Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training and inference) undermine their position in industrial applications.

Quantization

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

2 code implementations • 25 Aug 2023 • Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

LWC modulates the extreme values of weights by optimizing the clipping threshold.

Common Sense Reasoning, Computational Efficiency +3

Spatial Re-parameterization for N:M Sparsity

no code implementations • 9 Jun 2023 • Yuxin Zhang, Mingbao Lin, Yunshan Zhong, Mengzhao Chen, Fei Chao, Rongrong Ji

This paper presents a Spatial Re-parameterization (SpRe) method for N:M sparsity in CNNs.

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs

MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

1 code implementation • 14 May 2023 • Yunshan Zhong, Mingbao Lin, Yuyao Zhou, Mengzhao Chen, Yuxin Zhang, Fei Chao, Rongrong Ji

However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by frequent bit-width switching of weights and activations, leading to limited performance.

Quantization

SMMix: Self-Motivated Image Mixing for Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Mingbao Lin, Zhihang Lin, Yuxin Zhang, Fei Chao, Rongrong Ji

Owing to the subtle design of the self-motivated paradigm, SMMix has a smaller training overhead and performs better than other CutMix variants.

Super Vision Transformer

1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even increasing performance.

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

1 code implementation • 8 Mar 2022 • Mengzhao Chen, Mingbao Lin, Ke Li, Yunhang Shen, Yongjian Wu, Fei Chao, Rongrong Ji

Our proposed CF-ViT is motivated by two important observations in modern ViT models: (1) The coarse-grained patch splitting can locate informative regions of an input image.

OptG: Optimizing Gradient-driven Criteria in Network Sparsity

1 code implementation • 30 Jan 2022 • Yuxin Zhang, Mingbao Lin, Mengzhao Chen, Fei Chao, Rongrong Ji

We prove that supermask training accumulates the criteria of gradient-driven sparsity for both removed and preserved weights, and that it can partly solve the independence paradox.

Fine-grained Data Distribution Alignment for Post-Training Quantization

1 code implementation • 9 Sep 2021 • Yunshan Zhong, Mingbao Lin, Mengzhao Chen, Ke Li, Yunhang Shen, Fei Chao, Yongjian Wu, Rongrong Ji

While post-training quantization is popular mostly because it avoids accessing the complete original training dataset, its poor performance also stems from the scarcity of images.

Quantization
