1 code implementation • 28 Mar 2024 • Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs.
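As a rough illustration of how such non-linearities can be made cheaper in hardware, the sketch below approximates GELU with a small lookup table plus linear interpolation; the table size, input range, and function choice are assumptions for illustration only and do not reproduce the paper's approximation scheme.

```python
import torch

def lut_gelu(x: torch.Tensor, lo: float = -4.0, hi: float = 4.0, entries: int = 16):
    """Toy lookup-table approximation of GELU. Replacing an exact non-linear
    function with a small precomputed table and interpolation is one generic
    way to reduce its hardware cost; this is not the paper's method."""
    grid = torch.linspace(lo, hi, entries)
    table = torch.nn.functional.gelu(grid)            # precomputed offline
    x_clamped = x.clamp(lo, hi)
    pos = (x_clamped - lo) / (hi - lo) * (entries - 1)
    idx = pos.floor().long().clamp(max=entries - 2)
    frac = pos - idx.float()
    return table[idx] * (1 - frac) + table[idx + 1] * frac  # linear interpolation
```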
4 code implementations • 14 Feb 2024 • Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead.
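A minimal sketch of the magnitude/direction decomposition that DoRA builds on, assuming a PyTorch `nn.Linear` layer; the rank, initialization, and scaling below are simplified placeholders rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Weight-decomposed low-rank adaptation (simplified sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        # Frozen pretrained weight; only the magnitude and low-rank factors train.
        self.register_buffer("weight", base.weight.detach().clone())
        self.register_buffer("bias", None if base.bias is None else base.bias.detach().clone())
        out_f, in_f = self.weight.shape
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        # Trainable magnitude, initialized to the column-wise norm of the weight.
        self.magnitude = nn.Parameter(self.weight.norm(dim=0, keepdim=True))

    def forward(self, x):
        # Direction: pretrained weight plus low-rank update, normalized column-wise.
        adapted = self.weight + self.lora_B @ self.lora_A
        direction = adapted / adapted.norm(dim=0, keepdim=True)
        # Rescale by the learned magnitude; after training, this product can be
        # merged into a single weight matrix, so no extra inference cost remains.
        return F.linear(x, self.magnitude * direction, self.bias)
```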
no code implementations • 14 Dec 2023 • Chi-Hsuan Wu, Shih-Yang Liu, Xijie Huang, Xingbo Wang, Rong Zhang, Luca Minciullo, Wong Kai Yiu, Kenny Kwan, Kwang-Ting Cheng
We also developed a training mechanism, MocoRank, to handle the intra-class variation, the ordinal relationship between different classes, and the data imbalance problem.
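The excerpt only names what MocoRank addresses; as a generic, heavily simplified illustration of a loss that respects an ordinal relationship between classes, one might write something like the following (the pairwise margin formulation here is an assumption, not MocoRank itself).

```python
import torch

def pairwise_ordinal_loss(scores: torch.Tensor, levels: torch.Tensor, margin: float = 0.5):
    """Generic pairwise ordinal ranking loss: for every pair where sample i
    has a higher ground-truth level than sample j, its predicted score should
    exceed the other's by at least a margin. Illustrative only."""
    diff_scores = scores.unsqueeze(0) - scores.unsqueeze(1)   # s_i - s_j
    diff_levels = levels.unsqueeze(0) - levels.unsqueeze(1)   # y_i - y_j
    mask = diff_levels > 0                                    # pairs where i outranks j
    return torch.clamp(margin - diff_scores[mask], min=0).mean()
```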
1 code implementation • 25 Oct 2023 • Shih-Yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
Our method, for the first time, quantizes both the weights and activations of LLaMA-13B to only 4 bits and achieves an average score of 63.1 on common-sense zero-shot reasoning tasks, which is only 5.8 points lower than the full-precision model, significantly outperforming the previous state-of-the-art by 12.7 points.
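For a sense of what a 4-bit precision budget means in practice, below is a generic fake-quantization sketch using a symmetric integer scheme with 16 levels; it is not necessarily the paper's quantizer, which may use different formats and learned parameters.

```python
import torch

def fake_quantize_4bit(x: torch.Tensor, per_channel_dim: int = 0) -> torch.Tensor:
    """Simulated (fake) 4-bit symmetric quantization: scale each channel so its
    largest magnitude maps to the top quantization level, round to integers,
    then dequantize for use in the forward pass."""
    qmax = 7  # signed 4-bit range is [-8, 7]; use the symmetric part [-7, 7]
    amax = x.abs().amax(dim=per_channel_dim, keepdim=True).clamp(min=1e-8)
    scale = amax / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale
```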
1 code implementation • 12 Jun 2023 • Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng
Compared with previous coreset selection methods, our method significantly improves QAT performance with different dataset fractions.
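A rough sketch of the overall coreset-selection pipeline for quantization-aware training (QAT), assuming an image-classification setting and a simple per-sample-loss importance score; the paper's actual selection criteria are not reproduced here.

```python
import torch

def select_coreset(model, dataset, fraction: float = 0.1, device: str = "cpu"):
    """Generic coreset selection: score every training sample with the
    quantized model (here, by its loss), then keep only the top fraction
    for QAT. Illustrates the pipeline, not the paper's exact metric."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=256)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    scores = []
    model.eval()
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            scores.append(criterion(logits, labels.to(device)).cpu())
    scores = torch.cat(scores)
    k = int(fraction * len(scores))
    keep = torch.topk(scores, k).indices          # hardest samples first
    return torch.utils.data.Subset(dataset, keep.tolist())
```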
1 code implementation • 4 Feb 2023 • Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng
In addition, we found that the interdependence between the quantized weights in the $\textit{query}$ and $\textit{key}$ of a self-attention layer makes ViTs vulnerable to oscillation.
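For intuition, oscillation here refers to a latent weight repeatedly crossing a quantization threshold during training so that its quantized value keeps flipping back and forth; a minimal way to measure such behavior (not the paper's exact metric) is sketched below.

```python
import torch

def oscillation_frequency(q_history: torch.Tensor) -> torch.Tensor:
    """Given the quantized integer value of each weight recorded at every
    training step (shape [steps, num_weights]), count how often a weight
    switches quantization level and then switches again at the next step.
    Returns a per-weight oscillation rate; illustrative metric only."""
    changes = (q_history[1:] != q_history[:-1]).float()   # level switches per step
    flips_back = changes[1:] * changes[:-1]               # a switch followed by another switch
    return flips_back.mean(dim=0)
```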