1 code implementation • 2 Nov 2023 • Hanwen Chang, Haihao Shen, Yiyang Cai, Xinyu Ye, Zhenzhong Xu, Wenhua Cheng, Kaokao Lv, Weiwei Zhang, Yintong Lu, Heng Guo
Diffusion models have gained popularity for generating images from textual descriptions.
2 code implementations • 1 Nov 2023 • Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng
Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks.
1 code implementation • 28 Jun 2023 • Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat
We apply our sparse accelerator to widely used Transformer-based language models including BERT-Mini, DistilBERT, BERT-Base, and BERT-Large.
1 code implementation • 27 Oct 2022 • Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat
In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators.