no code implementations • 13 Nov 2023 • Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang
The quadratic complexity of self-attention in Transformers has hindered the processing of long text.
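To make the complexity claim concrete, here is a minimal NumPy sketch of standard scaled dot-product self-attention (an illustration of the baseline cost, not the method proposed in this paper); the (n, n) score matrix is what makes time and memory grow quadratically with sequence length n.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention.

    x: (n, d) sequence of n token embeddings.
    The score matrix q @ k.T has shape (n, n), which is the source of
    the quadratic time and memory cost in the sequence length n.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (1024, 64); the intermediate score matrix was 1024 x 1024
```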
no code implementations • 16 Aug 2023 • Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo
However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement and the diversity of GNN models.
no code implementations • 27 May 2023 • Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo
Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.
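For background, the sketch below shows how a single simplified graph-convolution layer combines graph structure (the adjacency matrix) with node features; it is a generic illustration with made-up toy data, not the design proposed in this work.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One simplified graph-convolution layer: aggregate neighbor features,
    then apply a linear transform and a ReLU.

    adj:    (n, n) adjacency matrix (0/1) over n nodes.
    feats:  (n, d_in) node feature matrix.
    weight: (d_in, d_out) learnable weights.
    """
    adj_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)        # node degrees
    aggregated = (adj_hat @ feats) / deg            # mean over neighbors
    return np.maximum(aggregated @ weight, 0.0)     # linear + ReLU

# Toy 4-node graph: a path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.arange(8, dtype=float).reshape(4, 2)     # 4 nodes, 2 features each
weight = np.ones((2, 3))
print(gcn_layer(adj, feats, weight).shape)          # (4, 3)
```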
1 code implementation • 24 May 2023 • Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin
Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevents the model from inheriting weights from large pretrained models.
Ranked #1 on Open-Domain Question Answering on ELI5
no code implementations • 22 Sep 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo
An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs).
1 code implementation • 30 Aug 2022 • Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu
In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.
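As a point of reference, the following is a minimal sketch of plain symmetric 4-bit quantization; the ANT data type itself is adaptive and is not reproduced here, so all names and choices below are illustrative.

```python
import numpy as np

def quantize_symmetric(x, num_bits=4):
    """Symmetric uniform quantization to signed integers with num_bits bits.

    Returns the integer codes and the scale needed to dequantize.
    This is a generic baseline, not the adaptive ANT format.
    """
    qmax = 2 ** (num_bits - 1) - 1                   # e.g. 7 for 4-bit
    scale = np.abs(x).max() / qmax
    codes = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
codes, scale = quantize_symmetric(w, num_bits=4)
w_hat = dequantize(codes, scale)
print("mean abs error:", np.abs(w - w_hat).mean())
```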
no code implementations • 25 Aug 2022 • Zhengyi Li, Cong Guo, Zhanda Zhu, Yangjie Zhou, Yuxian Qiu, Xiaotian Gao, Jingwen Leng, Minyi Guo
To deal with the runtime overhead, we use a coarse-grained version of the border function.
no code implementations • 29 Jun 2022 • Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, Chao Li, Minyi Guo
However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences.
1 code implementation • ACL 2022 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.
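The general flavor of per-layer token skipping can be sketched as follows; the gate here is a hypothetical sigmoid scorer with a fixed threshold, not Transkimmer's actual gating or training procedure.

```python
import numpy as np

def skim_layer(hidden, gate_w, layer_fn, threshold=0.5):
    """Forward only the tokens a learned gate keeps; copy the rest unchanged.

    hidden:   (n, d) hidden states for n tokens.
    gate_w:   (d,) weights of a hypothetical per-token gate.
    layer_fn: the (expensive) layer applied only to the kept tokens.
    """
    gate = 1.0 / (1.0 + np.exp(-(hidden @ gate_w)))  # per-token keep score
    keep = gate > threshold                          # boolean skim decision
    out = hidden.copy()                              # skipped tokens pass through
    out[keep] = layer_fn(hidden[keep])               # compute only on kept tokens
    return out, keep

rng = np.random.default_rng(0)
hidden = rng.standard_normal((128, 16))
gate_w = rng.standard_normal(16)
out, keep = skim_layer(hidden, gate_w, layer_fn=np.tanh)
print(f"kept {keep.sum()} of {len(keep)} tokens")
```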
1 code implementation • ICLR 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.
1 code implementation • 16 Dec 2021 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu
We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.
no code implementations • 8 Sep 2021 • Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo
Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data.
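For readers new to the setting, a minimal federated-averaging (FedAvg-style) round looks like the sketch below; the toy linear-regression clients are illustrative and unrelated to the specific method studied in this paper.

```python
import numpy as np

def local_step(weights, x, y, lr=0.1):
    """One local gradient step of linear regression on a client's own data."""
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_w, clients, lr=0.1):
    """Each client trains locally; the server averages the updated weights,
    weighted by the number of local examples."""
    updates, sizes = [], []
    for x, y in clients:
        updates.append(local_step(global_w.copy(), x, y, lr))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):                                   # 4 clients with private data
    x = rng.standard_normal((50, 2))
    clients.append((x, x @ true_w + 0.01 * rng.standard_normal(50)))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)
print(w)  # approaches [2, -1] without pooling raw data on the server
```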
no code implementations • 20 May 2021 • Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng
We demonstrate the feasibility of our design with minimal changes to the existing production-scale inner-product-based Tensor Core.
no code implementations • 1 Jan 2021 • Yue Guan, Jingwen Leng, Yuhao Zhu, Minyi Guo
Following this idea, we propose the Block Skim Transformer (BST) to improve and accelerate the processing of Transformer QA models.
no code implementations • COLING 2020 • Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has provided heuristics and clues for analyzing various aspects of the mechanism.
no code implementations • 2 Sep 2020 • Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures.
1 code implementation • 29 Aug 2020 • Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu
Network pruning can reduce the high computation cost of deep neural network (DNN) models.
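As a reminder of what pruning does in its simplest form, here is a generic magnitude-pruning sketch (not the saliency criterion proposed in this work).

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` of them are 0.

    A pruned weight contributes neither storage nor multiply-accumulate work
    on hardware that can exploit the zeros.
    """
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"remaining weights: {mask.mean():.1%}")       # roughly 10% of the original
```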
no code implementations • 18 Feb 2020 • Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo
We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.
no code implementations • CVPR 2019 • Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu
Recently, researchers have started decomposing deep neural network models according to their semantics or functions.
no code implementations • 27 Sep 2018 • Yuxian Qiu, Jingwen Leng, Yuhao Zhu, Quan Chen, Chao Li, Minyi Guo
Despite their enormous success, there is still no solid understanding of deep neural networks' working mechanisms.