no code implementations • 13 Nov 2023 • Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang
The quadratic complexity of self-attention in Transformers has hindered the processing of long text.
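To make the complexity claim concrete, here is a minimal NumPy sketch of standard scaled dot-product self-attention (an illustration of the baseline cost, not the method proposed in this paper); the (n, n) score matrix is what makes time and memory grow quadratically with sequence length n.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Standard scaled dot-product self-attention.

    x: (n, d) sequence of n token embeddings.
    The score matrix q @ k.T has shape (n, n), which is the source of
    the quadratic time and memory cost in the sequence length n.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (n, d)

n, d = 1024, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (1024, 64); the intermediate score matrix was 1024 x 1024
```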
no code implementations • 16 Aug 2023 • Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo
However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement and the diversity of GNN models.
no code implementations • 27 May 2023 • Yangjie Zhou, Yaoxu Song, Jingwen Leng, Zihan Liu, Weihao Cui, Zhendong Zhang, Cong Guo, Quan Chen, Li Li, Minyi Guo
Graph neural networks (GNNs) are powerful tools for exploring and learning from graph structures and features.
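For background, the sketch below shows how a single simplified graph-convolution layer combines graph structure (the adjacency matrix) with node features; it is a generic illustration with made-up toy data, not the design proposed in this work.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One simplified graph-convolution layer: aggregate neighbor features,
    then apply a linear transform and a ReLU.

    adj:    (n, n) adjacency matrix (0/1) over n nodes.
    feats:  (n, d_in) node feature matrix.
    weight: (d_in, d_out) learnable weights.
    """
    adj_hat = adj + np.eye(adj.shape[0])            # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)        # node degrees
    aggregated = (adj_hat @ feats) / deg            # mean over neighbors
    return np.maximum(aggregated @ weight, 0.0)     # linear + ReLU

# Toy 4-node graph: a path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.arange(8, dtype=float).reshape(4, 2)     # 4 nodes, 2 features each
weight = np.ones((2, 3))
print(gcn_layer(adj, feats, weight).shape)          # (4, 3)
```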
1 code implementation • 24 May 2023 • Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin
Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevents the model from inheriting weights from large pretrained models.
Ranked #1 on Open-Domain Question Answering on ELI5
no code implementations • 22 Sep 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo
An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs).
1 code implementation • 30 Aug 2022 • Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu
In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.
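As a point of reference, the following is a minimal sketch of plain symmetric 4-bit quantization; the ANT data type itself is adaptive and is not reproduced here, so all names and choices below are illustrative.

```python
import numpy as np

def quantize_symmetric(x, num_bits=4):
    """Symmetric uniform quantization to signed integers with num_bits bits.

    Returns the integer codes and the scale needed to dequantize.
    This is a generic baseline, not the adaptive ANT format.
    """
    qmax = 2 ** (num_bits - 1) - 1                   # e.g. 7 for 4-bit
    scale = np.abs(x).max() / qmax
    codes = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
codes, scale = quantize_symmetric(w, num_bits=4)
w_hat = dequantize(codes, scale)
print("mean abs error:", np.abs(w - w_hat).mean())
```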
no code implementations • 25 Aug 2022 • Zhengyi Li, Cong Guo, Zhanda Zhu, Yangjie Zhou, Yuxian Qiu, Xiaotian Gao, Jingwen Leng, Minyi Guo
To deal with the runtime overhead, we use a coarse-grained version of the border function.
no code implementations • 29 Jun 2022 • Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, Chao Li, Minyi Guo
However, the quadratic complexity of self-attention w.r.t. the sequence length incurs heavy computational and memory burdens, especially for tasks with long sequences.
1 code implementation • ACL 2022 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo
To address the above limitations, we propose the Transkimmer architecture, which learns to identify hidden state tokens that are not required by each layer.
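The general flavor of per-layer token skipping can be sketched as follows; the gate here is a hypothetical sigmoid scorer with a fixed threshold, not Transkimmer's actual gating or training procedure.

```python
import numpy as np

def skim_layer(hidden, gate_w, layer_fn, threshold=0.5):
    """Forward only the tokens a learned gate keeps; copy the rest unchanged.

    hidden:   (n, d) hidden states for n tokens.
    gate_w:   (d,) weights of a hypothetical per-token gate.
    layer_fn: the (expensive) layer applied only to the kept tokens.
    """
    gate = 1.0 / (1.0 + np.exp(-(hidden @ gate_w)))  # per-token keep score
    keep = gate > threshold                          # boolean skim decision
    out = hidden.copy()                              # skipped tokens pass through
    out[keep] = layer_fn(hidden[keep])               # compute only on kept tokens
    return out, keep

rng = np.random.default_rng(0)
hidden = rng.standard_normal((128, 16))
gate_w = rng.standard_normal(16)
out, keep = skim_layer(hidden, gate_w, layer_fn=np.tanh)
print(f"kept {keep.sum()} of {len(keep)} tokens")
```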
1 code implementation • ICLR 2022 • Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.
1 code implementation • 16 Dec 2021 • Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu
We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.
no code implementations • 8 Sep 2021 • Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo
Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model over their own local data.
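For readers new to the setting, a minimal federated-averaging (FedAvg-style) round looks like the sketch below; the toy linear-regression clients are illustrative and unrelated to the specific method studied in this paper.

```python
import numpy as np

def local_step(weights, x, y, lr=0.1):
    """One local gradient step of linear regression on a client's own data."""
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(global_w, clients, lr=0.1):
    """Each client trains locally; the server averages the updated weights,
    weighted by the number of local examples."""
    updates, sizes = [], []
    for x, y in clients:
        updates.append(local_step(global_w.copy(), x, y, lr))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):                                   # 4 clients with private data
    x = rng.standard_normal((50, 2))
    clients.append((x, x @ true_w + 0.01 * rng.standard_normal(50)))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)
print(w)  # approaches [2, -1] without pooling raw data on the server
```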
no code implementations • 20 May 2021 • Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng
We demonstrate the feasibility of our design with minimal changes to the existing production-scale inner-product-based Tensor Core.
no code implementations • 1 Jan 2021 • Yue Guan, Jingwen Leng, Yuhao Zhu, Minyi Guo
Following this idea, we propose the Block Skim Transformer (BST) to improve and accelerate the processing of Transformer QA models.
no code implementations • COLING 2020 • Yue Guan, Jingwen Leng, Chao Li, Quan Chen, Minyi Guo
Recent research on the multi-head attention mechanism, especially that in pre-trained models such as BERT, has provided heuristics and clues for analyzing various aspects of the mechanism.
no code implementations • 2 Sep 2020 • Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
Graph neural networks (GNNs) represent an emerging line of deep learning models that operate on graph structures.
1 code implementation • 29 Aug 2020 • Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu
Network pruning can reduce the high computation cost of deep neural network (DNN) models.
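As a reminder of what pruning does in its simplest form, here is a generic magnitude-pruning sketch (not the saliency criterion proposed in this work).

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` of them are 0.

    A pruned weight contributes neither storage nor multiply-accumulate work
    on hardware that can exploit the zeros.
    """
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"remaining weights: {mask.mean():.1%}")       # roughly 10% of the original
```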
no code implementations • 18 Feb 2020 • Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo
We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.
no code implementations • CVPR 2019 • Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu
Recently, researchers have started decomposing deep neural network models according to their semantics or functions.
no code implementations • 27 Sep 2018 • Yuxian Qiu, Jingwen Leng, Yuhao Zhu, Quan Chen, Chao Li, Minyi Guo
Despite their enormous success, there is still no solid understanding of deep neural networks' working mechanisms.