Search Results for author: Wei Niu

Found 38 papers, 9 papers with code

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

no code implementations • 16 Mar 2024 • Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices.

Language Modelling Large Language Model

SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

no code implementations • 29 Feb 2024 • Wei Niu, Gagan Agrawal, Bin Ren

Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs.

Code Generation

EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge

1 code implementation • 16 Feb 2024 • Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang

In this paper, we propose EdgeQAT, the Entropy and Distribution Guided QAT for the optimization of lightweight LLMs to achieve inference acceleration on Edge devices.

Quantization

Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

no code implementations • 14 Sep 2023 • Fei Dou, Jin Ye, Geng Yuan, Qin Lu, Wei Niu, Haijian Sun, Le Guan, Guoyu Lu, Gengchen Mai, Ninghao Liu, Jin Lu, Zhengliang Liu, Zihao Wu, Chenjiao Tan, Shaochen Xu, Xianqiao Wang, Guoming Li, Lilong Chai, Sheng Li, Jin Sun, Hongyue Sun, Yunli Shao, Changying Li, Tianming Liu, WenZhan Song

Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas.

Decision Making

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

1 code implementation • CVPR 2023 • Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to a minimum.

Video Super-Resolution

Complex dynamics of knowledgeable monopoly models with gradient mechanisms

no code implementations • 4 Jan 2023 • Xiaoliang Li, Jiacheng Fu, Wei Niu

Furthermore, we find that the basins of the two stable equilibria in the second model are disconnected and also have complicated topological structures.

Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge

no code implementations • CVPR 2023 • Changdi Yang, Pu Zhao, Yanyu Li, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Bin Ren, Xue Lin, Yanzhi Wang

With the ever-increasing popularity of edge devices, it is necessary to implement real-time segmentation on the edge for autonomous driving and many other applications.

Autonomous Driving Segmentation +1

SparCL: Sparse Continual Learning on the Edge

1 code implementation • 20 Sep 2022 • Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.

Continual Learning

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

no code implementations • 29 Aug 2022 • Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen

It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future work to explore.

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.

Neural Architecture Search SSIM +1

Real-Time Portrait Stylization on the Edge

no code implementations • 2 Jun 2022 • Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

In this work, we demonstrate real-time portrait stylization, specifically, translating a self-portrait into a cartoon or anime style on mobile devices.

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

1 code implementation • 27 Dec 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs Model Compression

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

no code implementations • 22 Nov 2021 • Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.

Model Compression

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

1 code implementation • NeurIPS 2021 • Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Systematic evaluations of accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works.

Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card

no code implementations • 12 Oct 2021 • Hsin-Hsuan Sung, Yuanchao Xu, Jiexiong Guan, Wei Niu, Shaoshan Liu, Bin Ren, Yanzhi Wang, Xipeng Shen

Autonomous driving is of great interest in both research and industry.

Autonomous Driving

HFSP: A Hardware-friendly Soft Pruning Framework for Vision Transformers

no code implementations • 29 Sep 2021 • Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult.

Image Classification Model Compression

DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

no code implementations • 30 Aug 2021 • Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices.

Code Generation

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

no code implementations • 25 Aug 2021 • Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

It necessitates the sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving a high sparse model accuracy.

Code Generation Compiler Optimization

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

no code implementations • ICCV 2021 • Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices.

Image Super-Resolution Neural Architecture Search +1

Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search

no code implementations • 28 Jun 2021 • Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Bin Ren, Yanzhi Wang, Xue Lin

Object detection plays an important role in self-driving cars for security development.

3D Object Detection Compiler Optimization +4

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

no code implementations • 6 Jun 2021 • Xuan Shen, Geng Yuan, Wei Niu, Xiaolong Ma, Jiexiong Guan, Zhengang Li, Bin Ren, Yanzhi Wang

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition makes an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms.

Autonomous Driving Multi-Person Pose Estimation

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

no code implementations • 30 May 2021 • Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model to meet both resource and real-time specifications of mobile devices.

Question Answering Text Generation

Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device

no code implementations • 26 Dec 2020 • Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Hsin-Hsuan Sung, Sijia Liu, Xipeng Shen, Bin Ren, Yanzhi Wang, Xue Lin

3D object detection is an important task, especially in the autonomous driving application domain.

3D Object Detection Autonomous Driving +5

NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

no code implementations • CVPR 2021 • Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin

With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed.

Bayesian Optimization Code Generation +2

ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning

no code implementations • 20 Nov 2020 • Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao

Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements in both accuracy and compression ratio on the tested CNN models and datasets, under similar limited training time.

Real-Time Execution of Large-scale Language Models on Mobile

no code implementations • 15 Sep 2020 • Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices, thus achieving real-time execution of large transformer-based models like BERT variants.

Edge-computing

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

3 code implementations • 12 Sep 2020 • Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

In this work, we propose the YOLObile framework for real-time object detection on mobile devices via compression-compilation co-design.

Computational Efficiency Object +2

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

no code implementations • 20 Jul 2020 • Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Sijia Liu, Xue Lin, Bin Ren

The vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism.

Code Generation Model Compression

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

no code implementations • 22 Apr 2020 • Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren

High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.

Compiler Optimization Style Transfer +1

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

no code implementations • 13 Mar 2020 • Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.

Model Compression Privacy Preserving

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

no code implementations • 19 Feb 2020 • Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

Recurrent neural network (RNN) based automatic speech recognition has nowadays become prevalent on mobile devices such as smartphones.

Automatic Speech Recognition (ASR) +1

Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

1 code implementation • 12 Feb 2020 • Phillip Keung, Wei Niu, Yichao Lu, Julian Salazar, Vikas Bhardwaj

We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances.

Automatic Speech Recognition (ASR) +1

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

no code implementations • 23 Jan 2020 • Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang

Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem.

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

no code implementations • ECCV 2020 • Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms.

Code Generation Compiler Optimization

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

no code implementations • 1 Jan 2020 • Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss.

Code Generation Model Compression

A Hierarchical Self-Attentive Model for Recommending User-Generated Item Lists

1 code implementation • 30 Dec 2019 • Yun He, Jianling Wang, Wei Niu, James Caverlee

User-generated item lists are a popular feature of many different platforms.

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

no code implementations • 6 Sep 2019 • Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method.

Model Compression

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

no code implementations • 2 May 2019 • Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren

With the rapid emergence of a spectrum of high-end mobile devices, many applications that required desktop-level computation capability formerly can now run on these devices without any problem.

Model Compression
