Search Results for author: Yingyan Lin

Found 73 papers, 33 papers with code

HALO: Hardware-Aware Learning to Optimize

1 code implementation ECCV 2020 Chaojian Li, Tianlong Chen, Haoran You, Zhangyang Wang, Yingyan Lin

There has been an explosive demand for bringing machine learning (ML) powered intelligence into numerous Internet-of-Things (IoT) devices.

Omni-Recon: Towards General-Purpose Neural Radiance Fields for Versatile 3D Applications

no code implementations 17 Mar 2024 Yonggan Fu, Huaizhi Qu, Zhifan Ye, Chaojian Li, Kevin Zhao, Yingyan Lin

Specifically, our Omni-Recon features a general-purpose NeRF model using image-based rendering with two decoupled branches: one complex transformer-based branch that progressively fuses geometry and appearance features for accurate geometry estimation, and one lightweight branch for predicting blending weights of source views.

3D Reconstruction · Scene Understanding +1
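
The lightweight branch described above predicts per-view blending weights; a minimal sketch of how such weights turn re-projected source-view colors into a rendered image is given below (an illustration of image-based rendering in general, not the paper's code, and the array names and shapes are assumptions).

import numpy as np

def blend_source_views(src_colors: np.ndarray, blend_logits: np.ndarray) -> np.ndarray:
    """Blend colors re-projected from V source views using predicted weights.

    src_colors:   (V, H, W, 3) colors warped from each source view (assumed inputs).
    blend_logits: (V, H, W)    unnormalized blending scores from the lightweight branch.
    Returns an (H, W, 3) rendered image.
    """
    # Softmax over the view axis so the V weights sum to 1 at every pixel.
    w = np.exp(blend_logits - blend_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted sum of the source-view colors.
    return (w[..., None] * src_colors).sum(axis=0)

# Toy usage: blend 4 source views into one 32x32 target view.
colors = np.random.rand(4, 32, 32, 3)
logits = np.random.randn(4, 32, 32)
rendered = blend_source_views(colors, logits)   # (32, 32, 3)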

Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

no code implementations 2 Jan 2024 Zishen Wan, Che-Kai Liu, Hanchen Yang, Chaojian Li, Haoran You, Yonggan Fu, Cheng Wan, Tushar Krishna, Yingyan Lin, Arijit Raychowdhury

The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, have significantly impacted various aspects of our lives.

GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

no code implementations 19 Sep 2023 Yonggan Fu, Yongan Zhang, Zhongzhi Yu, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, Yingyan Lin

To our knowledge, this work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation.

In-Context Learning

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

no code implementations 23 Jun 2023 Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader application: (1) the difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; and (2) the difficulty of achieving low-resource adaptation ability that enables effective adaptation while avoiding overfitting and catastrophic forgetting.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants

no code implementations 23 Jun 2023 Zhongzhi Yu, Yonggan Fu, Jiayi Yuan, Haoran You, Yingyan Lin

Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices.

NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations

1 code implementation 10 Jun 2023 Yonggan Fu, Ye Yuan, Souvik Kundu, Shang Wu, Shunyao Zhang, Yingyan Lin

Generalizable Neural Radiance Fields (GNeRF) are one of the most promising real-world solutions for novel view synthesis, thanks to their cross-scene generalization capability and thus the possibility of instant rendering on new scenes.

Adversarial Robustness · Novel View Synthesis

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

1 code implementation NeurIPS 2023 Haoran You, Huihong Shi, Yipin Guo, Yingyan Lin

To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss.

Efficient ViTs
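
A rough sketch of the mixture-of-primitives idea follows (module names, the dense routing scheme, and the loss weighting are my assumptions, not the released implementation): each token's output is a gated mix of a multiplication expert and a shift-style expert whose weights are rounded to signed powers of two, with a load-balancing term weighted by assumed relative latencies.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftLinear(nn.Module):
    """Linear layer whose weights are quantized to signed powers of two (shift-friendly)."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) * 0.02)

    def forward(self, x):
        w = self.weight
        # sign(w) * 2^round(log2|w|), with a straight-through estimator for gradients.
        pow2 = torch.sign(w) * torch.exp2(torch.round(torch.log2(w.abs().clamp_min(1e-8))))
        w_q = w + (pow2 - w).detach()
        return F.linear(x, w_q)

class TwoExpertMoE(nn.Module):
    """Mix a multiplication expert and a shift expert per token."""
    def __init__(self, dim, latency_cost=(1.0, 0.3), balance_weight=0.01):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim), ShiftLinear(dim)])
        self.router = nn.Linear(dim, len(self.experts))
        self.latency_cost = torch.tensor(latency_cost)   # assumed relative costs (mult, shift)
        self.balance_weight = balance_weight

    def forward(self, x):                                 # x: (batch, tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)          # (B, T, num_experts)
        out = sum(gate[..., i:i + 1] * expert(x) for i, expert in enumerate(self.experts))
        # Latency-aware load balancing: penalize routing mass sent to the slower expert.
        load = gate.mean(dim=(0, 1))
        aux_loss = self.balance_weight * (load * self.latency_cost.to(load)).sum()
        return out, aux_loss

moe = TwoExpertMoE(dim=64)
y, aux = moe(torch.randn(2, 16, 64))                      # add aux to the task loss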

Instant-NeRF: Instant On-Device Neural Radiance Field Training via Algorithm-Accelerator Co-Designed Near-Memory Processing

no code implementations 9 May 2023 Yang Zhao, Shang Wu, Jingqun Zhang, Sixu Li, Chaojian Li, Yingyan Lin

Instant on-device Neural Radiance Fields (NeRFs) are in growing demand for unleashing the promise of immersive AR/VR experiences, but are still limited by their prohibitive training time.

Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning

1 code implementation CVPR 2023 Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan Lin

To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning.

Data Augmentation

Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design

no code implementations 24 Apr 2023 Yonggan Fu, Zhifan Ye, Jiayi Yuan, Shunyao Zhang, Sixu Li, Haoran You, Yingyan Lin

Novel view synthesis is an essential functionality for enabling immersive experiences in various Augmented- and Virtual-Reality (AR/VR) applications, for which generalizable Neural Radiance Fields (NeRFs) have gained increasing popularity thanks to their cross-scene generalization capability.

Generalizable Novel View Synthesis · Novel View Synthesis

Robust Tickets Can Transfer Better: Drawing More Transferable Subnetworks in Transfer Learning

no code implementations 24 Apr 2023 Yonggan Fu, Ye Yuan, Shang Wu, Jiayi Yuan, Yingyan Lin

Transfer learning leverages feature representations of deep neural networks (DNNs) pretrained on source tasks with rich data to empower effective finetuning on downstream tasks.

Adversarial Robustness · Transfer Learning

Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

no code implementations CVPR 2023 Yonggan Fu, Yuecheng Li, Chenghui Li, Jason Saragih, Peizhao Zhang, Xiaoliang Dai, Yingyan Lin

Real-time and robust photorealistic avatars have been highly desired for enabling immersive telepresence in AR/VR.

Neural Architecture Search

ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

no code implementations 19 Mar 2023 Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Lin, Ashutosh Sabharwal

To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM).

Neural Architecture Search

INGeo: Accelerating Instant Neural Scene Reconstruction with Noisy Geometry Priors

no code implementations 5 Dec 2022 Chaojian Li, Bichen Wu, Albert Pumarola, Peizhao Zhang, Yingyan Lin, Peter Vajda

We present a method that accelerates reconstruction of 3D scenes and objects, aiming to enable instant reconstruction on edge devices such as mobile phones and AR/VR headsets.

Novel View Synthesis

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

1 code implementation CVPR 2023 Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs); one reason is that ViTs' attention measures global similarities and thus has quadratic complexity with the number of input tokens.

Efficient ViTs

ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

1 code implementation 9 Nov 2022 Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin

Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs.

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

2 code implementations 24 Oct 2022 Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin

Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications.

Neural Architecture Search

ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

1 code implementation 18 Oct 2022 Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin

Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps into either denser or sparser fixed patterns, regularizing two levels of workloads without hurting accuracy and largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements. On top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations.
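
The prune-and-polarize step can be pictured with a toy function (purely illustrative; the reordering rule, dense ratio, and sparsity threshold are assumptions): keys with the highest average attention are grouped into a fixed dense block, and only the strongest entries survive in the remaining sparse block.

import numpy as np

def polarize_attention(attn: np.ndarray, dense_ratio: float = 0.25, sparse_keep: float = 0.05):
    """Split an attention map (queries x keys) into a dense block and a sparse remainder."""
    n_q, n_k = attn.shape
    order = np.argsort(-attn.mean(axis=0))            # heaviest keys first
    reordered = attn[:, order]
    n_dense = max(1, int(dense_ratio * n_k))
    mask = np.zeros_like(reordered, dtype=bool)
    mask[:, :n_dense] = True                           # dense block: keep every entry
    rest = reordered[:, n_dense:]
    if rest.size:
        thresh = np.quantile(rest, 1.0 - sparse_keep)  # sparse block: keep only top entries
        mask[:, n_dense:] = rest >= thresh
    return reordered * mask, mask, order

attn = np.random.rand(16, 16)
polarized, mask, perm = polarize_attention(attn)
print("fraction of attention entries kept:", mask.mean())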

e-G2C: A 0.14-to-8.31 µJ/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

no code implementations 24 Jul 2022 Yang Zhao, Yongan Zhang, Yonggan Fu, Xu Ouyang, Cheng Wan, Shang Wu, Anton Banta, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

This work presents the first silicon-validated dedicated EGM-to-ECG (G2C) processor, dubbed e-G2C, featuring continuous lightweight anomaly detection, event-driven coarse/precise conversion, and on-chip adaptation.

Anomaly Detection

SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning

1 code implementation 8 Jul 2022 Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Lin

In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term SuperTickets, via a two-in-one training scheme with joint architecture searching and parameter pruning.

Neural Architecture Search

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks

1 code implementation 2 Jun 2022 Yonggan Fu, Haichuan Yang, Jiayi Yuan, Meng Li, Cheng Wan, Raghuraman Krishnamoorthi, Vikas Chandra, Yingyan Lin

Efficient deep neural network (DNN) models equipped with compact operators (e.g., depthwise convolutions) have shown great potential in reducing DNNs' theoretical complexity (e.g., the total number of weights/operations) while maintaining a decent model accuracy.

ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks

1 code implementation 17 May 2022 Haoran You, Baopu Li, Huihong Shi, Yonggan Fu, Yingyan Lin

To this end, this work advocates hybrid NNs that consist of both powerful yet costly multiplications and efficient yet less powerful operators for marrying the best of both worlds, and proposes ShiftAddNAS, which can automatically search for more accurate and more efficient NNs.

Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?

1 code implementation ICLR 2022 Yonggan Fu, Shunyao Zhang, Shang Wu, Cheng Wan, Yingyan Lin

In particular, recent works show that ViTs are more robust against adversarial attacks as compared with convolutional neural networks (CNNs), and conjecture that this is because ViTs focus more on capturing global interactions among different input/feature patches, leading to their improved robustness to local perturbations imposed by adversarial attacks.

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

no code implementations 15 Mar 2022 Zhongzhi Yu, Yonggan Fu, Shang Wu, Mengquan Li, Haoran You, Yingyan Lin

While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts.

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

no code implementations 7 Mar 2022 Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin C. Herbordt, Yingyan Lin, Ang Li

In this paper we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation.

MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation

no code implementations 21 Dec 2021 Zhongzhi Yu, Yonggan Fu, Sicheng Li, Chaojian Li, Yingyan Lin

ViTs are often too computationally expensive to be fitted onto real-world resource-constrained devices, due to (1) their quadratically increased complexity with the number of input tokens and (2) their overparameterized self-attention heads and model depth.

Locality Sensitive Teaching

no code implementations NeurIPS 2021 Zhaozhuo Xu, Beidi Chen, Chaojian Li, Weiyang Liu, Le Song, Yingyan Lin, Anshumali Shrivastava

However, as one of the most influential and practical MT paradigms, iterative machine teaching (IMT) is prohibited on IoT devices due to its inefficient and unscalable algorithms.

FBNetV5: Neural Architecture Search for Multiple Tasks in One Run

no code implementations 19 Nov 2021 Bichen Wu, Chaojian Li, Hang Zhang, Xiaoliang Dai, Peizhao Zhang, Matthew Yu, Jialiang Wang, Yingyan Lin, Peter Vajda

To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort.

Classification · Image Classification +4

RT-RCG: Neural Network and Accelerator Search Towards Effective and Real-time ECG Reconstruction from Intracardiac Electrograms

no code implementations 4 Nov 2021 Yongan Zhang, Anton Banta, Yonggan Fu, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

To close this gap and make a heuristic step towards real-time critical intervention in instant response to irregular and infrequent ventricular rhythms, we propose a new framework dubbed RT-RCG to automatically search for (1) efficient Deep Neural Network (DNN) structures and then (2) corresponding accelerators, to enable Real-Time and high-quality Reconstruction of ECG signals from EGM signals.

Navigate · Neural Architecture Search

Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

1 code implementation NeurIPS 2021 Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions.

Adversarial Robustness

An Investigation on Hardware-Aware Vision Transformer Scaling

no code implementations 29 Sep 2021 Chaojian Li, KyungMin Kim, Bichen Wu, Peizhao Zhang, Hang Zhang, Xiaoliang Dai, Peter Vajda, Yingyan Lin

In particular, when transferred to PiT, our scaling strategies lead to a boosted ImageNet top-1 accuracy from $74.6\%$ to $76.7\%$ ($\uparrow 2.1\%$) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by $\uparrow 0.7\%$ under a similar throughput on a V100 GPU.

Image Classification · object-detection +2

D$^2$-GCN: Data-Dependent GCNs for Boosting Both Efficiency and Scalability

no code implementations 29 Sep 2021 Chaojian Li, Xu Ouyang, Yang Zhao, Haoran You, Yonggan Fu, Yuchen Gu, Haonan Liu, Siyuan Miao, Yingyan Lin

Graph Convolutional Networks (GCNs) have gained increasing attention thanks to their state-of-the-art (SOTA) performance in graph-based learning tasks.

Contrastive Quant: Quantization Makes Stronger Contrastive Learning

no code implementations 29 Sep 2021 Yonggan Fu, Qixuan Yu, Meng Li, Xu Ouyang, Vikas Chandra, Yingyan Lin

Contrastive learning, which learns visual representations by enforcing feature consistency under different augmented views, has emerged as one of the most effective unsupervised learning methods.

Contrastive Learning · Quantization

G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency

no code implementations 18 Sep 2021 Yongan Zhang, Haoran You, Yonggan Fu, Tong Geng, Ang Li, Yingyan Lin

While end-to-end jointly optimizing GNNs and their accelerators is promising in boosting GNNs' inference efficiency and expediting the design process, it is still underexplored due to the vast and distinct design spaces of GNNs and their accelerators.

2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency

no code implementations 11 Sep 2021 Yonggan Fu, Yang Zhao, Qixuan Yu, Chaojian Li, Yingyan Lin

The recent breakthroughs of deep neural networks (DNNs) and the advent of billions of Internet of Things (IoT) devices have excited an explosive demand for intelligent IoT devices equipped with domain-specific DNN accelerators.

Adversarial Robustness · Quantization

O-HAS: Optical Hardware Accelerator Search for Boosting Both Acceleration Performance and Development Speed

no code implementations 17 Aug 2021 Mengquan Li, Zhongzhi Yu, Yongan Zhang, Yonggan Fu, Yingyan Lin

The recent breakthroughs and prohibitive complexities of Deep Neural Networks (DNNs) have excited extensive interest in domain-specific DNN accelerators, among which optical DNN accelerators are particularly promising thanks to their unprecedented potential of achieving superior performance-per-watt.

DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference

no code implementations 16 Jul 2021 Chaojian Li, Wuyang Chen, Yuchen Gu, Tianlong Chen, Yonggan Fu, Zhangyang Wang, Yingyan Lin

Semantic segmentation for scene understanding is nowadays widely demanded, raising significant challenges for algorithm efficiency, especially for applications on resource-limited platforms.

Scene Understanding · Segmentation +1

Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators

1 code implementation 11 Jun 2021 Yonggan Fu, Yongan Zhang, Yang Zhang, David Cox, Yingyan Lin

The key challenges include (1) the dilemma of whether to explode the memory consumption due to the huge joint space or achieve sub-optimal designs, (2) the discrete nature of the accelerator design space that is coupled yet different from that of the networks and bitwidths, and (3) the chicken-and-egg problem associated with network-accelerator co-search, i.e., co-search requires operation-wise hardware cost, which is lacking during search as the optimal accelerator, which depends on the whole network, is still unknown during search.

A3C-S: Automated Agent Accelerator Co-Search towards Efficient Deep Reinforcement Learning

no code implementations 11 Jun 2021 Yonggan Fu, Yongan Zhang, Chaojian Li, Zhongzhi Yu, Yingyan Lin

Driven by the explosive interest in applying deep reinforcement learning (DRL) agents to numerous real-time control and decision-making applications, there has been a growing demand to deploy DRL agents to empower daily-life intelligent devices, while the prohibitive complexity of DRL stands at odds with limited on-device resources.

Decision Making · reinforcement-learning +1

InstantNet: Automated Generation and Deployment of Instantaneously Switchable-Precision Networks

1 code implementation 22 Apr 2021 Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yifan Jiang, Chaojian Li, Yongyuan Liang, Mingchao Jiang, Zhangyang Wang, Yingyan Lin

The promise of Deep Neural Network (DNN) powered Internet of Things (IoT) devices has motivated a tremendous demand for automated solutions to enable fast development and deployment of efficient (1) DNNs equipped with instantaneous accuracy-efficiency trade-off capability to accommodate the time-varying resources at IoT devices and (2) dataflows to optimize DNNs' execution efficiency on different devices.

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

1 code implementation 19 Mar 2021 Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance of all the networks in the search spaces of both NAS-Bench-201 and FBNet, on six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Hardware Aware Neural Architecture Search · Neural Architecture Search

Early-Bird GCNs: Graph-Network Co-Optimization Towards More Efficient GCN Training and Inference via Drawing Early-Bird Lottery Tickets

2 code implementations 1 Mar 2021 Haoran You, Zhihan Lu, Zijian Zhou, Yonggan Fu, Yingyan Lin

Experiments on various GCN models and datasets consistently validate our GEB finding and the effectiveness of our GEBT, e.g., our GEBT achieves up to 80.2%~85.6% and 84.6%~87.5% savings of GCN training and inference costs while offering a comparable or even better accuracy as compared to state-of-the-art methods.

Representation Learning

CPT: Efficient Deep Neural Network Training via Cyclic Precision

1 code implementation ICLR 2021 Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

In this paper, we attempt to explore low-precision training from a new perspective as inspired by recent findings in understanding DNN training: we conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training.

Language Modelling
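
As a minimal sketch of the conjectured precision/learning-rate analogy (the cosine shape and bit bounds below are assumptions, not necessarily the paper's exact schedule), a bitwidth that cycles between a lower and an upper bound over training can be computed per step as follows.

import math

def cyclic_precision(step: int, cycle_len: int = 1000, low_bits: int = 3, high_bits: int = 8) -> int:
    """Bitwidth that sweeps low -> high -> low within each cycle (cosine shape)."""
    phase = (step % cycle_len) / cycle_len                  # position in [0, 1) within the cycle
    frac = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))    # 0 -> 1 -> 0 over the cycle
    return int(round(low_bits + frac * (high_bits - low_bits)))

# Example: precision used at a few training steps of one cycle.
for s in (0, 250, 500, 750, 1000):
    print(s, cyclic_precision(s), "bits")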

Max-Affine Spline Insights Into Deep Network Pruning

no code implementations 7 Jan 2021 Haoran You, Randall Balestriero, Zhihan Lu, Yutong Kou, Huihong Shi, Shunyao Zhang, Shang Wu, Yingyan Lin, Richard Baraniuk

In this paper, we study the importance of pruning in Deep Networks (DNs) and the yin & yang relationship between (1) pruning highly overparametrized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized.

Network Pruning

SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training

1 code implementation 4 Jan 2021 Xiaohan Chen, Yang Zhao, Yue Wang, Pengfei Xu, Haoran You, Chaojian Li, Yonggan Fu, Yingyan Lin, Zhangyang Wang

Results show that: 1) applied to inference, SD achieves up to 2.44x energy efficiency as evaluated via real hardware implementations; 2) applied to training, SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.

BDS-GCN: Efficient Full-Graph Training of Graph Convolutional Nets with Partition-Parallelism and Boundary Sampling

no code implementations 1 Jan 2021 Cheng Wan, Youjie Li, Nam Sung Kim, Yingyan Lin

While it can be natural to leverage graph partition and distributed training for tackling this challenge, this direction has only been slightly touched on previously due to the unique challenge posed by the GCN structures, especially the excessive amount of boundary nodes in each partitioned subgraph, which can easily explode the required memory and communications for distributed training of GCNs.

SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-powered Intelligent PhlatCam

1 code implementation ICCV 2021 Yonggan Fu, Yang Zhang, Yue Wang, Zhihan Lu, Vivek Boominathan, Ashok Veeraraghavan, Yingyan Lin

PhlatCam, with its form factor potentially reduced by orders of magnitude, has emerged as a promising solution to the first aforementioned challenge, while the second one remains a bottleneck.

Benchmarking · Model Compression +1

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

no code implementations ICLR 2021 Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Hardware Aware Neural Architecture Search · Neural Architecture Search

Triple-Search: Differentiable Joint-Search of Networks, Precision, and Accelerators

no code implementations 1 Jan 2021 Yonggan Fu, Yongan Zhang, Haoran You, Yingyan Lin

First, to jointly search for a network and its precision via differentiable search, there exists a dilemma of whether to explode the memory consumption or achieve sub-optimal designs.

Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search

no code implementations 24 Dec 2020 Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yingyan Lin

We therefore propose an Auto-Agent-Distiller (A2D) framework, which to our best knowledge is the first neural architecture search (NAS) applied to DRL to automatically search for the optimal DRL agents for various tasks that optimize both the test scores and efficiency.

Neural Architecture Search · reinforcement-learning +1

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

1 code implementation NeurIPS 2020 Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash Gopalakrishnan, Zhangyang Wang, Yingyan Lin

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs.

Quantization

DNA: Differentiable Network-Accelerator Co-Search

no code implementations 28 Oct 2020 Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng Li, Vikas Chandra, Yingyan Lin

Powerful yet complex deep neural networks (DNNs) have fueled a booming demand for efficient DNN solutions to bring DNN-powered intelligence into numerous applications.

AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks

3 code implementations ICML 2020 Yonggan Fu, Wuyang Chen, Haotao Wang, Haoran Li, Yingyan Lin, Zhangyang Wang

Inspired by the recent success of AutoML in deep compression, we introduce AutoML to GAN compression and develop an AutoGAN-Distiller (AGD) framework.

AutoML · Knowledge Distillation +2

AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for Enabling Ubiquitous Intelligent Mobiles

no code implementations 8 Jun 2020 Sicong Liu, Junzhao Du, Kaiming Nan, Zimu Zhou, Atlas Wang, Yingyan Lin

Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a tremendously growing demand for bringing DNN-powered intelligence into mobile platforms.

Model Compression

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

no code implementations 7 May 2020 Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs).

Model Compression · Quantization

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

no code implementations 3 May 2020 Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin

Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost.

Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks

1 code implementation ICLR 2020 Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G. Baraniuk, Zhangyang Wang, Yingyan Lin

Finally, we leverage the existence of EB tickets and the proposed mask distance to develop efficient training methods, which are achieved by first identifying EB tickets via low-cost schemes, and then continuing to train merely the EB tickets towards the target accuracy.
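
A compact sketch of the mask-distance criterion mentioned above (my own illustration; the keep ratio, tolerance, and patience values are assumptions): prune to a binary mask after each epoch and declare an early-bird ticket once the Hamming distance between consecutive masks stays small.

import numpy as np

def pruning_mask(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Binary mask keeping the largest-magnitude `keep_ratio` fraction of weights."""
    k = max(1, int(keep_ratio * weights.size))
    thresh = np.sort(np.abs(weights).ravel())[-k]
    return np.abs(weights) >= thresh

def mask_distance(m1: np.ndarray, m2: np.ndarray) -> float:
    """Normalized Hamming distance between two binary pruning masks."""
    return float(np.mean(m1 != m2))

def found_early_bird(mask_history, tol: float = 0.1, patience: int = 3) -> bool:
    """True once the last `patience` consecutive mask distances all fall below `tol`."""
    d = [mask_distance(a, b) for a, b in zip(mask_history, mask_history[1:])]
    return len(d) >= patience and all(x < tol for x in d[-patience:])

# Toy usage: masks drawn from slowly drifting weights stabilize quickly.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
history = []
for epoch in range(10):
    w += 0.01 * rng.normal(size=1000)        # stand-in for one epoch of training updates
    history.append(pruning_mask(w, keep_ratio=0.3))
print("early-bird ticket found:", found_early_bird(history))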

A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision

no code implementations 2 Mar 2020 Hongjie Wang, Yang Zhao, Chaojian Li, Yue Wang, Yingyan Lin

The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns.

Efficient Neural Network

DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures

no code implementations 26 Feb 2020 Yang Zhao, Chaojian Li, Yue Wang, Pengfei Xu, Yongan Zhang, Yingyan Lin

The recent breakthroughs in deep neural networks (DNNs) have spurred a tremendously increased demand for DNN accelerators.

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation 6 Jan 2020 Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.).

Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

1 code implementation 3 Jan 2020 Jianghao Shen, Yonggan Fu, Yue Wang, Pengfei Xu, Zhangyang Wang, Yingyan Lin

The core idea of DFS is to hypothesize layer-wise quantization (to different bitwidths) as intermediate "soft" choices to be made between fully utilizing and skipping a layer.

Quantization
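
One way to picture the "soft choice" framing (illustrative only; the gating network, bitwidth set, and activation quantizer below are assumptions rather than the paper's design): a per-layer gate softly mixes skipping the layer, executing it on quantized inputs at several bitwidths, and executing it at full precision.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp_min(1e-8) / qmax
    xq = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (xq - x).detach()

class FractionalSkipBlock(nn.Module):
    """Residual block whose contribution is a soft mix over {skip, 4-bit, 8-bit, full}."""
    def __init__(self, dim, bit_choices=(4, 8)):
        super().__init__()
        self.layer = nn.Linear(dim, dim)
        self.bit_choices = bit_choices
        self.gate = nn.Linear(dim, len(bit_choices) + 2)   # +2 options: skip and full precision

    def forward(self, x):                                   # x: (batch, dim)
        g = F.softmax(self.gate(x.mean(dim=0, keepdim=True)), dim=-1).squeeze(0)
        out = g[0] * torch.zeros_like(x)                    # option 0: skip the layer entirely
        for i, bits in enumerate(self.bit_choices):         # options 1..k: quantized execution
            out = out + g[i + 1] * self.layer(fake_quantize(x, bits))
        out = out + g[-1] * self.layer(x)                   # last option: full-precision execution
        return x + out                                      # residual connection

block = FractionalSkipBlock(dim=32)
y = block(torch.randn(8, 32))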

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings

no code implementations NeurIPS 2019 Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang

Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training.

Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks

2 code implementations 26 Sep 2019 Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G. Baraniuk, Zhangyang Wang, Yingyan Lin

In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates.

Dual Dynamic Inference: Enabling More Efficient, Adaptive and Controllable Deep Inference

no code implementations 10 Jul 2019 Yue Wang, Jianghao Shen, Ting-Kuei Hu, Pengfei Xu, Tan Nguyen, Richard Baraniuk, Zhangyang Wang, Yingyan Lin

State-of-the-art convolutional neural networks (CNNs) yield record-breaking predictive performance, yet at the cost of high-energy-consumption inference that prohibits their wide deployment in resource-constrained Internet of Things (IoT) applications.

EnergyNet: Energy-Efficient Dynamic Inference

no code implementations NIPS Workshop CDNNRIA 2018 Yue Wang, Tan Nguyen, Yang Zhao, Zhangyang Wang, Yingyan Lin, Richard Baraniuk

The prohibitive energy cost of running high-performance Convolutional Neural Networks (CNNs) has been limiting their deployment on resource-constrained platforms including mobile and wearable devices.

Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

1 code implementation ICML 2018 Junru Wu, Yue Wang, Zhen-Yu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin

The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet).

Clustering
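
One common flavor of the parameter sharing named in the title, sketched under assumptions (plain k-means on scalar weights; the paper's spectrally relaxed formulation and re-training loop are not reproduced here): cluster the weights of a convolution and replace each weight by its cluster centroid, so only the centroids plus per-weight assignment indices need to be stored.

import numpy as np
from sklearn.cluster import KMeans

def share_weights_by_kmeans(conv_weight: np.ndarray, n_clusters: int = 16):
    """Compress a conv weight tensor by clustering its values and sharing centroids."""
    flat = conv_weight.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    shared = km.cluster_centers_[km.labels_].reshape(conv_weight.shape)
    return shared, km.labels_.reshape(conv_weight.shape), km.cluster_centers_.ravel()

w = np.random.randn(64, 32, 3, 3).astype(np.float32)       # an example conv layer's weights
w_shared, assignments, centroids = share_weights_by_kmeans(w, n_clusters=16)
print("distinct weight values after sharing:", np.unique(w_shared).size)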

Deep $k$-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions

1 code implementation 24 Jun 2018 Junru Wu, Yue Wang, Zhen-Yu Wu, Zhangyang Wang, Ashok Veeraraghavan, Yingyan Lin

The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet).

Clustering
