Search Results for author: Qijing Huang

Found 13 papers, 9 papers with code

Full Stack Optimization of Transformer Inference: a Survey

no code implementations • 27 Feb 2023 • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search.

Neural Architecture Search, Scheduling


CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

no code implementations • 5 May 2021 • Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect.

Navigate, Scheduling


HAO: Hardware-aware neural Architecture Optimization for Efficient Inference

no code implementations • 26 Apr 2021 • Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K. H. So, Kurt Keutzer

Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs.

Hardware Aware Neural Architecture Search, Image Classification, +2


HAWQV3: Dyadic Neural Network Quantization

1 code implementation • 20 Nov 2020 • Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer

Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.

Model Compression, Quantization


CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

3 code implementations • 12 Jun 2020 • Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek

Deploying deep learning models on embedded systems has been challenging due to limited computing resources.

Image Classification, Novel Object Detection, +3


ProTuner: Tuning Programs with Monte Carlo Tree Search

no code implementations • 27 May 2020 • Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica

We explore applying the Monte Carlo Tree Search (MCTS) algorithm to a notoriously difficult task: tuning programs for high-performance deep learning and image processing.

Scheduling


AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning

1 code implementation • 2 Mar 2020 • Qijing Huang, Ameer Haj-Ali, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

We compare the performance of AutoPhase to state-of-the-art algorithms that address the phase-ordering problem.

Reinforcement Learning (RL)


Algorithm-hardware Co-design for Deformable Convolution

2 code implementations • 19 Feb 2020 • Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek

In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications including full versus depthwise, fixed-shape, and limited-range.

Image Classification, Instance Segmentation, +4


AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs

1 code implementation • 6 Jan 2020 • Keertana Settaluri, Ameer Haj-Ali, Qijing Huang, Kourosh Hakhamaneshi, Borivoje Nikolic

Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs).

Signal Processing


Integrating NVIDIA Deep Learning Accelerator (NVDLA) with RISC-V SoC on FireSim

1 code implementation • 5 Mar 2019 • Farzad Farshchi, Qijing Huang, Heechul Yun

We then evaluate the performance of NVDLA by running the YOLOv3 object-detection algorithm.

Object Detection


AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

1 code implementation • 15 Jan 2019 • Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek

We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem.

Reinforcement Learning (RL)


Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

1 code implementation • 21 Nov 2018 • Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer

DiracDeltaNet achieves competitive accuracy on ImageNet (88.7% top-5), but with 42× fewer parameters and 48× fewer OPs than VGG16.


FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud

1 code implementation • 45th ACM/IEEE International Symposium on Computer Architecture (ISCA 2018) • Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, Krste Asanovic

We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation.

