no code implementations • 9 Jan 2024 • Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei, Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu, Xuehai Qian
Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference.
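A back-of-the-envelope estimate shows why these tables dominate memory; the table sizes below are illustrative assumptions, not numbers from the paper:

```python
# Rough memory estimate for DLRM embedding tables (illustrative numbers only).
rows_per_table = 10_000_000      # e.g., one categorical feature with 10M IDs
embedding_dim = 128
bytes_per_param = 4              # fp32
num_tables = 26                  # hypothetical number of sparse features

per_table_gb = rows_per_table * embedding_dim * bytes_per_param / 1e9
total_gb = per_table_gb * num_tables
print(f"{per_table_gb:.1f} GB per table, {total_gb:.0f} GB total")
# -> 5.1 GB per table, ~133 GB total: far beyond a single accelerator's memory
```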
no code implementations • 27 Nov 2023 • Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han
Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP).
no code implementations • 19 Aug 2023 • Jingji Chen, Zhuoming Chen, Xuehai Qian
Communication is a key bottleneck for distributed graph neural network (GNN) training.
1 code implementation • 30 Oct 2022 • Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han
Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency.
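As a hedged illustration of the data-driven idea only (written in plain PyTorch, not the TorchQuantum API; the circuit features and fidelity labels below are hypothetical placeholders), a noise-impact predictor can be as simple as a small regressor:

```python
import torch
import torch.nn as nn

# Hypothetical setup: each circuit is summarized by a feature vector
# (gate counts, depth, qubit count, ...) and labeled with a measured fidelity.
features = torch.rand(1024, 16)          # placeholder circuit features
fidelity = torch.rand(1024, 1)           # placeholder fidelity labels in [0, 1]

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(100):                     # tiny training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(features), fidelity)
    loss.backward()
    opt.step()
```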
no code implementations • 2 Aug 2022 • Zhiding Liang, Jinglei Cheng, Hang Ren, Hanrui Wang, Fei Hua, Zhixin Song, Yongshan Ding, Fred Chong, Song Han, Xuehai Qian, Yiyu Shi
Therefore, we propose NAPA, a native-pulse ansatz generator framework for VQAs.
no code implementations • 25 Aug 2021 • Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren
It necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that facilitates real-time inference on mobile devices while preserving high sparse-model accuracy.
no code implementations • 16 Jun 2021 • Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding
With weights stored as conductances in the ReRAM crossbar cells, applying the input vector to the word lines produces the matrix-vector multiplication result as currents on the bit lines.
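The analog computation follows Ohm's law and Kirchhoff's current law; a minimal numerical model of an idealized crossbar (ignoring device non-idealities) looks like:

```python
import numpy as np

# Idealized ReRAM crossbar: weights are programmed as conductances G (siemens),
# the input vector is applied as word-line voltages V, and each bit line sums
# the currents of its cells, yielding a matrix-vector product in one step.
G = np.abs(np.random.rand(64, 32)) * 1e-6   # 64 word lines x 32 bit lines
V = np.random.rand(64)                      # input voltages on word lines

I_bitlines = G.T @ V                        # Kirchhoff: I_j = sum_i G_ij * V_i
```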
1 code implementation • 4 May 2021 • Qingcheng Xiao, Size Zheng, Bingzhe Wu, Pengcheng Xu, Xuehai Qian, Yun Liang
Second, the overall design space composed of HW/SW partitioning, hardware optimization, and software optimization is huge.
no code implementations • 8 Dec 2020 • Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K.-H. So, Xuehai Qian, Yanzhi Wang, Xue Lin
Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes for different rows of the weight matrix.
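A sketch of the row-wise idea, where each row of the weight matrix gets its own quantization scheme; the row-assignment rule below is an illustrative placeholder, not the paper's algorithm:

```python
import numpy as np

def quantize_fixed(row, bits=4):
    # Uniform fixed-point quantization of one row.
    scale = np.max(np.abs(row)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(row / scale) * scale

def quantize_pow2(row):
    # Power-of-two quantization: each weight snaps to the nearest power of two.
    sign = np.sign(row)
    exp = np.round(np.log2(np.abs(row) + 1e-12))
    return sign * 2.0 ** exp

W = np.random.randn(8, 16)
W_q = np.stack([
    quantize_pow2(r) if np.std(r) > 1.0 else quantize_fixed(r)  # placeholder rule
    for r in W
])
```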
no code implementations • 23 Apr 2020 • Chunhua Deng, Siyu Liao, Yi Xie, Keshab K. Parhi, Xuehai Qian, Bo Yuan
On the other hand, the recent structured matrix-based approach (i.e., CirCNN) is limited by the relatively complex arithmetic computation (i.e., FFT), less flexible compression ratio, and its inability to fully utilize input sparsity.
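The FFT-based arithmetic referred to here multiplies a circulant matrix by a vector in O(n log n) instead of O(n^2); a minimal sketch of that kernel:

```python
import numpy as np

def circulant_matvec(c, x):
    # y = C @ x, where C is the circulant matrix whose first column is c.
    # Diagonalization by the DFT turns the product into element-wise multiplication.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

c = np.random.randn(8)
x = np.random.randn(8)

# Check against the explicit circulant matrix.
C = np.array([np.roll(c, i) for i in range(len(c))]).T
assert np.allclose(circulant_matvec(c, x), C @ x)
```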
no code implementations • 1 Jan 2020 • Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren
Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware friendly; structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss.
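The two extremes can be pictured on a single weight matrix; a minimal sketch (real schemes prune iteratively with retraining):

```python
import numpy as np

W = np.random.randn(64, 64)

# Non-structured pruning: drop individual small-magnitude weights (fine-grained
# and accurate, but surviving weights are scattered and need index storage).
mask_fine = np.abs(W) > np.quantile(np.abs(W), 0.9)
W_fine = W * mask_fine

# Structured pruning: drop whole rows (e.g., output channels) by their L2 norm
# (coarse-grained and hardware friendly, but usually costs more accuracy).
row_norms = np.linalg.norm(W, axis=1)
keep_rows = row_norms >= np.quantile(row_norms, 0.9)
W_struct = W[keep_rows]
```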
no code implementations • 17 Sep 2019 • Qinyi Luo, Jiaao He, Youwei Zhuo, Xuehai Qian
Is it possible to get the best of both worlds: a distributed training method that combines the high performance of All-Reduce in homogeneous environments with the heterogeneity tolerance of AD-PSGD?
no code implementations • 22 Jul 2019 • Ruizhe Cai, Ao Ren, Olivia Chen, Ning Liu, Caiwen Ding, Xuehai Qian, Jie Han, Wenhui Luo, Nobuyuki Yoshikawa, Yanzhi Wang
Further, prior work has investigated the application of SC to DNNs and illustrated its suitability, as SC is more compatible with approximate computation.
no code implementations • 3 Jul 2019 • Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang
Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of both storage and computation efficiency.
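One storage argument behind this kind of conclusion (illustrative numbers, not the paper's measurements): non-structured pruning must store an index with every surviving weight, which erodes its advantage once weights are quantized to a few bits:

```python
# Illustrative storage comparison for one 1M-parameter layer at 10x pruning.
params = 1_000_000
kept = params // 10
weight_bits = 4                 # quantized weight
index_bits = 16                 # assumed per-weight index overhead for sparse storage

nonstructured_bits = kept * (weight_bits + index_bits)   # weights + indices
structured_bits = kept * weight_bits                     # dense sub-matrix, no indices
print(nonstructured_bits / structured_bits)              # -> 5.0x larger storage
```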
no code implementations • 4 Feb 2019 • Qinyi Luo, JinKun Lin, Youwei Zhuo, Xuehai Qian
Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting.
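A toy sketch of the bounded-staleness side of the idea (not the paper's implementation): updates from a neighbor queue up, and a worker advances only while the gap between its own iteration counter and the oldest unconsumed update stays within a bound `s`:

```python
from collections import deque

class BoundedStalenessQueue:
    """Toy sketch: per-worker iteration counters plus a queue of neighbor
    updates enforce a staleness bound s in a decentralized setting."""

    def __init__(self, s):
        self.s = s
        self.queue = deque()          # pending updates from a neighbor
        self.my_iter = 0

    def push(self, neighbor_iter, update):
        self.queue.append((neighbor_iter, update))

    def can_advance(self):
        if not self.queue:
            return True
        oldest_iter, _ = self.queue[0]
        return self.my_iter - oldest_iter <= self.s

    def step(self):
        if self.can_advance():
            self.my_iter += 1
            return True
        return False                  # must wait and consume pending updates first
```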
no code implementations • 7 Jan 2019 • Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
In this paper, inspired by recent work in machine learning systems, we propose HyPar, a solution that determines layer-wise parallelism for deep neural network training with an array of DNN accelerators.
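The layer-wise search can be pictured as a small dynamic program over per-layer parallelism choices; the costs below are placeholders, not the HyPar cost model:

```python
# Toy dynamic program: pick data- or model-parallelism per layer to minimize
# total communication, with a transition cost when adjacent layers differ.
intra = [  # placeholder per-layer communication cost for (data, model) parallelism
    (4.0, 1.0), (2.0, 3.0), (5.0, 2.0), (1.0, 1.5),
]
transition = 2.0                        # placeholder cost to switch parallelism

best = list(intra[0])                   # best[p] = min cost ending with choice p
for layer in intra[1:]:
    best = [
        layer[p] + min(best[q] + (transition if q != p else 0.0) for q in (0, 1))
        for p in (0, 1)
    ]
print(min(best))                        # minimum total cost for this toy instance
```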
1 code implementation • 31 Dec 2018 • Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang
The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM.
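At its core, ADMM-based pruning alternates a gradient step on the loss with a quadratic penalty pulling W toward an auxiliary variable Z, a projection of Z onto the sparsity constraint set, and a dual update. A minimal sketch of one such iteration (illustrative, not the paper's full training procedure):

```python
import numpy as np

def project_topk(W, k):
    # Euclidean projection onto {at most k nonzeros}: keep the k largest magnitudes.
    thresh = np.sort(np.abs(W).ravel())[-k]
    return W * (np.abs(W) >= thresh)

def admm_prune_step(W, Z, U, grad_loss, rho=1e-3, lr=1e-2, k=100):
    # 1) W-update: gradient step on loss + (rho/2)||W - Z + U||^2.
    W = W - lr * (grad_loss + rho * (W - Z + U))
    # 2) Z-update: project W + U onto the sparsity constraint.
    Z = project_topk(W + U, k)
    # 3) Dual update.
    U = U + W - Z
    return W, Z, U
```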
no code implementations • 12 Dec 2018 • Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang
It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision accumulation and the requirement of special activation function implementations.
Automatic Speech Recognition (ASR) +3
no code implementations • 18 Feb 2018 • Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan, Siyu Liao, Xiaolong Ma, Bo Yuan, Xuehai Qian, Jian Tang, Qinru Qiu, Xue Lin
Hardware accelerations of deep learning systems have been extensively investigated in industry and academia.
no code implementations • 2 Feb 2018 • Ruizhe Cai, Ao Ren, Ning Liu, Caiwen Ding, Luhao Wang, Xuehai Qian, Massoud Pedram, Yanzhi Wang
In this paper, we propose VIBNN, an FPGA-based hardware accelerator design for variational inference on BNNs.
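The key operation such a design must accelerate is sampling weights from the learned Gaussian posterior; in software, one reparameterized forward pass looks like the following minimal sketch (not the FPGA design itself):

```python
import numpy as np

# Variational BNN layer: weights ~ N(mu, sigma^2), sampled each forward pass
# via the reparameterization trick w = mu + sigma * eps, eps ~ N(0, 1).
mu = np.random.randn(256, 128) * 0.1      # learned posterior means
rho = np.random.randn(256, 128) * 0.1
sigma = np.log1p(np.exp(rho))             # softplus keeps sigma positive

def bayesian_forward(x):
    eps = np.random.randn(*mu.shape)      # Gaussian RNG: the hardware bottleneck
    W = mu + sigma * eps
    return np.maximum(x @ W, 0.0)         # ReLU

x = np.random.randn(32, 256)
y = bayesian_forward(x)                   # (32, 128) stochastic activations
```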
no code implementations • 29 Aug 2017 • Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yi-Peng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan
As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy.
no code implementations • 21 Aug 2017 • Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, Yiran Chen
GRAPHR gains a speedup of 1.16x to 4.12x, and is 3.67x to 10.96x more energy efficient compared to a PIM-based architecture.
Distributed, Parallel, and Cluster Computing • Hardware Architecture
no code implementations • 18 Nov 2016 • Ao Ren, Ji Li, Zhe Li, Caiwen Ding, Xuehai Qian, Qinru Qiu, Bo Yuan, Yanzhi Wang
Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has a high potential for implementing DCNNs with high scalability and an ultra-low hardware footprint.
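In the bipolar SC format, a value x in [-1, 1] is encoded as a bit-stream with P(bit = 1) = (x + 1)/2, and multiplication reduces to a bit-wise XNOR; a small sketch:

```python
import numpy as np

def sc_encode(x, length=4096):
    # Bipolar encoding: P(1) = (x + 1) / 2 for x in [-1, 1].
    return (np.random.rand(length) < (x + 1) / 2).astype(np.uint8)

def sc_decode(stream):
    # Decode by counting ones: x ~= 2 * mean(stream) - 1.
    return 2 * stream.mean() - 1

a, b = 0.5, -0.4
sa, sb = sc_encode(a), sc_encode(b)
prod = ~(sa ^ sb) & 1          # bipolar multiplication is a single XNOR gate
print(sc_decode(prod))         # ~= a * b = -0.2, up to stochastic noise
```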