Accelerating Neural Network Inference by Overflow Aware Quantization

27 May 2020 · Hongwei Xie, Shuo Zhang, Huanghao Ding, Yafei Song, Baitao Shao, Conggang Hu, Ling Cai, Mingyang Li

The inherently heavy computation of deep neural networks prevents their widespread application. A widely used method for accelerating model inference is quantization, which replaces the input operands of a network with fixed-point values...
