1 code implementation • 19 Aug 2022 • Bradley McDanel, Chi Phuong Huynh
For the pre-trained model, we achieve a 0.26% reduction in classification accuracy for a 31% reduction in training time (from 25 to 17 hours) compared to using all patches each iteration.
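The excerpt does not spell out the patch-selection policy; the sketch below, using hypothetical names, simply illustrates training on a random subset of patch tokens each iteration so fewer tokens are processed per step.

```python
import numpy as np

def sample_patches(patch_tokens, keep_ratio=0.7, rng=None):
    """Randomly keep a fraction of patch tokens for one training iteration.

    patch_tokens: array of shape (batch, num_patches, dim)
    keep_ratio:   fraction of patches used this iteration (hypothetical knob)
    """
    rng = rng or np.random.default_rng()
    batch, num_patches, dim = patch_tokens.shape
    keep = max(1, int(round(keep_ratio * num_patches)))
    idx = rng.permutation(num_patches)[:keep]            # same subset for the whole batch
    return patch_tokens[:, np.sort(idx), :], idx

# usage: tokens from a 14x14 patch grid with 192-dim embeddings
tokens = np.random.randn(8, 196, 192).astype(np.float32)
subset, kept = sample_patches(tokens, keep_ratio=0.69)   # roughly 31% fewer patches per step
print(subset.shape)                                      # (8, 135, 192)
```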
1 code implementation • 1 Feb 2022 • Bradley McDanel, Helia Dinh, John Magallanes
However, most weight pruning techniques do not speed up DNN training and can even require more iterations to reach convergence.
no code implementations • 28 Oct 2021 • Sai Qian Zhang, Bradley McDanel, H. T. Kung
Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values.
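As a rough illustration of the shared-exponent idea (not the paper's exact BFP format), the sketch below quantizes a group of values to one common exponent with fixed-width signed mantissas per value.

```python
import numpy as np

def bfp_quantize(group, mantissa_bits=8):
    """Quantize a 1-D group of floats to Block Floating Point:
    one shared exponent per group, signed fixed-width mantissas per value."""
    max_abs = np.max(np.abs(group))
    if max_abs == 0:
        return np.zeros_like(group)
    # shared exponent chosen so the largest value fits in the mantissa range
    shared_exp = int(np.floor(np.log2(max_abs))) - (mantissa_bits - 2)
    scale = 2.0 ** shared_exp
    mantissas = np.clip(np.round(group / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1)
    return mantissas * scale   # dequantized values, all sharing one exponent

x = np.array([0.013, -0.25, 0.031, 0.0007])
print(bfp_quantize(x, mantissa_bits=8))
```

Values much smaller than the group maximum lose precision (or round to zero), which is the usual trade-off of sharing a single exponent across a block.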
no code implementations • 29 Sep 2021 • Sai Qian Zhang, Bradley McDanel
By leveraging the N:M sparsity constraint, we can identify the unimportant weights across each group of M weights at earlier stages of iterative pruning, which significantly lowers the cost of iterative training compared to conventional unstructured pruning.
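To make the N:M constraint concrete (a generic illustration, not the authors' exact pruning schedule), the following sketch keeps the N largest-magnitude weights in every consecutive group of M.

```python
import numpy as np

def nm_sparsity_mask(weights, n=2, m=4):
    """Return a binary mask keeping the n largest-magnitude weights
    in every consecutive group of m along the last dimension."""
    flat = weights.reshape(-1, m)                        # (num_groups, m)
    order = np.argsort(-np.abs(flat), axis=1)            # descending by magnitude
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, order[:, :n], 1.0, axis=1)   # keep top-n per group
    return mask.reshape(weights.shape)

w = np.random.randn(8, 16).astype(np.float32)
mask = nm_sparsity_mask(w, n=2, m=4)   # 2:4 sparsity: half of the weights pruned
print(mask.sum() / mask.size)          # -> 0.5
```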
no code implementations • 13 Jul 2020 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in a single pass, looking at only two bits at a time.
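The precise HESE rules are not given in this excerpt; as a stand-in, the sketch below shows classic radix-2 Booth recoding, a different but related scheme that likewise scans two adjacent bits at a time and emits signed digits in {-1, 0, +1}.

```python
def booth_recode(x, width=8):
    """Recode an unsigned integer into a signed-digit representation
    (digits in {-1, 0, +1}) by scanning two adjacent bits at a time.
    Classic radix-2 Booth recoding; turns runs of 1s into two nonzero terms."""
    bits = [(x >> i) & 1 for i in range(width)] + [0]    # append a leading 0
    digits = []
    prev = 0                                             # implicit bit below the LSB
    for i in range(width + 1):
        digits.append(prev - bits[i])                    # digit_i = b_{i-1} - b_i
        prev = bits[i]
    return digits                                        # LSB-first signed digits

d = booth_recode(0b01111100)                             # 124
print(d)                                                 # [0, 0, -1, 0, 0, 0, 0, 1, 0]
print(sum(di * (1 << i) for i, di in enumerate(d)))      # -> 124
```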
no code implementations • 1 May 2019 • Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong
A highlight of our full-stack approach, and a key contributor to the high energy efficiency achieved, is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware.
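Assuming the weights have been constrained to signed powers of two (one reading of how a selector can stand in for a multiplier; the excerpt does not state the exact scheme), a software analogue of SAC versus MAC looks like this.

```python
import numpy as np

def mac(inputs, weights):
    """Conventional multiplier-accumulator: full multiplies, then a sum."""
    return float(np.dot(inputs, weights))

def sac(inputs, exponents, signs):
    """Selector-accumulator analogue (assumption: weight_j = signs[j] * 2**exponents[j]),
    so each 'multiply' reduces to selecting a shifted copy of the input."""
    acc = 0.0
    for x, e, s in zip(inputs, exponents, signs):
        acc += s * (x * (2.0 ** e))    # in hardware: a shift-select instead of a multiply
    return acc

x = np.array([3.0, -1.5, 2.0, 0.5])
exps, sgns = np.array([-1, 0, 1, 2]), np.array([1, -1, 1, -1])
w = sgns * (2.0 ** exps)               # [0.5, -1.0, 2.0, -4.0]
print(mac(x, w), sac(x, exps, sgns))   # identical results: 5.0 5.0
```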
no code implementations • 7 Nov 2018 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators.
no code implementations • 21 Oct 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung
At inference time, the number of channels used can be dynamically adjusted to trade off accuracy for lower power consumption and reduced latency by selecting only an initial subset of channels.
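A minimal sketch of this accuracy/efficiency knob (illustrative only, with hypothetical names): compute just the first k output channels of a layer at inference time.

```python
import numpy as np

def conv1x1(x, weights, active_channels=None):
    """1x1 convolution where only a leading subset of output channels is computed.

    x:               (height, width, in_channels)
    weights:         (out_channels, in_channels)
    active_channels: how many leading output channels to evaluate (None = all)
    """
    k = active_channels or weights.shape[0]
    return np.einsum('hwc,oc->hwo', x, weights[:k])       # skip the remaining channels

x = np.random.randn(16, 16, 32).astype(np.float32)
w = np.random.randn(64, 32).astype(np.float32)
full = conv1x1(x, w)                       # all 64 channels: highest accuracy
lite = conv1x1(x, w, active_channels=16)   # first 16 channels: less compute and latency
print(full.shape, lite.shape)              # (16, 16, 64) (16, 16, 16)
```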
3 code implementations • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung
Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer.
1 code implementation • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung
In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN processes most sensor data locally on end devices while achieving high accuracy, reducing the communication cost by a factor of over 20x.
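The excerpt does not state how the local-versus-cloud decision is made; the sketch below uses a normalized-entropy threshold on the local classifier's output, a common rule in this line of work, as an assumed stand-in.

```python
import numpy as np

def should_offload(local_probs, threshold=0.3):
    """Decide whether an end device keeps a sample local or offloads it to the cloud.
    Rule sketched here (an assumption, not necessarily the paper's exact criterion):
    offload only when the normalized entropy of the local softmax output is high."""
    p = np.clip(local_probs, 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p)) / np.log(len(p))   # normalized to [0, 1]
    return entropy > threshold

confident = np.array([0.97, 0.01, 0.01, 0.01])   # low entropy: keep the result local
uncertain = np.array([0.30, 0.28, 0.22, 0.20])   # high entropy: send to the cloud
print(should_offload(confident), should_offload(uncertain))   # False True
```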
2 code implementations • 6 Sep 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung
Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries that hold intermediate results between layers during feedforward inference.
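One common way to keep temporaries small (a generic sketch, not necessarily the paper's scheme) is to ping-pong between two preallocated buffers sized to the largest layer, instead of materializing every intermediate activation separately.

```python
import numpy as np

def infer_with_two_buffers(x, layers):
    """Feedforward inference that reuses two preallocated buffers for all
    intermediate activations, so temporary memory stays O(largest layer)."""
    widths = [x.size] + [w.shape[0] for w, _ in layers]
    buf_a = np.empty(max(widths), dtype=np.float32)
    buf_b = np.empty(max(widths), dtype=np.float32)
    buf_a[:x.size] = x
    cur, nxt, n = buf_a, buf_b, x.size
    for w, b in layers:
        m = w.shape[0]
        np.matmul(w, cur[:n], out=nxt[:m])      # write directly into the spare buffer
        np.add(nxt[:m], b, out=nxt[:m])
        np.maximum(nxt[:m], 0.0, out=nxt[:m])   # ReLU in place
        cur, nxt, n = nxt, cur, m               # swap buffers for the next layer
    return cur[:n].copy()

layers = [(np.random.randn(64, 128).astype(np.float32), np.zeros(64, np.float32)),
          (np.random.randn(10, 64).astype(np.float32), np.zeros(10, np.float32))]
print(infer_with_two_buffers(np.random.randn(128).astype(np.float32), layers).shape)  # (10,)
```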