Search Results for author: Bradley McDanel

Found 11 papers, 5 papers with code

Accelerating Vision Transformer Training via a Patch Sampling Schedule

1 code implementation • 19 Aug 2022 • Bradley McDanel, Chi Phuong Huynh

For the pre-trained model, we achieve a 0.26% reduction in classification accuracy for a 31% reduction in training time (from 25 to 17 hours) compared to using all patches each iteration.
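The comparison above is against training with every patch at every iteration. As a rough illustration of the idea (not the authors' code), the sketch below samples a subset of patch tokens per image and linearly ramps the kept fraction over training; the start and end fractions and the linear ramp are assumptions.

```python
import numpy as np

def patch_keep_fraction(step, total_steps, start=0.5, end=1.0):
    """Linearly ramp the fraction of kept patches from `start` to `end` (assumed schedule)."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * t

def sample_patches(patch_tokens, step, total_steps, rng=None):
    """patch_tokens: (num_patches, dim) embeddings for one image; keep a random subset."""
    rng = np.random.default_rng() if rng is None else rng
    num_patches = patch_tokens.shape[0]
    keep = max(1, int(num_patches * patch_keep_fraction(step, total_steps)))
    idx = rng.choice(num_patches, size=keep, replace=False)
    return patch_tokens[np.sort(idx)]            # preserve spatial order of kept patches

# 196 patch tokens (14x14); halfway through training ~75% of them are kept.
tokens = np.random.randn(196, 768)
print(sample_patches(tokens, step=500, total_steps=1000).shape)   # (147, 768)
```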

Accelerating DNN Training with Structured Data Gradient Pruning

1 code implementation • 1 Feb 2022 • Bradley McDanel, Helia Dinh, John Magallanes

However, most weight pruning techniques generally do not speed up DNN training and can even require more iterations to reach model convergence.
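The title suggests pruning the data gradients in the backward pass in a structured N:M pattern instead. The sketch below is only an illustration of N:M structured pruning applied to a gradient tensor; the choice of N and M, the grouping of consecutive values, and the rescaling step are assumptions, not the authors' implementation.

```python
import numpy as np

def structured_gradient_prune(grad, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m consecutive values."""
    flat = grad.reshape(-1, m)                       # assumes grad.size is divisible by m
    order = np.argsort(np.abs(flat), axis=1)         # ascending magnitude within each group
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, order[:, : m - n], False, axis=1)   # zero the m-n smallest
    pruned = np.where(mask, flat, 0.0)
    return (pruned * (m / n)).reshape(grad.shape)    # optional rescale (one common choice)

g = np.random.randn(8, 4)
print(structured_gradient_prune(g))
```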

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

no code implementations • 28 Oct 2021 • Sai Qian Zhang, Bradley McDanel, H. T. Kung

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values.

Quantization
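A simplified sketch of BFP quantization with stochastic rounding: each group of values shares one exponent derived from its largest magnitude, and mantissas are rounded to a few bits. The group size and mantissa width below are placeholders; FAST additionally varies the precision over the course of training, which is not shown here.

```python
import numpy as np

def bfp_quantize(x, group_size=16, mantissa_bits=4, rng=None):
    """Quantize x so that each group of `group_size` values shares one exponent."""
    rng = np.random.default_rng() if rng is None else rng
    groups = x.reshape(-1, group_size)               # assumes x.size is divisible by group_size
    max_mag = np.max(np.abs(groups), axis=1, keepdims=True)
    shared_exp = np.ceil(np.log2(np.maximum(max_mag, 1e-30)))
    lsb = 2.0 ** (shared_exp - (mantissa_bits - 1))  # value of one mantissa step
    mant = groups / lsb
    floor = np.floor(mant)
    mant = floor + (rng.random(mant.shape) < (mant - floor))   # stochastic rounding
    lim = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(mant, -lim - 1, lim)              # signed mantissa range
    return (mant * lsb).reshape(x.shape)

x = np.random.randn(64)
print(np.abs(bfp_quantize(x) - x).max())             # small quantization error
```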

Efficient Winning Tickets Drawing over Fine-Grained Structured Sparsity

no code implementations • 29 Sep 2021 • Sai Qian Zhang, Bradley McDanel

By leveraging the N:M sparsity constraint, we can identify the unimportant weights across each group of M weights at earlier stages of iterative pruning, which significantly lowers the cost of iterative training compared to conventional unstructured pruning.
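To make the N:M constraint concrete, the toy routine below performs iterative magnitude pruning where each round removes one more weight from every group of M until only N remain. In a real pipeline the magnitudes would be recomputed after retraining between rounds; the group layout and the 2:4 pattern are assumptions.

```python
import numpy as np

def nm_iterative_masks(weights, n=2, m=4):
    """Yield masks that drop one more weight per group of m each round, down to n nonzeros."""
    groups = weights.reshape(-1, m)                   # assumes weights.size is divisible by m
    order = np.argsort(np.abs(groups), axis=1)        # ascending magnitude within each group
    for dropped in range(1, m - n + 1):
        mask = np.ones_like(groups, dtype=bool)
        np.put_along_axis(mask, order[:, :dropped], False, axis=1)
        yield mask.reshape(weights.shape)             # real pipelines would retrain between rounds

w = np.random.randn(4, 8)
for round_idx, mask in enumerate(nm_iterative_masks(w), start=1):
    print(f"round {round_idx}: {mask.sum()} of {mask.size} weights kept")
```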

Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

no code implementations • 13 Jul 2020 • H. T. Kung, Bradley McDanel, Sai Qian Zhang

To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in one pass looking at only two bits at a time.

Quantization
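A signed digit representation (SDR) uses digits in {-1, 0, +1}, which reduces the number of nonzero terms and therefore the number of shift-and-add operations a multiplication needs. The conversion below is the standard non-adjacent form (NAF), shown only because it also works in one pass while inspecting two bits at a time; it is not necessarily the same encoding as the paper's HESE.

```python
def to_signed_digits(x: int):
    """Return digits in {-1, 0, +1}, least significant first, for a non-negative integer."""
    digits = []
    while x > 0:
        if x & 1:                 # odd: pick +1 or -1 by looking at the low two bits
            d = 2 - (x & 3)       # x % 4 == 1 -> +1, x % 4 == 3 -> -1
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits

digits = to_signed_digits(119)                         # 119 = 128 - 8 - 1: three nonzero terms
print(digits, sum(d << i for i, d in enumerate(digits)))   # digits reconstruct 119
```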

Full-stack Optimization for Accelerating CNNs with FPGA Validation

no code implementations • 1 May 2019 • Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

A highlight of our full-stack approach, which contributes to the achieved high energy efficiency, is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware.
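The idea can be illustrated in software: once a weight is expressed as a small number of signed power-of-two terms (as in the term-revealing work listed above), multiplying by it amounts to selecting shifted copies of the input and accumulating them. The snippet below is a conceptual stand-in, not the paper's bit-serial hardware design.

```python
def select_accumulate(x: int, weight_terms):
    """weight_terms: (sign, shift) pairs with w = sum(sign * 2**shift)."""
    acc = 0
    for sign, shift in weight_terms:
        acc += sign * (x << shift)    # "select" a shifted copy of x, then accumulate
    return acc

# w = 2**4 - 2**1 = 14, so x * 14 takes only two select-and-accumulate steps.
print(select_accumulate(3, [(+1, 4), (-1, 1)]))        # 42 == 3 * 14
```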

Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

no code implementations • 7 Nov 2018 • H. T. Kung, Bradley McDanel, Sai Qian Zhang

We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators.
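As a rough software analogue of column combining, the greedy routine below groups sparse weight columns so that within a group no two columns have a nonzero in the same row, which is what lets a packed systolic array column stay densely utilized. The paper's joint optimization also prunes conflicting small weights and retrains; that part is omitted, and the greedy order and group size are assumptions.

```python
import numpy as np

def combine_columns(w, max_group=4):
    """Greedily group column indices of w so no two columns in a group share a nonzero row."""
    nonzero_rows = [set(np.nonzero(w[:, c])[0]) for c in range(w.shape[1])]
    groups, used = [], set()
    for c in range(w.shape[1]):
        if c in used:
            continue
        group, rows = [c], set(nonzero_rows[c])
        used.add(c)
        for d in range(c + 1, w.shape[1]):
            if d in used or len(group) == max_group:
                continue
            if rows.isdisjoint(nonzero_rows[d]):       # the combined column stays conflict-free
                group.append(d)
                rows |= nonzero_rows[d]
                used.add(d)
        groups.append(group)
    return groups

w = np.array([[1, 0, 0, 2],
              [0, 3, 0, 0],
              [0, 0, 4, 0]])
print(combine_columns(w))                              # [[0, 1, 2], [3]]
```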

Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

no code implementations • 21 Oct 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung

At inference time, the number of channels used can be dynamically adjusted to trade off accuracy for lowered power consumption and reduced latency by selecting only a beginning subset of channels.

Image Classification
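The mechanism in the excerpt can be shown with a toy incomplete dot product that uses only a leading subset of channels; the training procedure and channel-importance profiles from the paper are not modeled here.

```python
import numpy as np

def incomplete_dot(x, w, keep_fraction=0.5):
    """Use only the leading fraction of channels of both vectors."""
    k = max(1, int(len(x) * keep_fraction))
    return float(np.dot(x[:k], w[:k]))

x, w = np.random.randn(64), np.random.randn(64)
print(incomplete_dot(x, w, keep_fraction=1.0))         # full dot product
print(incomplete_dot(x, w, keep_fraction=0.25))        # only the first 16 channels
```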

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

2 code implementations • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung

Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer.
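BranchyNet attaches side-branch classifiers to early layers and lets a sample exit at the first branch whose prediction entropy falls below a threshold. The sketch below shows that exit rule with placeholder branch classifiers and thresholds.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    return float(-np.sum(probs * np.log(probs + eps)))

def branchy_infer(x, branches, thresholds):
    """branches: callables mapping x to class probabilities, ordered early to late."""
    for i, (branch, thr) in enumerate(zip(branches, thresholds)):
        probs = branch(x)
        if entropy(probs) < thr or i == len(branches) - 1:   # final branch always exits
            return int(np.argmax(probs)), i

# Toy stand-ins: an uncertain early branch and a confident final classifier.
early = lambda x: np.array([0.40, 0.35, 0.25])
final = lambda x: np.array([0.90, 0.05, 0.05])
print(branchy_infer(None, [early, final], thresholds=[0.5, np.inf]))   # (0, 1)
```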

Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

1 code implementation • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung

In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.

Distributed Computing • Object Recognition +1
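The communication savings come from applying the early-exit idea across tiers: an end device classifies its own sensor input with a small local exit and only sends a compact representation to the cloud when that exit is unsure. The sketch below is a conceptual illustration; the models, threshold, and feature size are placeholders, not the paper's configuration.

```python
import numpy as np

def ddnn_infer(sample, local_exit, cloud_model, to_features, threshold=0.5):
    """Classify on the end device when confident; otherwise offload compact features."""
    probs = local_exit(sample)
    ent = float(-np.sum(probs * np.log(probs + 1e-12)))
    if ent < threshold:
        return int(np.argmax(probs)), 0                # handled locally, nothing transmitted
    feat = to_features(sample)                         # much smaller than the raw sensor input
    return int(np.argmax(cloud_model(feat))), feat.nbytes

# Toy example: a confident local exit avoids any transmission.
local = lambda s: np.array([0.95, 0.05])
cloud = lambda f: np.array([0.50, 0.50])
print(ddnn_infer(np.zeros(1024), local, cloud, to_features=lambda s: s[:16].astype(np.float32)))
```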

Embedded Binarized Neural Networks

2 code implementations • 6 Sep 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung

Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries which hold intermediate results between layers in feedforward inference.
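One common way to keep temporary memory small in feedforward inference is to ping-pong between two preallocated buffers, since an intermediate result only needs to live until the next layer consumes it. The sketch below illustrates that buffer-reuse point with float activations; it is not the paper's binarized implementation, and the layer shapes are placeholders.

```python
import numpy as np

def feedforward(x, layers, buf_a, buf_b):
    """Run layers back to back, ping-ponging between two preallocated buffers."""
    src, dst = buf_a, buf_b
    n = x.size
    src[:n] = x
    for layer, out_size in layers:
        dst[:out_size] = layer(src[:n])    # write the new activations into the other buffer
        src, dst = dst, src                # swap: the previous activations get overwritten next
        n = out_size
    return src[:n].copy()

# Two 256-element buffers cover every intermediate result of this toy network.
buf_a, buf_b = np.zeros(256), np.zeros(256)
layers = [(lambda v: np.tanh(v @ np.random.randn(v.size, 128)), 128),
          (lambda v: np.tanh(v @ np.random.randn(v.size, 10)), 10)]
print(feedforward(np.random.randn(64), layers, buf_a, buf_b).shape)   # (10,)
```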
