Search Results for author: Matthew Mattina

Found 34 papers, 8 papers with code

Design Principles for Lifelong Learning AI Accelerators

no code implementations · 5 Oct 2023 · Dhireesha Kudithipudi, Anurag Daram, Abdullah M. Zyarah, Fatima Tuz Zohora, James B. Aimone, Angel Yanguas-Gil, Nicholas Soures, Emre Neftci, Matthew Mattina, Vincenzo Lomonaco, Clare D. Thiem, Benjamin Epstein

Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI).

Federated Learning Based on Dynamic Regularization

3 code implementations · ICLR 2021 · Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro, Matthew Mattina, Paul N. Whatmough, Venkatesh Saligrama

We propose a novel federated learning method for distributed training of neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round.

Federated Learning
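
The round structure described above is easy to picture in code. Below is a minimal NumPy sketch of the orchestration loop only: a server samples a random subset of clients each round, each client runs a plain local update, and the server averages the results. The dynamic regularizer that is the paper's actual contribution is omitted; the data, shapes, and least-squares objective are all hypothetical.

```python
import numpy as np

# Hypothetical setup: 10 clients, each holding a small least-squares problem.
rng = np.random.default_rng(0)
num_clients, dim, rounds, frac = 10, 5, 3, 0.3
data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(num_clients)]

def local_update(w, X, y, lr=0.01, steps=10):
    # Plain local gradient descent; the paper adds a dynamic regularization term here.
    for _ in range(steps):
        w = w - lr * (2 * X.T @ (X @ w - y) / len(y))
    return w

w_global = np.zeros(dim)
for _ in range(rounds):
    # Server picks a random subset of devices for this round.
    chosen = rng.choice(num_clients, size=int(frac * num_clients), replace=False)
    updates = [local_update(w_global.copy(), *data[k]) for k in chosen]
    w_global = np.mean(updates, axis=0)  # aggregate the subset's local models
```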

Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification

no code implementations · 13 Aug 2021 · Shyam A. Tailor, René de Jong, Tiago Azevedo, Matthew Mattina, Partha Maji

In recent years, graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks.

Mixed Reality

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

no code implementations · 16 Jul 2021 · Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina

We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.
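
As a rough illustration of what Density Bound Block sparsity means, the NumPy sketch below prunes a weight tensor so that every fixed-size block contains at most a bounded number of nonzeros. The block size and bound are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def dbb_prune(x, block=8, max_nnz=2):
    # Keep only the `max_nnz` largest-magnitude entries in each `block`-wide group.
    out = x.copy()
    groups = out.reshape(-1, block)  # assumes x.size is divisible by `block`
    drop = np.argsort(np.abs(groups), axis=1)[:, :block - max_nnz]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return out

w = np.random.default_rng(1).normal(size=(4, 16))
w_dbb = dbb_prune(w)  # every 8-element block now has at most 2 nonzeros
```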

On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks

1 code implementation · 22 Feb 2021 · Martin Ferianc, Partha Maji, Matthew Mattina, Miguel Rodrigues

Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation.

Autonomous Driving · Decision Making

Doping: A technique for efficient compression of LSTM models using sparse structured additive matrices

no code implementations · 14 Feb 2021 · Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Additionally, results with doped Kronecker product matrices demonstrate state-of-the-art accuracy at large compression factors (10-25x) across 4 natural language processing applications with minor loss in accuracy.
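
To make the idea concrete, here is a small NumPy sketch of a doped structure in the spirit the abstract describes: a Kronecker product base plus a very sparse additive matrix. The sizes and the 1% density are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 8))      # Kronecker factor
B = rng.normal(size=(32, 32))    # Kronecker factor
# Sparse additive "dope" matrix: ~1% nonzeros (density chosen arbitrarily).
S = rng.normal(size=(256, 256)) * (rng.random((256, 256)) < 0.01)
W = np.kron(A, B) + S            # structured base plus sparse corrections
# Parameters stored: 8*8 + 32*32 + nnz(S), versus 256*256 for a dense W.
```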

Rank and run-time aware compression of NLP Applications

no code implementations · EMNLP (sustainlp) 2020 · Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina

We evaluate the impact of this technique on 5 NLP benchmarks across multiple tasks (Translation, Intent Detection, Language Modeling) and show that for similar accuracy values and compression factors, HMF can achieve more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.

Intent Detection · Language Modelling +1
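
For context on the LMF baseline the abstract compares against, the NumPy sketch below shows plain low-rank matrix factorization: a weight matrix is replaced by a rank-r product, cutting both parameters and multiply count. HMF itself is a hybrid scheme not reproduced here; the shapes and rank are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 256, 256, 16
W = rng.normal(size=(m, n))
U_, s, Vt = np.linalg.svd(W, full_matrices=False)
U, V = U_[:, :r] * s[:r], Vt[:r]   # best rank-r factors (Eckart-Young)
x = rng.normal(size=n)
y = U @ (V @ x)                    # two thin matmuls instead of one dense one
# Parameters: r*(m + n) = 8192 instead of m*n = 65536.
```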

Stochastic-YOLO: Efficient Probabilistic Object Detection under Dataset Shifts

1 code implementation · 7 Sep 2020 · Tiago Azevedo, René de Jong, Matthew Mattina, Partha Maji

In this paper, we adapt the well-established YOLOv3 architecture to generate uncertainty estimations by introducing stochasticity in the form of Monte Carlo Dropout (MC-Drop), and evaluate it across different levels of dataset shift.

Image Classification · Object +2
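
The MC-Dropout mechanism itself is simple to sketch. The toy PyTorch snippet below keeps dropout active at inference time and aggregates several stochastic forward passes into a predictive mean and variance; the tiny MLP stands in for the YOLOv3 head and is purely illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for a detection head with dropout inserted.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.25), nn.Linear(32, 4))

def mc_dropout_predict(model, x, samples=10):
    model.train()  # leave dropout stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(samples)])
    return preds.mean(dim=0), preds.var(dim=0)  # prediction and uncertainty estimate

mean, var = mc_dropout_predict(model, torch.randn(1, 16))
```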

Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration

no code implementations · 4 Sep 2020 · Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

In this paper, we address a key architectural challenge with structural sparsity: how to provide support for a range of sparsity levels while maintaining high utilization of the hardware.

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands

no code implementations · 3 Aug 2020 · Dibakar Gope, Jesse Beu, Matthew Mattina

While existing SIMD matrix multiplication instructions for symmetric bit-width operands can support mixed-precision operands by zero- or sign-extending the narrow operand to match the size of the other operands, they cannot exploit the benefit of the narrower bit-width of one of the operands.

BIG-bench Machine Learning · Vocal Bursts Intensity Prediction
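
A tiny NumPy illustration of the status quo the abstract describes: an int4 operand is sign-extended to int8 so a symmetric int8 x int8 instruction can be reused, which spends full int8 multipliers on values that only need 4 bits. Shapes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
acts = rng.integers(-128, 128, size=8).astype(np.int8)   # int8 activations
w4 = rng.integers(-8, 8, size=8)                         # weights in int4 range [-8, 7]
w_ext = w4.astype(np.int8)                               # sign-extend int4 -> int8
# Symmetric int8 x int8 dot product with a 32-bit accumulator: correct, but the
# narrow width of the weights buys no extra throughput.
acc = np.dot(acts.astype(np.int32), w_ext.astype(np.int32))
```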

Efficient Residue Number System Based Winograd Convolution

no code implementations · ECCV 2020 · Zhi-Gang Liu, Matthew Mattina

Prior research has shown that the Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNNs) with weights and activations represented in floating point.
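
For reference, the NumPy sketch below shows the standard 1-D Winograd transform F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6; the paper's residue-number-system formulation builds on transforms like this but is not reproduced here.

```python
import numpy as np

# Classic Winograd F(2,3) transform matrices.
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], float)
G  = np.array([[1.0, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]])
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter
y = AT @ ((G @ g) * (BT @ d))        # only 4 multiplies in the elementwise product
assert np.allclose(y, np.correlate(d, g, mode='valid'))
```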

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

1 code implementation · 20 May 2020 · Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs).

Model Compression · Quantization +1

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

no code implementations · 16 May 2020 · Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM).

Searching for Winograd-aware Quantized Networks

1 code implementation · 25 Feb 2020 · Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew Mattina

Lightweight architectural designs of Convolutional Neural Networks (CNNs) together with quantization have paved the way for the deployment of demanding computer vision applications on mobile devices.

Neural Architecture Search · Quantization

Compressing Language Models using Doped Kronecker Products

no code implementations · 24 Jan 2020 · Urmish Thakker, Paul N. Whatmough, Zhi-Gang Liu, Matthew Mattina, Jesse Beu

Kronecker Products (KP) have been used to compress IoT RNN applications by factors of 15-38x, achieving better results than traditional compression methods.

Language Modelling · Large Language Model
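
The compression leverage of a plain Kronecker product is easy to see in a sketch: a large matrix is replaced by two small factors, and the matrix-vector product can be computed without ever materializing the large matrix. The shapes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p, q = 4, 4, 16, 16            # W is (m*p) x (n*q) = 64 x 64
A, B = rng.normal(size=(m, n)), rng.normal(size=(p, q))
x = rng.normal(size=n * q)

# (A kron B) @ x == vec(B @ X @ A.T) with x = vec(X) stacked column-major,
# so the 64x64 matrix never needs to be formed.
X = x.reshape(n, q).T
y = (B @ X @ A.T).T.reshape(-1)
assert np.allclose(y, np.kron(A, B) @ x)
# Parameters: m*n + p*q = 272 instead of 64*64 = 4096 (~15x smaller).
```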

Noisy Machines: Understanding Noisy Neural Networks and Enhancing Robustness to Analog Hardware Errors Using Distillation

no code implementations · 14 Jan 2020 · Chuteng Zhou, Prad Kadambi, Matthew Mattina, Paul N. Whatmough

Hence, for successful deployment on analog accelerators, it is essential to be able to train deep neural networks to be robust to random continuous noise in the network weights, which is a somewhat new challenge in machine learning.

Knowledge Distillation
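
One common way to pursue that robustness goal, sketched below in PyTorch, is to perturb the weights with random noise on every training forward pass so the network learns to tolerate it. The relative noise scale is an arbitrary assumption, and the paper's actual contribution pairs such training with knowledge distillation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def noisy_linear(layer, x, rel_sigma=0.05):
    # Perturb weights with Gaussian noise scaled to their magnitude, mimicking
    # the random continuous weight errors of an analog accelerator.
    scale = rel_sigma * layer.weight.detach().abs().mean()
    w_noisy = layer.weight + scale * torch.randn_like(layer.weight)
    return F.linear(x, w_noisy, layer.bias)

layer = nn.Linear(16, 8)
y = noisy_linear(layer, torch.randn(4, 16))  # gradients still flow to layer.weight
```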

ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems

no code implementations · 18 Nov 2019 · Patrick Hansen, Alexey Vilkin, Yury Khrustalev, James Imber, David Hanwell, Matthew Mattina, Paul N. Whatmough

In this work, we investigate the efficacy of the ISP in CNN classification tasks, and outline the system-level trade-offs between prediction accuracy and computational cost.

Ternary MobileNets via Per-Layer Hybrid Filter Banks

no code implementations · 4 Nov 2019 · Dibakar Gope, Jesse Beu, Urmish Thakker, Matthew Mattina

Using this proposed quantization method, we quantized a substantial portion of the weight filters of MobileNets to ternary values, resulting in 27.98% savings in energy and a 51.07% reduction in model size, while achieving comparable accuracy and no degradation in throughput on specialized hardware compared to the baseline full-precision MobileNets.

Quantization
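
A bare-bones sketch of ternarization: weights are mapped to {-s, 0, +s} by zeroing small magnitudes and sharing one scale for the rest. The threshold rule below is a common heuristic chosen for illustration; the paper's hybrid filter banks, which ternarize only a subset of filters per layer, are not reproduced.

```python
import numpy as np

def ternarize(w, frac=0.05):
    # Zero weights below a threshold, then map survivors to +/- one shared scale.
    thresh = frac * np.abs(w).max()
    mask = np.abs(w) > thresh
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return np.sign(w) * mask * scale

w = np.random.default_rng(6).normal(size=(64,))
w_t = ternarize(w)   # values drawn from {-scale, 0.0, +scale}
```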

Pushing the limits of RNN Compression

no code implementations · 4 Oct 2019 · Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina

This paper introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP).

Run-Time Efficient RNN Compression for Inference on Edge Devices

no code implementations · 12 Jun 2019 · Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina

Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints.

Edge-computing

Compressing RNNs for IoT devices by 15-38x using Kronecker Products

no code implementations · 7 Jun 2019 · Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina

Recurrent Neural Networks (RNN) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy.

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

no code implementations · NeurIPS 2019 · Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment.

BIG-bench Machine Learning · Neural Architecture Search

Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs

no code implementations · 4 Mar 2019 · Partha Maji, Andrew Mundy, Ganesh Dasika, Jesse Beu, Matthew Mattina, Robert Mullins

The Winograd or Cook-Toom class of algorithms helps to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs).

Learning low-precision neural networks without Straight-Through Estimator (STE)

no code implementations · 4 Mar 2019 · Zhi-Gang Liu, Matthew Mattina

The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding.

Quantization
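
For context, this is the estimator in question: quantize in the forward pass, then pretend the rounding was the identity when back-propagating. A minimal PyTorch sketch of that baseline follows (the technique the paper replaces, not its proposed method):

```python
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)          # non-differentiable quantizer

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                # gradient passes straight through

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
RoundSTE.apply(x).sum().backward()
print(x.grad)  # all ones, even though round() has zero gradient almost everywhere
```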

FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning

1 code implementation · 27 Feb 2019 · Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, Matthew Mattina

Over a suite of six datasets we trained models via transfer learning with an accuracy loss of <1%, resulting in up to 11.2 TOPS/W, nearly 2x more efficient than a conventional programmable CNN accelerator of the same area.

General Classification · Image Classification +1

Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning

no code implementations · 4 Dec 2018 · Paul Whatmough, Chuteng Zhou, Patrick Hansen, Matthew Mattina

On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices.

Image Classification · Transfer Learning

SCALE-Sim: Systolic CNN Accelerator

8 code implementations · 16 Oct 2018 · Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna

Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.

Distributed, Parallel, and Cluster Computing · Hardware Architecture
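
To convey the dataflow such a simulator models, here is a toy cycle-level NumPy sketch of an output-stationary systolic array: operand arrival is skewed so each processing element performs one multiply-accumulate per cycle. This is a conceptual illustration, not SCALE-Sim's actual interface or code.

```python
import numpy as np

def systolic_matmul(A, B):
    # Output-stationary PE grid: element A[i, s] meets B[s, j] at PE (i, j)
    # on cycle t = s + i + j, because inputs are skewed at the array edges.
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):            # total cycles to drain the array
        for i in range(n):
            for j in range(m):
                s = t - i - j
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]  # one MAC per PE per cycle
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```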

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision

no code implementations · 29 Mar 2018 · Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough

Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective

no code implementations · 19 Jan 2018 · Yuhao Zhu, Matthew Mattina, Paul Whatmough

Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR and ADAS.

BIG-bench Machine Learning
