Search Results for author: Sheng-Chun Kao

Found 13 papers, 6 papers with code

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

no code implementations • 17 Apr 2024 • Rachid Karami, Hemanth Kota, Sheng-Chun Kao, Hyoukjun Kwon

Therefore, significant effort has been put into studying and optimizing GEMM operators in order to speed up the execution of ML models.
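
For context, a GEMM operator is a dense matrix-matrix multiply, while operators such as softmax or normalization fall outside that category. Below is a minimal sketch (my illustration, not code from the paper) contrasting the two with NumPy; the sizes and the softmax helper are arbitrary.

```python
# Minimal sketch (not from the paper): contrasting a GEMM operator with a
# common non-GEMM operator using NumPy. Timings are illustrative only.
import time
import numpy as np

x = np.random.rand(1024, 1024).astype(np.float32)
w = np.random.rand(1024, 1024).astype(np.float32)

def softmax(a):
    # Non-GEMM: elementwise exp plus a row-wise reduction.
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

t0 = time.perf_counter()
y = x @ w                      # GEMM: dense matrix-matrix multiply
t1 = time.perf_counter()
z = softmax(x)                 # Non-GEMM: normalization operator
t2 = time.perf_counter()
print(f"GEMM:    {t1 - t0:.4f}s")
print(f"softmax: {t2 - t1:.4f}s")
```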

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we study the effectiveness of existing sparse training recipes in high-sparsity regions and argue that these methods fail to sustain model quality on par with low-sparsity regions.
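
For readers unfamiliar with the setting: N:M structured sparsity keeps at most N nonzero weights in every contiguous group of M. A minimal sketch of building such a mask, assuming magnitude-based selection (the function name and shapes below are mine, not the paper's):

```python
# Minimal sketch (my illustration, not the paper's code): building a 2:4
# structured-sparsity mask that keeps the 2 largest-magnitude weights in
# every contiguous group of 4.
import numpy as np

def nm_mask(weights, n=2, m=4):
    flat = weights.reshape(-1, m)                    # groups of M weights
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]  # indices of N largest
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return mask.reshape(weights.shape)

w = np.random.randn(4, 8).astype(np.float32)
mask = nm_mask(w)
print(mask)            # exactly 2 ones per group of 4
print(w * mask)        # sparse weights used in the forward pass
```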

Demystifying Map Space Exploration for NPUs

1 code implementation • 7 Oct 2022 • Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna

Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator.

Navigate • Neural Architecture Search
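
As a toy illustration of what map space exploration involves, the sketch below enumerates tile-size mappings of a GEMM under an assumed on-chip buffer capacity and ranks them with a naive data-movement proxy; the cost model and all constants are my assumptions, not the tools evaluated in the paper.

```python
# Toy sketch (assumptions mine, not the paper's cost model): exhaustively
# enumerating tile-size mappings of a GEMM onto a small on-chip buffer and
# ranking them with a rough "words moved off-chip" cost proxy.
from itertools import product

M, N, K = 256, 256, 256          # GEMM dimensions
BUFFER = 16 * 1024               # on-chip buffer capacity (words), assumed

def cost(tm, tn, tk):
    # Words moved per tile iteration times number of tiles (very rough).
    tiles = (M // tm) * (N // tn) * (K // tk)
    return tiles * (tm * tk + tk * tn + tm * tn)

candidates = []
for tm, tn, tk in product([16, 32, 64], repeat=3):
    if tm * tk + tk * tn + tm * tn <= BUFFER:    # mapping must fit on chip
        candidates.append(((tm, tn, tk), cost(tm, tn, tk)))

best = min(candidates, key=lambda c: c[1])
print("best tiling:", best)
```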

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for it in terms of the trade-off between model accuracy and compute cost (FLOPs).
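
A rough sketch of the decaying-mask idea in the title, assuming a linear decay schedule (the schedule, shapes, and toy mask below are my illustration, not the paper's recipe): instead of hard-zeroing pruned weights, they are attenuated by a factor that shrinks toward zero over training.

```python
# Illustrative sketch of a decaying pruning mask (schedule and toy mask are
# my assumptions): pruned weights are scaled by a factor that decays toward
# zero over training instead of being zeroed immediately.
import numpy as np

def apply_decaying_mask(weights, mask, step, total_steps):
    decay = max(0.0, 1.0 - step / total_steps)   # 1 -> 0 over training
    return weights * (mask + (1.0 - mask) * decay)

w = np.random.randn(2, 8).astype(np.float32)
mask = (np.abs(w) > np.median(np.abs(w))).astype(np.float32)  # toy mask

for step in [0, 500, 1000]:
    print(step, apply_decaying_mask(w, mask, step, total_steps=1000)[0, :4])
```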

DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators

2 code implementations • 26 Jan 2022 • Sheng-Chun Kao, Michael Pellauer, Angshuman Parashar, Tushar Krishna

The design of DNN accelerators includes two key parts: HW resource configuration and mapping strategy.
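
A minimal genetic-algorithm sketch over a joint (HW configuration, mapping) encoding; the two-gene genome, toy latency model, and all constants below are placeholders of mine, not DiGamma's encoding or cost model.

```python
# Minimal genetic-algorithm sketch (encoding and fitness are placeholders,
# not DiGamma's): co-evolving a HW resource configuration (PE count) and a
# mapping parameter (tile size) against a toy latency model.
import random

def fitness(genome):
    pes, tile = genome
    compute = 1e6 / (pes * tile)                 # toy: more resources help
    spill = 0.0 if pes * tile <= 4096 else 1e3   # toy resource penalty
    return compute + spill

def mutate(genome):
    pes, tile = genome
    if random.random() < 0.5:
        pes = max(1, pes + random.choice([-8, 8]))
    else:
        tile = max(1, tile + random.choice([-4, 4]))
    return (pes, tile)

pop = [(random.randint(8, 128), random.randint(4, 64)) for _ in range(32)]
for _ in range(50):
    pop.sort(key=fitness)
    parents = pop[:8]                            # elitist selection
    pop = parents + [mutate(random.choice(parents)) for _ in range(24)]
print("best (PEs, tile):", min(pop, key=fitness))
```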

MAGMA: An Optimization Framework for Mapping Multiple DNNs on Multiple Accelerator Cores

no code implementations • 28 Apr 2021 • Sheng-Chun Kao, Tushar Krishna

In particular, we focus on the problem of mapping jobs from several DNNs simultaneously on an accelerator.

Efficient Exploration
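
To make the problem concrete, here is a toy greedy scheduler that places jobs from several DNNs on whichever core frees up first; MAGMA searches this space with optimization methods, so the heuristic and the job latencies below are purely my illustration.

```python
# Toy sketch of the multi-DNN mapping problem (the greedy heuristic and job
# latencies are my illustration, not MAGMA's method): each job is placed on
# the accelerator core that currently finishes earliest.
import heapq

jobs = [("resnet", 8), ("bert", 12), ("resnet", 6), ("bert", 9), ("gpt", 15)]
cores = [(0.0, i) for i in range(3)]    # (finish_time, core_id)
heapq.heapify(cores)

for name, latency in sorted(jobs, key=lambda j: -j[1]):  # longest first
    finish, core = heapq.heappop(cores)
    print(f"{name:7s} -> core {core} at t={finish:.1f}")
    heapq.heappush(cores, (finish + latency, core))
```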

Conditional Neural Architecture Search

no code implementations • 6 Jun 2020 • Sheng-Chun Kao, Arun Ramamurthy, Reed Williams, Tushar Krishna

Designing resource-efficient Deep Neural Networks (DNNs) is critical to deploy deep learning solutions over edge platforms due to diverse performance, power, and memory budgets.

Neural Architecture Search
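
As a sketch of the resource-budget side of such a search, the snippet below screens candidate architectures against assumed per-platform FLOPs budgets before any accuracy evaluation; the budgets and the FLOPs estimate are my assumptions, not the paper's search method.

```python
# Hypothetical sketch (budgets and the FLOPs model are mine): screening
# candidate architectures against per-platform resource budgets as a
# resource-aware NAS front end.
from itertools import product

BUDGETS = {"edge": 50e6, "mobile": 300e6}   # FLOPs budgets, assumed

def conv_flops(channels, depth, resolution=32, kernel=3):
    # Rough per-network FLOPs estimate for a stack of 3x3 convolutions.
    return depth * (resolution ** 2) * (kernel ** 2) * channels * channels

for platform, budget in BUDGETS.items():
    ok = [(c, d) for c, d in product([16, 32, 64], [4, 8, 16])
          if conv_flops(c, d) <= budget]
    print(platform, "feasible (channels, depth):", ok)
```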

Generative Design of Hardware-aware DNNs

no code implementations • 6 Jun 2020 • Sheng-Chun Kao, Arun Ramamurthy, Tushar Krishna

We propose a new approach to autonomous quantization and HW-aware tuning.

Quantization
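
For background, here is a minimal symmetric int8 quantization routine of the kind a HW-aware tuner would select bit-widths for; this is a generic sketch, not the paper's generative method.

```python
# Minimal symmetric int8 quantization sketch (generic background, not the
# paper's method): per-tensor scale, round, clip, and dequantize to check
# the reconstruction error.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0               # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale              # dequantize
print("max abs error:", np.abs(w - w_hat).max())
```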
