no code implementations • 9 Nov 2023 • Janghwan Lee, Minsoo Kim, SeungCheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi
Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands.
1 code implementation • NeurIPS 2023 • Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi
Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning.
no code implementations • 12 May 2023 • Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi
3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements.
1 code implementation • 23 Feb 2023 • Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity.
no code implementations • 29 Jan 2023 • Kyuhong Shim, Jungwook Choi, Wonyong Sung
In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference.
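To make the mechanism concrete, here is a minimal sketch (not the paper's implementation) of attention map reuse: the map computed at one layer is cached and applied again by later layers, skipping their QK^T and softmax entirely. The four-layer stack and reuse schedule below are hypothetical.

```python
import torch

def attention(q, k, v, reuse_map=None):
    """Scaled dot-product attention; optionally reuse a cached attention map.

    When `reuse_map` is given, the QK^T and softmax computation is skipped
    entirely, which is where the inference speedup comes from."""
    if reuse_map is None:
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        reuse_map = torch.softmax(scores, dim=-1)
    return reuse_map @ v, reuse_map

# Toy 4-layer stack: layer 0 computes the map, layers 1-3 reuse it.
x = torch.randn(2, 16, 64)               # (batch, seq_len, dim)
attn_map = None
for layer in range(4):
    compute_fresh = (layer == 0)         # hypothetical reuse schedule
    x, attn_map = attention(x, x, x, None if compute_fresh else attn_map)
```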
no code implementations • 21 Dec 2022 • Seongmin Park, Beomseok Kwon, Jieun Lim, Kyuyoung Sim, Tae-Ho Kim, Jungwook Choi
Uniform-precision neural network quantization has gained popularity since it uses a simple arithmetic unit that can be densely packed for high computing capability.
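For context, a minimal sketch of what uniform-precision (fake) quantization means: a single shared step size maps every value of a tensor onto one evenly spaced grid. This is the textbook symmetric quantizer, not the paper's method.

```python
import numpy as np

def uniform_quantize(x, n_bits=4):
    """Symmetric uniform quantization: one step size for the whole tensor,
    so every value lands on the same evenly spaced grid."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 levels each side for 4-bit
    step = np.abs(x).max() / qmax         # single shared step size
    q = np.clip(np.round(x / step), -qmax, qmax)
    return q * step                       # dequantized ("fake-quantized") values

w = np.random.randn(256, 256).astype(np.float32)
w_q = uniform_quantize(w, n_bits=4)
print("max abs quantization error:", np.abs(w - w_q).max())
```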
1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi
In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of student models with reduced-precision weight parameters.
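For readers unfamiliar with the setup, here is a minimal sketch of the generic KD objective used in QAT: the fake-quantized student matches the full-precision teacher's soft labels alongside the usual hard-label loss. The temperature and mixing weight are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def kd_qat_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic KD loss for QAT: soft-label KL against the full-precision
    teacher, plus ordinary cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Student logits come from a fake-quantized forward pass; teacher is frozen.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
kd_qat_loss(s, t, y).backward()
```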
no code implementations • 11 Feb 2022 • Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim
In this method, we devise a search space over thread-tile and warp sizes to increase data reuse despite the large matrix operands of reduced-precision MMA.
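As a rough illustration of what such a search might look like, the hypothetical sketch below enumerates thread-tile and warp-count candidates and ranks them with a naive FLOPs-per-byte reuse proxy. The candidate sizes and cost model are invented for illustration; the actual work would benchmark generated kernels instead.

```python
from itertools import product

# Hypothetical search space: candidate thread-tile sizes and warp counts.
THREAD_TILES = [(16, 16), (32, 32), (64, 32), (64, 64), (128, 64)]
WARPS = [2, 4, 8]

def reuse_score(tile, warps, K=4096):
    """Crude proxy for data reuse: FLOPs per byte of int8 operands loaded
    per output tile. A real autotuner would time each candidate kernel."""
    tm, tn = tile
    if tm * tn // (32 * warps) == 0:     # not enough work per thread
        return 0.0
    flops = 2 * tm * tn * K
    bytes_loaded = (tm * K + K * tn)     # int8 operands, 1 byte each
    return flops / bytes_loaded

best = max(product(THREAD_TILES, WARPS), key=lambda c: reuse_score(*c))
print("best (tile, warps):", best)
```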
1 code implementation • 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC) 2022 • Woongkyu Lee, Hyucksung Kwon, Jungwook Choi
However, the computation-demanding nature of DNNs, along with the time-consuming fusion of video and thermal camera frames, raises hurdles for the cost-effective deployment of such AI thermometer systems.
no code implementations • 3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.
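To illustrate the kind of replacement such work targets, here is a hedged sketch approximating GELU with a 32-segment piecewise-linear lookup table; the range and uniform breakpoints are naive choices, not the paper's approximation.

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Build a piecewise-linear lookup table for GELU on [-4, 4].
xs = np.linspace(-4, 4, 33)              # 32 uniform segments (naive choice)
ys = gelu(xs)
slopes = np.diff(ys) / np.diff(xs)

def gelu_pwl(x):
    """Evaluate GELU via one table lookup and one multiply-add per element."""
    x = np.clip(x, -4, 4)
    i = np.minimum(((x + 4) / 0.25).astype(int), 31)
    return ys[i] + slopes[i] * (x - xs[i])

x = np.random.randn(1000)
print("max abs error:", np.abs(gelu(x) - gelu_pwl(x)).max())
```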
1 code implementation • 2021 IEEE Workshop on Signal Processing Systems (SiPS) 2021 • Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, Byonghyo Shim
We propose TernGEMM, a special GEMM library using SIMD instructions for Deep Neural Network (DNN) inference with ternary weights and sub-8-bit activations.
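A sketch of why ternary weights help: with weights in {-1, 0, +1}, every dot product reduces to sums and differences of activations, which SIMD hardware can realize with bit-packed masks and add/sub instructions. The NumPy version below shows only the decomposition, not the SIMD packing.

```python
import numpy as np

def tern_gemm(a, w):
    """Matrix multiply with ternary weights w in {-1, 0, +1}: each output
    is a sum of the activations selected by w == +1 minus the sum selected
    by w == -1, so no true multiplications are needed."""
    pos = (w == 1).astype(a.dtype)
    neg = (w == -1).astype(a.dtype)
    return a @ pos - a @ neg

a = np.random.randint(-128, 128, size=(4, 64)).astype(np.int32)   # 8-bit acts
w = np.random.choice([-1, 0, 1], size=(64, 16)).astype(np.int32)  # ternary
assert np.array_equal(tern_gemm(a, w), a @ w)                     # exact match
```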
1 code implementation • 2021 18th International SoC Design Conference (ISOCC) 2021 • Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi
While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use.
no code implementations • ICLR 2022 • Kyuhong Shim, Jungwook Choi, Wonyong Sung
Self-attention (SA) is a critical component of Transformer neural networks that have succeeded in automatic speech recognition (ASR).
no code implementations • 4 Jan 2021 • Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi
Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.
no code implementations • 1 Jan 2021 • Seongmin Park, Beomseok Kwon, Kyuyoung Sim, Jieun Lim, Tae-Ho Kim, Jungwook Choi
Uniform-precision neural network quantization has gained popularity thanks to its simple arithmetic units, which can be densely packed for high computing capability.
no code implementations • 30 Sep 2020 • Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung
In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).
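Reading from the abstract, the core idea is to generate teacher signals from the same network while stochastically varying activation precision, so the teacher behaves like an ensemble. The toy sketch below illustrates that mechanism only; the quantizer, bit-width range, and MSE distillation loss are all assumptions, not the paper's recipe.

```python
import numpy as np

def fake_quant(x, n_bits):
    """Uniform fake-quantization of non-negative activations to n_bits."""
    qmax = 2 ** n_bits - 1
    scale = x.max() / qmax if x.max() > 0 else 1.0
    return np.round(x / scale) * scale

def forward(x, w, act_bits):
    return fake_quant(np.maximum(x @ w, 0.0), act_bits)  # quantized ReLU layer

rng = np.random.default_rng(0)
x, w = rng.standard_normal((8, 32)), rng.standard_normal((32, 16))

student = forward(x, w, act_bits=2)               # low-precision student pass
teacher_bits = int(rng.integers(4, 9))            # stochastically drawn precision
teacher = forward(x, w, act_bits=teacher_bits)    # shares the same weights
distill_loss = np.mean((student - teacher) ** 2)  # assumed distillation loss
```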
no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei zhang, Kailash Gopalakrishnan
Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.
no code implementations • 25 Sep 2019 • Shihui Yin, Kyu-Hyoun Kim, Jinwook Oh, Naigang Wang, Mauricio Serrano, Jae-sun Seo, Jungwook Choi
In the case of ResNet50 on ImageNet, this yields a winning ticket with 75.02% Top-1 accuracy at an 80% pruning rate in only 22% of the total epochs required for iterative pruning.
no code implementations • ICLR 2019 • Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan
Observing that a bad choice for accumulation precision results in loss of information that manifests itself as a reduction in variance in an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation.
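The swamping symptom described here is easy to reproduce: accumulate many small values in fp16 and the running sum stalls once its ulp exceeds the addends, collapsing both the mean and the ensemble variance of the partial sums. A small self-contained demonstration, with parameters chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sums16, sums64 = [], []
for _ in range(500):
    x = (0.01 * (1.0 + 0.1 * rng.standard_normal(100_000))).astype(np.float16)
    sums16.append(float(x.cumsum()[-1]))              # fp16 accumulator swamps
    sums64.append(float(x.astype(np.float64).sum()))  # reference accumulation
print("fp16: mean %.1f  var %.2e" % (np.mean(sums16), np.var(sums16)))
print("fp64: mean %.1f  var %.2e" % (np.mean(sums64), np.var(sums64)))
```

The fp16 sums stall far below the true total (the accumulator stops moving once its ulp is larger than the 0.01-scale addends), and their variance drops well under the fp64 ensemble variance, which is exactly the signal the analysis above exploits.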
no code implementations • NeurIPS 2018 • Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan
The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.
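One mitigation associated with this line of work is chunk-based accumulation: keep each reduced-precision running sum short, then combine the chunk results in a second level, so swamping error stays bounded. A hedged fp16 sketch, with chunk size and data chosen for illustration:

```python
import numpy as np

def chunked_sum_fp16(x, chunk=64):
    """Accumulate in fp16 within short chunks, then accumulate the chunk
    results; no running sum grows large relative to its addends."""
    partials = [x[i:i + chunk].cumsum()[-1] for i in range(0, len(x), chunk)]
    return float(np.array(partials, dtype=np.float16).cumsum()[-1])

rng = np.random.default_rng(0)
x = (0.01 * (1.0 + 0.1 * rng.standard_normal(65_536))).astype(np.float16)
print("naive fp16 sum :", float(x.cumsum()[-1]))   # stalls far too low
print("chunked fp16   :", chunked_sum_fp16(x))     # close to the reference
print("fp64 reference :", x.astype(np.float64).sum())
```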
no code implementations • 17 Jul 2018 • Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost.
3 code implementations • ICLR 2018 • Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.
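A minimal sketch of PACT-style parameterized clipping followed by uniform quantization, the mechanism this line of work is known for. Here the clipping level alpha is a fixed constant rather than the learned parameter used in training, and a straight-through estimator stands in for the full training details.

```python
import torch

def pact_quantize(x, alpha=2.0, n_bits=4):
    """Clip activations to [0, alpha], then quantize uniformly to n_bits.
    In training, alpha is learnable; the straight-through estimator lets
    gradients pass through the non-differentiable rounding."""
    levels = 2 ** n_bits - 1
    y = torch.clamp(x, 0.0, alpha)            # parameterized clipping
    scale = alpha / levels
    y_q = torch.round(y / scale) * scale      # uniform quantization
    return y + (y_q - y).detach()             # straight-through estimator

x = torch.randn(4, 8, requires_grad=True) * 3
out = pact_quantize(x, alpha=2.0, n_bits=4)
out.sum().backward()                          # gradients flow to x
```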
no code implementations • 7 Dec 2017 • Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei zhang, Kailash Gopalakrishnan
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained.