no code implementations • 9 Nov 2023 • Janghwan Lee, Minsoo Kim, SeungCheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi
Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands.
1 code implementation • NeurIPS 2023 • Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi
Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning.
no code implementations • 12 May 2023 • Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi
3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements.
1 code implementation • 23 Feb 2023 • Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity.
no code implementations • 29 Jan 2023 • Kyuhong Shim, Jungwook Choi, Wonyong Sung
In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference.
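To make the mechanism concrete, here is a minimal sketch (not the paper's implementation) of attention map reuse: the map computed at one layer is cached and applied again by later layers, skipping their QK^T and softmax entirely. The four-layer stack and reuse schedule below are hypothetical.

```python
import torch

def attention(q, k, v, reuse_map=None):
    """Scaled dot-product attention; optionally reuse a cached attention map.

    When `reuse_map` is given, the QK^T and softmax computation is skipped
    entirely, which is where the inference speedup comes from."""
    if reuse_map is None:
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        reuse_map = torch.softmax(scores, dim=-1)
    return reuse_map @ v, reuse_map

# Toy 4-layer stack: layer 0 computes the map, layers 1-3 reuse it.
x = torch.randn(2, 16, 64)               # (batch, seq_len, dim)
attn_map = None
for layer in range(4):
    compute_fresh = (layer == 0)         # hypothetical reuse schedule
    x, attn_map = attention(x, x, x, None if compute_fresh else attn_map)
```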
no code implementations • 21 Dec 2022 • Seongmin Park, Beomseok Kwon, Jieun Lim, Kyuyoung Sim, Tae-Ho Kim, Jungwook Choi
Uniform-precision neural network quantization has gained popularity since it uses a simple arithmetic unit that can be densely packed for high computing capability.
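For context, a minimal sketch of what uniform-precision (fake) quantization means: a single shared step size maps every value of a tensor onto one evenly spaced grid. This is the textbook symmetric quantizer, not the paper's method.

```python
import numpy as np

def uniform_quantize(x, n_bits=4):
    """Symmetric uniform quantization: one step size for the whole tensor,
    so every value lands on the same evenly spaced grid."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 levels each side for 4-bit
    step = np.abs(x).max() / qmax         # single shared step size
    q = np.clip(np.round(x / step), -qmax, qmax)
    return q * step                       # dequantized ("fake-quantized") values

w = np.random.randn(256, 256).astype(np.float32)
w_q = uniform_quantize(w, n_bits=4)
print("max abs quantization error:", np.abs(w - w_q).max())
```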
1 code implementation • 20 Nov 2022 • Minsoo Kim, Sihwa Lee, Sukjin Hong, Du-Seong Chang, Jungwook Choi
In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of student models with reduced-precision weight parameters.
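For readers unfamiliar with the setup, here is a minimal sketch of the generic KD objective used in QAT: the fake-quantized student matches the full-precision teacher's soft labels alongside the usual hard-label loss. The temperature and mixing weight are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def kd_qat_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic KD loss for QAT: soft-label KL against the full-precision
    teacher, plus ordinary cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Student logits come from a fake-quantized forward pass; teacher is frozen.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
kd_qat_loss(s, t, y).backward()
```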
no code implementations • 11 Feb 2022 • Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim
In this method, we devise a search space over thread-tile and warp sizes to increase data reuse despite the large matrix operands of reduced-precision MMA.
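As a rough illustration of what such a search might look like, the hypothetical sketch below enumerates thread-tile and warp-count candidates and ranks them with a naive FLOPs-per-byte reuse proxy. The candidate sizes and cost model are invented for illustration; the actual work would benchmark generated kernels instead.

```python
from itertools import product

# Hypothetical search space: candidate thread-tile sizes and warp counts.
THREAD_TILES = [(16, 16), (32, 32), (64, 32), (64, 64), (128, 64)]
WARPS = [2, 4, 8]

def reuse_score(tile, warps, K=4096):
    """Crude proxy for data reuse: FLOPs per byte of int8 operands loaded
    per output tile. A real autotuner would time each candidate kernel."""
    tm, tn = tile
    if tm * tn // (32 * warps) == 0:     # not enough work per thread
        return 0.0
    flops = 2 * tm * tn * K
    bytes_loaded = (tm * K + K * tn)     # int8 operands, 1 byte each
    return flops / bytes_loaded

best = max(product(THREAD_TILES, WARPS), key=lambda c: reuse_score(*c))
print("best (tile, warps):", best)
```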
1 code implementation • 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC) 2022 • Woongkyu Lee, Hyucksung Kwon, Jungwook Choi
However, the computation-demanding nature of DNNs, along with the time-consuming fusion of video and thermal camera frames, raises hurdles for the cost-effective deployment of such AI thermometer systems.
no code implementations • 3 Dec 2021 • Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
Non-linear operations such as GELU, Layer normalization, and Softmax are essential yet costly building blocks of Transformer models.
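To illustrate the kind of replacement such work targets, here is a hedged sketch approximating GELU with a 32-segment piecewise-linear lookup table; the range and uniform breakpoints are naive choices, not the paper's approximation.

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Build a piecewise-linear lookup table for GELU on [-4, 4].
xs = np.linspace(-4, 4, 33)              # 32 uniform segments (naive choice)
ys = gelu(xs)
slopes = np.diff(ys) / np.diff(xs)

def gelu_pwl(x):
    """Evaluate GELU via one table lookup and one multiply-add per element."""
    x = np.clip(x, -4, 4)
    i = np.minimum(((x + 4) / 0.25).astype(int), 31)
    return ys[i] + slopes[i] * (x - xs[i])

x = np.random.randn(1000)
print("max abs error:", np.abs(gelu(x) - gelu_pwl(x)).max())
```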
1 code implementation • 2021 IEEE Workshop on Signal Processing Systems (SiPS) 2021 • Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, Byonghyo Shim
We propose TernGEMM, a special GEMM library using SIMD instructions for Deep Neural Network (DNN) inference with ternary weights and sub-8-bit activations.
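A sketch of why ternary weights help: with weights in {-1, 0, +1}, every dot product reduces to sums and differences of activations, which SIMD hardware can realize with bit-packed masks and add/sub instructions. The NumPy version below shows only the decomposition, not the SIMD packing.

```python
import numpy as np

def tern_gemm(a, w):
    """Matrix multiply with ternary weights w in {-1, 0, +1}: each output
    is a sum of the activations selected by w == +1 minus the sum selected
    by w == -1, so no true multiplications are needed."""
    pos = (w == 1).astype(a.dtype)
    neg = (w == -1).astype(a.dtype)
    return a @ pos - a @ neg

a = np.random.randint(-128, 128, size=(4, 64)).astype(np.int32)   # 8-bit acts
w = np.random.choice([-1, 0, 1], size=(64, 16)).astype(np.int32)  # ternary
assert np.array_equal(tern_gemm(a, w), a @ w)                     # exact match
```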
1 code implementation • 2021 18th International SoC Design Conference (ISOCC) 2021 • Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi
While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use.
no code implementations • ICLR 2022 • Kyuhong Shim, Jungwook Choi, Wonyong Sung
Self-attention (SA) is a critical component of Transformer neural networks that have succeeded in automatic speech recognition (ASR).
no code implementations • 4 Jan 2021 • Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi
Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.
no code implementations • 1 Jan 2021 • Seongmin Park, Beomseok Kwon, Kyuyoung Sim, Jieun Lim, Tae-Ho Kim, Jungwook Choi
Uniform-precision neural network quantization has gained popularity thanks to its simple arithmetic units, which can be densely packed for high computing capability.
no code implementations • 30 Sep 2020 • Yoonho Boo, Sungho Shin, Jungwook Choi, Wonyong Sung
In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ).
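Reading from the abstract, the core idea is to generate teacher signals from the same network while stochastically varying activation precision, so the teacher behaves like an ensemble. The toy sketch below illustrates that mechanism only; the quantizer, bit-width range, and MSE distillation loss are all assumptions, not the paper's recipe.

```python
import numpy as np

def fake_quant(x, n_bits):
    """Uniform fake-quantization of non-negative activations to n_bits."""
    qmax = 2 ** n_bits - 1
    scale = x.max() / qmax if x.max() > 0 else 1.0
    return np.round(x / scale) * scale

def forward(x, w, act_bits):
    return fake_quant(np.maximum(x @ w, 0.0), act_bits)  # quantized ReLU layer

rng = np.random.default_rng(0)
x, w = rng.standard_normal((8, 32)), rng.standard_normal((32, 16))

student = forward(x, w, act_bits=2)               # low-precision student pass
teacher_bits = int(rng.integers(4, 9))            # stochastically drawn precision
teacher = forward(x, w, act_bits=teacher_bits)    # shares the same weights
distill_loss = np.mean((student - teacher) ** 2)  # assumed distillation loss
```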
no code implementations • NeurIPS 2019 • Xiao Sun, Jungwook Choi, Chia-Yu Chen, Naigang Wang, Swagath Venkataramani, Vijayalakshmi (Viji) Srinivasan, Xiaodong Cui, Wei zhang, Kailash Gopalakrishnan
Reducing the numerical precision of data and computation is extremely effective in accelerating deep learning training workloads.
no code implementations • 25 Sep 2019 • Shihui Yin, Kyu-Hyoun Kim, Jinwook Oh, Naigang Wang, Mauricio Serrano, Jae-sun Seo, Jungwook Choi
In the case of ResNet50 on ImageNet, this yields a winning ticket with 75.02% Top-1 accuracy at an 80% pruning rate in only 22% of the total epochs required for iterative pruning.
no code implementations • ICLR 2019 • Charbel Sakr, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Ankur Agrawal, Naresh Shanbhag, Kailash Gopalakrishnan
Observing that a bad choice for accumulation precision results in loss of information that manifests itself as a reduction in variance in an ensemble of partial sums, we derive a set of equations that relate this variance to the length of accumulation and the minimum number of bits needed for accumulation.
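The swamping symptom described here is easy to reproduce: accumulate many small values in fp16 and the running sum stalls once its ulp exceeds the addends, collapsing both the mean and the ensemble variance of the partial sums. A small self-contained demonstration, with parameters chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sums16, sums64 = [], []
for _ in range(500):
    x = (0.01 * (1.0 + 0.1 * rng.standard_normal(100_000))).astype(np.float16)
    sums16.append(float(x.cumsum()[-1]))              # fp16 accumulator swamps
    sums64.append(float(x.astype(np.float64).sum()))  # reference accumulation
print("fp16: mean %.1f  var %.2e" % (np.mean(sums16), np.var(sums16)))
print("fp64: mean %.1f  var %.2e" % (np.mean(sums64), np.var(sums64)))
```

The fp16 sums stall far below the true total (the accumulator stops moving once its ulp is larger than the 0.01-scale addends), and their variance drops well under the fp64 ensemble variance, which is exactly the signal the analysis above exploits.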
no code implementations • NeurIPS 2018 • Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, Kailash Gopalakrishnan
The state-of-the-art hardware platforms for training Deep Neural Networks (DNNs) are moving from traditional single precision (32-bit) computations towards 16 bits of precision -- in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations.
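One mitigation associated with this line of work is chunk-based accumulation: keep each reduced-precision running sum short, then combine the chunk results in a second level, so swamping error stays bounded. A hedged fp16 sketch, with chunk size and data chosen for illustration:

```python
import numpy as np

def chunked_sum_fp16(x, chunk=64):
    """Accumulate in fp16 within short chunks, then accumulate the chunk
    results; no running sum grows large relative to its addends."""
    partials = [x[i:i + chunk].cumsum()[-1] for i in range(0, len(x), chunk)]
    return float(np.array(partials, dtype=np.float16).cumsum()[-1])

rng = np.random.default_rng(0)
x = (0.01 * (1.0 + 0.1 * rng.standard_normal(65_536))).astype(np.float16)
print("naive fp16 sum :", float(x.cumsum()[-1]))   # stalls far too low
print("chunked fp16   :", chunked_sum_fp16(x))     # close to the reference
print("fp64 reference :", x.astype(np.float64).sum())
```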
no code implementations • 17 Jul 2018 • Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost.
3 code implementations • ICLR 2018 • Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
We show, for the first time, that both weights and activations can be quantized to 4-bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets.
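A minimal sketch of PACT-style parameterized clipping followed by uniform quantization, the mechanism this line of work is known for. Here the clipping level alpha is a fixed constant rather than the learned parameter used in training, and a straight-through estimator stands in for the full training details.

```python
import torch

def pact_quantize(x, alpha=2.0, n_bits=4):
    """Clip activations to [0, alpha], then quantize uniformly to n_bits.
    In training, alpha is learnable; the straight-through estimator lets
    gradients pass through the non-differentiable rounding."""
    levels = 2 ** n_bits - 1
    y = torch.clamp(x, 0.0, alpha)            # parameterized clipping
    scale = alpha / levels
    y_q = torch.round(y / scale) * scale      # uniform quantization
    return y + (y_q - y).detach()             # straight-through estimator

x = torch.randn(4, 8, requires_grad=True) * 3
out = pact_quantize(x, alpha=2.0, n_bits=4)
out.sum().backward()                          # gradients flow to x
```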
no code implementations • 7 Dec 2017 • Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei zhang, Kailash Gopalakrishnan
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained.