1 code implementation • 15 Feb 2024 • Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, HyungJun Kim
We introduce QUICK, a suite of novel, optimized CUDA kernels for efficient inference of quantized Large Language Models (LLMs).
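As a rough illustration of the operation such kernels accelerate, below is a minimal sketch of a naive 4-bit dequantize-and-multiply GEMV in CUDA. This is not the QUICK kernel itself (QUICK's contribution lies in how it reorganizes this work to avoid overheads); the names (w_packed, scales, group_size) are illustrative assumptions, not the paper's API.

```cuda
#include <cstdint>

// Naive 4-bit dequantize-and-multiply GEMV: y = W x, with W stored as
// packed 4-bit integers plus per-group float scales. One thread per row.
// Launch (illustrative): dequant_gemv_int4<<<(m + 255) / 256, 256>>>(...);
__global__ void dequant_gemv_int4(const uint8_t* w_packed, // two 4-bit weights per byte
                                  const float* scales,     // one scale per weight group
                                  const float* x,          // input activations, length n
                                  float* y,                // output, length m
                                  int m, int n, int group_size) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    float acc = 0.0f;
    for (int col = 0; col < n; ++col) {
        int idx = row * n + col;
        // Unpack one 4-bit quantized weight from the byte stream.
        uint8_t byte = w_packed[idx >> 1];
        int q = (idx & 1) ? (byte >> 4) : (byte & 0x0F);
        // Dequantize: recenter to a signed range, apply the group's scale.
        acc += (q - 8) * scales[idx / group_size] * x[col];
    }
    y[row] = acc;
}
```

Optimized kernels in this space typically restructure exactly this unpack-dequantize-accumulate loop so the per-weight conversion cost stops dominating the memory-bound multiply.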
no code implementations • 3 Jul 2023 • Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, HyungJun Kim
The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research.
no code implementations • NeurIPS 2023 • Junhyuk So, Jungwon Lee, Daehyun Ahn, HyungJun Kim, Eunhyeok Park
Diffusion models have gained popularity in vision applications due to their remarkable generative performance and versatility.
no code implementations • ICLR 2019 • Daehyun Ahn, Dongsoo Lee, Taesu Kim, Jae-Joon Kim
In this paper, we propose a new sparse matrix format that enables highly parallel decoding of the entire sparse matrix.
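The paper's format is specific to its own encoding, but a generic CSR-to-dense decode sketches the kind of row-parallel decoding such formats target. Everything below, names included, is an illustrative assumption, not the proposed format.

```cuda
// Generic CSR-to-dense decode, one thread per row. Because row_ptr gives
// each thread an independent slice of (col_idx, vals), all rows decode in
// parallel with no cross-row dependencies. `dense` is assumed zeroed
// (e.g., via cudaMemset) before launch.
__global__ void csr_to_dense(const int* row_ptr, const int* col_idx,
                             const float* vals, float* dense,
                             int m, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        dense[row * n + col_idx[j]] = vals[j];
}
```

The design point is that parallelism is a property of the format itself: a format whose entries can only be decoded sequentially serializes the whole reconstruction, regardless of how many threads are available.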
no code implementations • ICLR 2018 • Dongsoo Lee, Daehyun Ahn, Taesu Kim, Pierce I. Chuang, Jae-Joon Kim
Hence, pruning is usually restricted to inference with a batch size of one, for which an efficient parallel matrix-vector multiplication method exists.
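For context, the batch-one case the excerpt refers to reduces to a sparse matrix-vector product (SpMV), which parallelizes cleanly, e.g. one thread per output row in CSR form. A minimal hedged sketch (generic CSR SpMV, not the paper's method; variable names are illustrative):

```cuda
// Generic CSR sparse matrix-vector product y = A x, one thread per row.
// This is the efficient batch-one path: each thread reads only its row's
// nonzeros, so pruned (zeroed) weights cost no wasted work.
__global__ void csr_spmv(const int* row_ptr, const int* col_idx,
                         const float* vals, const float* x,
                         float* y, int m) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    float acc = 0.0f;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        acc += vals[j] * x[col_idx[j]];
    y[row] = acc;
}
```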