Search Results for author: Daehyun Ahn

Found 5 papers, 1 paper with code

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

1 code implementation • 15 Feb 2024 • Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, HyungJun Kim

We introduce QUICK, a group of novel optimized CUDA kernels for the efficient inference of quantized Large Language Models (LLMs).

Quantization
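As the QUICK title suggests, the core idea appears to be reordering quantized weights offline so that dequantized values already sit in the layout the Tensor Core fragments expect, avoiding a shared-memory round-trip. Below is a minimal NumPy sketch of that interleaving idea only; the fragment pattern is a made-up 8-element ordering standing in for the real mma layout, and the paper's actual kernels are CUDA, not Python:

```python
# Sketch of quantization-aware weight interleaving (not QUICK's actual
# kernel): 4-bit weights are permuted OFFLINE so that a straight linear
# read at runtime yields values in the order the mma fragment expects,
# removing the shared-memory write-back and its bank conflicts.
import numpy as np

FRAGMENT_PATTERN = [0, 4, 1, 5, 2, 6, 3, 7]  # hypothetical mma ordering

def interleave_int4(q_weights: np.ndarray) -> np.ndarray:
    """Reorder groups of 8 int4 values (stored one per uint8 here for
    clarity) into the assumed fragment order."""
    out = q_weights.reshape(-1, 8)[:, FRAGMENT_PATTERN]
    return out.reshape(q_weights.shape)

def dequantize(q: np.ndarray, scale: float, zero: int) -> np.ndarray:
    return (q.astype(np.float32) - zero) * scale

# Offline: interleave once. Runtime: dequantize and consume directly.
q = np.arange(16, dtype=np.uint8) % 16          # toy 4-bit weights
q_interleaved = interleave_int4(q)
w = dequantize(q_interleaved, scale=0.05, zero=8)
print(w)
```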

Squeezing Large-Scale Diffusion Models for Mobile

no code implementations • 3 Jul 2023 • Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, HyungJun Kim

The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research.

Image Generation

Temporal Dynamic Quantization for Diffusion Models

no code implementations • NeurIPS 2023 • Junhyuk So, Jungwon Lee, Daehyun Ahn, HyungJun Kim, Eunhyeok Park

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility.

Quantization
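The abstract snippet above is only motivational, but the title points at the technique: activation statistics in a diffusion model shift with the denoising timestep, so one static quantization interval fits poorly across steps. A minimal sketch of a per-timestep scale table follows; the calibration rule and all names here are illustrative assumptions, not the paper's exact method:

```python
# Sketch of temporal dynamic quantization: one quantization scale per
# diffusion timestep, precomputed offline so inference stays static.
import numpy as np

def calibrate_scales(acts_per_step, n_bits=8):
    """acts_per_step: list of activation samples, one array per timestep.
    Returns one symmetric scale per timestep (max calibration, chosen
    here only for simplicity)."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.array([np.abs(a).max() / qmax for a in acts_per_step])

def quantize(x, scale, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

# Toy calibration: activations grow noisier at later (high-noise) steps.
rng = np.random.default_rng(0)
acts = [rng.normal(0, 1.0 + 0.1 * t, size=1024) for t in range(50)]
scales = calibrate_scales(acts)
x_hat = quantize(acts[10], scales[10])   # timestep-specific interval
print(float(np.abs(acts[10] - x_hat).mean()))
```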

Viterbi-based Pruning for Sparse Matrix with Fixed and High Index Compression Ratio

no code implementations • ICLR 2018 • Dongsoo Lee, Daehyun Ahn, Taesu Kim, Pierce I. Chuang, Jae-Joon Kim

Hence, pruning is usually restricted to inference with a batch size of one, for which an efficient parallel matrix-vector multiplication method exists.
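The snippet above alludes to the standard sparse-matrix bottleneck: after unstructured pruning, each output row reads an irregular list of column indices, which parallelizes acceptably for a single input vector but not for batched inputs. For context, here is a plain CSR matrix-vector product in NumPy; this is generic CSR only, not the paper's contribution, which is a Viterbi-based encoding that compresses exactly these index arrays at a fixed ratio:

```python
# Generic CSR sparse matrix-vector product (the batch-1 case the
# abstract refers to). The indices array is the storage overhead that
# Viterbi-based pruning targets; this sketch does not implement that.
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x with A in CSR form (data, indices, indptr)."""
    y = np.zeros(len(indptr) - 1, dtype=x.dtype)
    for row in range(len(y)):
        start, end = indptr[row], indptr[row + 1]
        y[row] = data[start:end] @ x[indices[start:end]]
    return y

# Toy 3x4 pruned matrix with 4 surviving weights.
data    = np.array([2.0, -1.0, 0.5, 3.0])
indices = np.array([1, 3, 0, 2])     # column index of each weight
indptr  = np.array([0, 2, 3, 4])     # row boundaries into data/indices
x = np.array([1.0, 2.0, 3.0, 4.0])
print(csr_matvec(data, indices, indptr, x))   # -> [0.  0.5 9. ]
```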
