1 code implementation • 15 Feb 2024 • Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, HyungJun Kim
We introduce QUICK, a suite of novel, optimized CUDA kernels for efficient inference of quantized Large Language Models (LLMs).
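As a rough illustration of the operation such kernels accelerate, below is a minimal sketch of a naive 4-bit dequantize-and-multiply GEMV in CUDA. This is not the QUICK kernel itself (QUICK's contribution lies in how it reorganizes this work to avoid overheads); the names (w_packed, scales, group_size) are illustrative assumptions, not the paper's API.

```cuda
#include <cstdint>

// Naive 4-bit dequantize-and-multiply GEMV: y = W x, with W stored as
// packed 4-bit integers plus per-group float scales. One thread per row.
// Launch (illustrative): dequant_gemv_int4<<<(m + 255) / 256, 256>>>(...);
__global__ void dequant_gemv_int4(const uint8_t* w_packed, // two 4-bit weights per byte
                                  const float* scales,     // one scale per weight group
                                  const float* x,          // input activations, length n
                                  float* y,                // output, length m
                                  int m, int n, int group_size) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    float acc = 0.0f;
    for (int col = 0; col < n; ++col) {
        int idx = row * n + col;
        // Unpack one 4-bit quantized weight from the byte stream.
        uint8_t byte = w_packed[idx >> 1];
        int q = (idx & 1) ? (byte >> 4) : (byte & 0x0F);
        // Dequantize: recenter to a signed range, apply the group's scale.
        acc += (q - 8) * scales[idx / group_size] * x[col];
    }
    y[row] = acc;
}
```

Optimized kernels in this space typically restructure exactly this unpack-dequantize-accumulate loop so the per-weight conversion cost stops dominating the memory-bound multiply.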
no code implementations • 3 Jul 2023 • Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, HyungJun Kim
The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research.
no code implementations • NeurIPS 2023 • Junhyuk So, Jungwon Lee, Daehyun Ahn, HyungJun Kim, Eunhyeok Park
Diffusion models have gained popularity in vision applications due to their remarkable generative performance and versatility.
no code implementations • ICLR 2019 • Daehyun Ahn, Dongsoo Lee, Taesu Kim, Jae-Joon Kim
In this paper, we propose a new sparse matrix format that enables highly parallel decoding of the entire sparse matrix.
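The paper's format is specific to its own encoding, but a generic CSR-to-dense decode sketches the kind of row-parallel decoding such formats target. Everything below, names included, is an illustrative assumption, not the proposed format.

```cuda
// Generic CSR-to-dense decode, one thread per row. Because row_ptr gives
// each thread an independent slice of (col_idx, vals), all rows decode in
// parallel with no cross-row dependencies. `dense` is assumed zeroed
// (e.g., via cudaMemset) before launch.
__global__ void csr_to_dense(const int* row_ptr, const int* col_idx,
                             const float* vals, float* dense,
                             int m, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        dense[row * n + col_idx[j]] = vals[j];
}
```

The design point is that parallelism is a property of the format itself: a format whose entries can only be decoded sequentially serializes the whole reconstruction, regardless of how many threads are available.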
no code implementations • ICLR 2018 • Dongsoo Lee, Daehyun Ahn, Taesu Kim, Pierce I. Chuang, Jae-Joon Kim
Hence, pruning is usually restricted to inference with a batch size of one, for which an efficient parallel matrix-vector multiplication method exists.
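For context, the batch-one case the excerpt refers to reduces to a sparse matrix-vector product (SpMV), which parallelizes cleanly, e.g. one thread per output row in CSR form. A minimal hedged sketch (generic CSR SpMV, not the paper's method; variable names are illustrative):

```cuda
// Generic CSR sparse matrix-vector product y = A x, one thread per row.
// This is the efficient batch-one path: each thread reads only its row's
// nonzeros, so pruned (zeroed) weights cost no wasted work.
__global__ void csr_spmv(const int* row_ptr, const int* col_idx,
                         const float* vals, const float* x,
                         float* y, int m) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= m) return;
    float acc = 0.0f;
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        acc += vals[j] * x[col_idx[j]];
    y[row] = acc;
}
```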