4 Jun 2023 • Changhun Lee, Jungyu Jin, Taesu Kim, HyungJun Kim, Eunhyeok Park
Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment.
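A rough, weights-only calculation shows why models of this size outgrow a single GPU. The sketch below makes several illustrative assumptions that are not taken from the paper: a 175-billion-parameter model, 16-bit (fp16) weights, and 80 GB of memory per GPU. It also ignores activations, the KV cache, and framework overhead, so real requirements are higher.

```python
import math


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate storage for model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def gpus_needed(total_gb: float, gb_per_gpu: float) -> int:
    """Smallest number of GPUs whose combined memory holds total_gb (weights only)."""
    return math.ceil(total_gb / gb_per_gpu)


# Illustrative assumption: a 175B-parameter model stored in fp16 (16 bits per weight).
fp16_gb = weight_memory_gb(175_000_000_000, 16)
print(fp16_gb)  # 350.0

# Illustrative assumption: 80 GB of memory per GPU.
print(gpus_needed(fp16_gb, 80))  # 5
```

Even under these optimistic assumptions, the weights alone would need to be split across several high-memory server GPUs.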