1 code implementation • 10 Feb 2024 • Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
Large Language Models (LLMs) based on the Mixture-of-Experts (MoE) architecture show promising performance on various tasks.
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that delivers substantial throughput improvements with negligible accuracy loss.
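The core idea behind low-bit quantization is to store weights in a narrow integer format (e.g., INT4) with per-group scale factors, so matrix multiplications read far less memory during serving. The following is a minimal, hypothetical sketch of symmetric per-group INT4 weight quantization for illustration only; it is not Atom's actual algorithm, which additionally handles activation quantization, mixed precision for outliers, and fused low-bit kernels. The function names, the group size of 128, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

def quantize_int4(weights: np.ndarray, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 1-D weight vector.

    Illustrative sketch only; not Atom's actual quantization scheme.
    """
    assert weights.size % group_size == 0
    groups = weights.reshape(-1, group_size)
    # One scale per group maps the largest magnitude onto the INT4 level 7.
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate FP32 weights from INT4 codes and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Usage: quantize a random weight vector and check the reconstruction error.
w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("mean abs error:", np.abs(w - w_hat).mean())
```

Keeping one scale per small group (rather than per tensor) bounds the quantization error locally, which is one reason low-bit schemes can trade a large memory-bandwidth reduction for only a small accuracy loss.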