Search Results for author: Yashas Samaga B L

Found 1 paper, 0 papers with code

HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference

no code implementations · 14 Feb 2024 · Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound, where most of the time is spent transferring model parameters from high bandwidth memory (HBM) to cache.
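A minimal sketch of the general approximate-top-$k$ idea the abstract alludes to: score all vocabulary items with a cheap low-dimensional proxy, keep a slightly oversized candidate set for high recall, and run the exact (memory-expensive) computation only on those candidates. The random-projection proxy, matrix names, and parameters below are illustrative assumptions, not the paper's actual learned approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 256, 10_000                    # hidden dim, vocabulary size (toy scale)
W = rng.standard_normal((V, d))       # full output projection: expensive to read
x = rng.standard_normal(d)            # current decoder activation

# Cheap proxy: project both sides to r << d dims (a stand-in for a learned
# or quantized approximation; precomputed once, far smaller to read than W).
r = 32
P = rng.standard_normal((d, r)) / np.sqrt(r)
W_proxy = W @ P                       # (V, r)

def approx_topk(x, k=10, k_prime=100):
    """Approximate top-k: cheap scoring pass, then exact rescoring.

    k_prime >= k controls the recall/cost trade-off: a larger candidate
    set raises the chance the true top-k survive the approximate pass.
    """
    approx_scores = W_proxy @ (P.T @ x)                       # cheap logits
    candidates = np.argpartition(approx_scores, -k_prime)[-k_prime:]
    exact = W[candidates] @ x           # exact scores on k_prime rows, not V
    order = np.argsort(exact)[::-1][:k]                       # best first
    return candidates[order]

top = approx_topk(x, k=10, k_prime=100)
```

Only `k_prime` of the `V` rows of `W` are read at full precision, which is the source of the memory-traffic savings when `k_prime << V`.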
