Search Results for author: Ramachandran Ramjee

Found 9 papers, 2 papers with code

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

no code implementations • 4 Mar 2024 • Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

However, batching multiple requests leads to an interleaving of prefill and decode iterations which makes it challenging to achieve both high throughput and low latency.

Scheduling

Paper
Add Code

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

no code implementations • 31 Aug 2023 • Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes.

Language Modelling Large Language Model

Paper
Add Code

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

no code implementations • 10 Jul 2022 • Kunal Dahiya, Nilesh Gupta, Deepak Saini, Akshay Soni, Yajun Wang, Kushal Dave, Jian Jiao, Gururaj K, Prasenjit Dey, Amit Singh, Deepesh Hada, Vidit Jain, Bhawna Paliwal, Anshul Mittal, Sonu Mehta, Ramachandran Ramjee, Sumeet Agarwal, Purushottam Kar, Manik Varma

This paper identifies that memory overheads of popular negative mining techniques often force mini-batch sizes to remain small and slow training down.

Classification TAG

Paper
Add Code

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

no code implementations • 16 Feb 2022 • Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e. g., GPUs, FPGAs).

Scheduling

Paper
Add Code

LRTuner: A Learning Rate Tuner for Deep Neural Networks

1 code implementation • ICML Workshop AutoML 2021 • Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

For example on ImageNet with Resnet-50, LRTuner shows up to 0. 2% absolute gains in test accuracy compared to the hand-tuned baseline schedule.

Paper
Code

Unsupervised Clustering using Pseudo-semi-supervised Learning

no code implementations • ICLR 2020 • Divam Gupta, Ramachandran Ramjee, Nipun Kwatra, Muthian Sivathanu

In this paper, we propose a framework that leverages semi-supervised models to improve unsupervised clustering performance.

Clustering

Paper
Add Code

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

2 code implementations • 9 Mar 2020 • Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Several papers argue that wide minima generalize better than narrow minima.

Ranked #6 on Machine Translation on WMT2014 German-English

Machine Translation

Paper
Code

AutoLR: A Method for Automatic Tuning of Learning Rate

no code implementations • 25 Sep 2019 • Nipun Kwatra, V Thejas, Nikhil Iyer, Ramachandran Ramjee, Muthian Sivathanu

We compare favorably against state of the art learning rate schedules for the given dataset and models, including for ImageNet on Resnet-50, Cifar-10 on Resnet-18, and SQuAD fine-tuning on BERT.

Paper
Add Code

Privado: Practical and Secure DNN Inference with Enclaves

no code implementations • 1 Oct 2018 • Karan Grover, Shruti Tople, Shweta Shinde, Ranjita Bhagwan, Ramachandran Ramjee

In this paper, we ask a timely question: "Can third-party cloud services use Intel SGX enclaves to provide practical, yet secure DNN Inference-as-a-service?"

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.