Search Results for author: Dheevatsa Mudigere

Found 29 papers, 8 papers with code

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

no code implementations • 1 Mar 2024 • Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

We study the mismatch between deep learning recommendation models' flat architecture, the common distributed training paradigm, and the hierarchical data center topology.

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

1 code implementation • 12 Sep 2023 • Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat

It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network.

Stochastic Optimization
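
As a rough sketch of the preconditioner the snippet describes, here is the update rule for a single unblocked weight matrix, G' = L^{-1/4} G R^{-1/4}; the actual distributed PyTorch implementation adds blocking, grafting, and sharding, and all names below are illustrative rather than the repository's API.

```python
import torch

def matrix_power(mat, p, eps=1e-12):
    """Power of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = torch.linalg.eigh(mat)
    return vecs @ torch.diag(vals.clamp_min(eps) ** p) @ vecs.T

def shampoo_step(weight, grad, L, R, lr=0.01):
    # Accumulate Kronecker-factored second-moment statistics.
    L = L + grad @ grad.T            # left statistics (m x m)
    R = R + grad.T @ grad            # right statistics (n x n)
    # Precondition the gradient: G' = L^{-1/4} @ G @ R^{-1/4}.
    precond = matrix_power(L, -0.25) @ grad @ matrix_power(R, -0.25)
    return weight - lr * precond, L, R

m, n = 8, 4
w, L, R = torch.randn(m, n), 1e-4 * torch.eye(m), 1e-4 * torch.eye(n)
g = torch.randn(m, n)
w, L, R = shampoo_step(w, g, L, R)
```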

MTrainS: Improving DLRM training efficiency using heterogeneous memories

no code implementations • 19 Apr 2023 • Hiwot Tadese Kassa, Paul Johnson, Jason Akers, Mrinmoy Ghosh, Andrew Tulloch, Dheevatsa Mudigere, Jongsoo Park, Xing Liu, Ronald Dreslinski, Ehsan K. Ardestani

In Deep Learning Recommendation Models (DLRM), sparse features capturing categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth.
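
A minimal sketch of the embedding-table lookup pattern the snippet refers to, using torch.nn.EmbeddingBag; sizes are illustrative, and the hashing and sharding machinery of a production DLRM is omitted.

```python
import torch
import torch.nn as nn

num_categories, dim = 100_000, 64            # illustrative sizes
table = nn.EmbeddingBag(num_categories, dim, mode="sum")

# A batch of two multi-hot categorical features, flattened with offsets.
ids = torch.tensor([1, 42, 7, 99_999])       # category indices
offsets = torch.tensor([0, 2])               # sample 0 -> ids[0:2], sample 1 -> ids[2:4]
pooled = table(ids, offsets)                 # shape: (2, 64)

# Each table holds num_categories * dim floats, so tables with millions
# of rows dominate model size and memory bandwidth.
print(pooled.shape, table.weight.numel() * 4 / 2**20, "MiB")
```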

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

no code implementations • 11 Mar 2022 • Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen

To overcome the training challenge posed by DHEN's deeper, multi-layer structure, we propose a novel co-designed training system that further improves the training efficiency of DHEN.

Click-Through Rate Prediction

Differentiable NAS Framework and Application to Ads CTR Prediction

1 code implementation • 25 Oct 2021 • Ravi Krishna, Aravind Kalaiah, Bichen Wu, Maxim Naumov, Dheevatsa Mudigere, Misha Smelyanskiy, Kurt Keutzer

Neural architecture search (NAS) methods aim to automatically find the optimal deep neural network (DNN) architecture as measured by a given objective function, typically some combination of task accuracy and inference efficiency.

Click-Through Rate Prediction • Neural Architecture Search

Check-N-Run: A Checkpointing System for Training Deep Learning Recommendation Models

no code implementations • 17 Oct 2020 • Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, Murali Annavaram

While Check-N-Run is applicable to long-running ML jobs, we focus on checkpointing recommendation models, which are currently the largest ML models, with terabytes of model size.

Quantization • Recommendation Systems
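
A heavily simplified sketch of the differential-checkpointing idea of persisting only embedding rows touched since the last checkpoint (the system also quantizes checkpoints, per the tags). The class, method names, and file format here are hypothetical, not Check-N-Run's API.

```python
import torch

class IncrementalCheckpointer:
    def __init__(self, table):
        self.table = table                      # (rows, dim) embedding weights
        self.dirty = set()                      # rows updated since last save

    def record_update(self, row_ids):
        self.dirty.update(row_ids.tolist())

    def checkpoint(self, path):
        rows = torch.tensor(sorted(self.dirty), dtype=torch.long)
        torch.save({"rows": rows, "values": self.table[rows]}, path)
        self.dirty.clear()

table = torch.randn(100_000, 64)
ckpt = IncrementalCheckpointer(table)
ckpt.record_update(torch.tensor([3, 17, 42]))   # rows touched this interval
ckpt.checkpoint("/tmp/delta_0.pt")              # saves 3 rows, not 100k
```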

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems

no code implementations • 20 Mar 2020 • Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao, Bor-Yiing Su, Jiyan Yang, Mikhail Smelyanskiy

Large-scale training is important to ensure high performance and accuracy of machine-learning models.

Distributed, Parallel, and Cluster Computing

SEERL: Sample Efficient Ensemble Reinforcement Learning

no code implementations • 15 Jan 2020 • Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble.

Continuous Control • Ensemble Learning • +3

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

6 code implementations • 25 Sep 2019 • Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings.

Click-Through Rate Prediction • Collaborative Filtering • +1
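
A minimal sketch of the mixed-dimension idea: frequent categories get full-width embeddings while long-tail categories get narrow embeddings projected up to the base dimension. The two-block split and sizes below are illustrative; the paper derives per-block dimensions from a popularity power law.

```python
import torch
import torch.nn as nn

class MixedDimEmbedding(nn.Module):
    def __init__(self, rows_per_block, dims_per_block, base_dim):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(r, d) for r, d in zip(rows_per_block, dims_per_block)
        )
        self.projs = nn.ModuleList(
            nn.Linear(d, base_dim, bias=False) for d in dims_per_block
        )
        self.offsets = torch.tensor([0] + list(rows_per_block)).cumsum(0)

    def forward(self, ids):
        out = torch.empty(ids.numel(), self.projs[0].out_features)
        for b, (tab, proj) in enumerate(zip(self.tables, self.projs)):
            mask = (ids >= self.offsets[b]) & (ids < self.offsets[b + 1])
            if mask.any():
                # Look up in the block's narrow table, project to base_dim.
                out[mask] = proj(tab(ids[mask] - self.offsets[b]))
        return out

# Popular block: 10k rows at full dim; long-tail block: 1M rows at dim 4.
emb = MixedDimEmbedding([10_000, 1_000_000], [64, 4], base_dim=64)
vecs = emb(torch.tensor([5, 500_000]))   # both map to 64-dim vectors
```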

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

6 code implementations • 4 Sep 2019 • Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition.

Recommendation Systems
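
A minimal sketch of the quotient-remainder instance of complementary partitions, with the element-wise product combiner (one of the operations the paper considers); sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    def __init__(self, num_categories, dim, num_buckets):
        super().__init__()
        self.num_buckets = num_buckets
        q_rows = (num_categories + num_buckets - 1) // num_buckets
        self.q_table = nn.Embedding(q_rows, dim)        # quotient partition
        self.r_table = nn.Embedding(num_buckets, dim)   # remainder partition

    def forward(self, ids):
        q = self.q_table(ids // self.num_buckets)
        r = self.r_table(ids % self.num_buckets)
        # The (quotient, remainder) pair is unique per category, so the
        # combined vector is too, without a full num_categories x dim table.
        return q * r

emb = QREmbedding(num_categories=1_000_000, dim=64, num_buckets=1000)
vecs = emb(torch.tensor([0, 999, 123_456]))   # (3, 64), ~2k rows instead of 1M
```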

Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy

no code implementations • 26 Oct 2018 • Majid Jahani, Xi He, Chenxin Ma, Aryan Mokhtari, Dheevatsa Mudigere, Alejandro Ribeiro, Martin Takáč

In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE) method in which the sample size gradually increases, so as to quickly obtain a solution whose empirical loss is within the desired statistical accuracy.
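
A toy sketch of the accumulating-sample schedule; the inner Newton-CG solve is replaced by plain gradient steps on a least-squares objective, so this only illustrates the geometric sample growth, not DANCE itself.

```python
import torch

def accumulating_sample_train(X, y, w, grow=2.0, inner_steps=10, lr=0.3):
    n = X.shape[0]
    m = max(1, n // 64)                    # initial sample size (illustrative)
    while True:
        Xs, ys = X[:m], y[:m]              # accumulated sample (prefix of data)
        for _ in range(inner_steps):       # inexact inner solve on the sample
            grad = Xs.T @ (Xs @ w - ys) / m   # least-squares gradient
            w = w - lr * grad
        if m == n:
            return w
        m = min(n, int(grow * m))          # geometric sample growth

X, y = torch.randn(1024, 8), torch.randn(1024)
w = accumulating_sample_train(X, y, torch.zeros(8))
```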

Hierarchical Block Sparse Neural Networks

no code implementations • 10 Aug 2018 • Dharma Teja Vooturi, Dheevatsa Mudigere, Sasikanth Avancha

In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of sparse neural networks called HBsNN (Hierarchical Block sparse Neural Networks).
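
A minimal sketch of a two-level (hierarchical) block-sparse mask, assuming coarse block pruning followed by element pruning within the surviving blocks; thresholds and granularity here are illustrative, not HBsNN's exact scheme.

```python
import torch

def hierarchical_block_sparse_mask(w, block=16, block_keep=0.5, elem_keep=0.5):
    rows, cols = w.shape
    # Tile w into (block x block) tiles: blocks[a, :, c, :] is tile (a, c).
    blocks = w.reshape(rows // block, block, cols // block, block)
    norms = blocks.abs().sum(dim=(1, 3))                     # per-tile L1 norm
    k = max(1, int(block_keep * norms.numel()))
    thresh = norms.flatten().topk(k).values.min()
    block_mask = (norms >= thresh)[:, None, :, None].float() # keep strong tiles
    # Fine-grained pruning inside surviving tiles (hardware-friendly outer
    # structure, extra sparsity inside).
    elem_thresh = w.abs().flatten().quantile(1 - elem_keep)
    elem_mask = (w.abs() >= elem_thresh).reshape(blocks.shape).float()
    return (block_mask * elem_mask).reshape(rows, cols)

w = torch.randn(64, 64)
w_sparse = w * hierarchical_block_sparse_mask(w)
```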

A Progressive Batching L-BFGS Method for Machine Learning

no code implementations • ICML 2018 • Raghu Bollapragada, Dheevatsa Mudigere, Jorge Nocedal, Hao-Jun Michael Shi, Ping Tak Peter Tang

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function.

BIG-bench Machine Learning
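
A toy sketch of the progressive-batching control logic: grow the batch when the sampled gradient is too noisy to trust as a descent direction. The variance test below is a simplified stand-in for the paper's inner-product and orthogonality tests.

```python
import torch

def needs_bigger_batch(per_example_grads, theta=0.9):
    g = per_example_grads.mean(dim=0)              # batch gradient estimate
    var = per_example_grads.var(dim=0).sum()       # total per-coordinate variance
    b = per_example_grads.shape[0]
    # Require Var(sample mean) <= theta^2 * ||g||^2; otherwise the sampled
    # direction is too noisy and the batch should grow before updating.
    return var / b > (theta ** 2) * g.dot(g)

grads = torch.randn(32, 10) * 0.1 + torch.ones(10)  # 32 per-example gradients
if needs_bigger_batch(grads):
    print("grow the batch before the next L-BFGS step")
else:
    print("batch gradient is reliable; take an L-BFGS step")
```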

On Scale-out Deep Learning Training for Cloud and HPC

no code implementations • 24 Jan 2018 • Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey

The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes.


RAIL: Risk-Averse Imitation Learning

1 code implementation • 20 Jul 2017 • Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories.

Autonomous Driving • Continuous Control • +1

Ternary Residual Networks

no code implementations • 15 Jul 2017 • Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

Aided by this trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, achieves very high accuracy ($\sim 1\%$ drop from our FP-32 baseline) despite a $\sim 1.6\times$ reduction in model size, a $\sim 26\times$ reduction in the number of multiplications, and a potential $\sim 2\times$ power-performance gain compared to the 8-8 representation, on ResNet-101 pre-trained on the ImageNet dataset.

Ternary Neural Networks with Fine-Grained Quantization

no code implementations • 2 May 2017 • Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

We address this by fine-tuning ResNet-50 with 8-bit activations and ternary weights at $N=64$, improving the Top-1 accuracy to within $4\%$ of the full-precision result with $<30\%$ additional training overhead.

Quantization

Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

no code implementations • 31 Jan 2017 • Naveen Mellempudi, Abhisek Kundu, Dipankar Das, Dheevatsa Mudigere, Bharat Kaul

We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy.

Quantization
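
A minimal sketch of cluster-based ternarization, assuming plain 1-D k-means with three centroids that are then symmetrized to {-alpha, 0, +alpha}; the paper's exact clustering and scaling rules may differ.

```python
import torch

def kmeans_ternarize(w, iters=20):
    flat = w.flatten()
    m = flat.abs().mean().item()
    c = torch.tensor([-m, 0.0, m])               # initial centroids
    for _ in range(iters):                       # plain 1-D k-means
        assign = (flat[:, None] - c[None, :]).abs().argmin(dim=1)
        for k in range(3):
            if (assign == k).any():
                c[k] = flat[assign == k].mean()
    alpha = (c[2] - c[0]).item() / 2             # symmetric ternary scale
    levels = torch.tensor([-alpha, 0.0, alpha])
    return levels[assign].reshape(w.shape)       # snap to {-a, 0, +a}

w = torch.randn(64, 64)                          # stand-in for pre-trained weights
wt = kmeans_ternarize(w)
print(wt.unique())                               # three values: -alpha, 0, +alpha
```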

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

9 code implementations • 15 Sep 2016 • Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang

The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks.

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

no code implementations • 22 Feb 2016 • Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey

We design and implement a distributed multi-node synchronous SGD algorithm without altering hyperparameters, compressing data, or changing algorithmic behavior.
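
A toy simulation of the synchronous data-parallel scheme: per-worker gradients on local shards are averaged (the allreduce step, emulated here with a Python loop), and every replica applies the identical update, matching single-node SGD on the combined batch. All names are illustrative.

```python
import torch

def sync_sgd_step(replicas, shards, targets, lr=0.1):
    grads = []
    for w, X, y in zip(replicas, shards, targets):   # local backward pass
        grads.append(X.T @ (X @ w - y) / len(y))     # least-squares gradient
    g = torch.stack(grads).mean(dim=0)               # "allreduce": average grads
    return [w - lr * g for w in replicas]            # identical update everywhere

workers = 4
w0 = torch.zeros(8)
replicas = [w0.clone() for _ in range(workers)]
shards = [torch.randn(32, 8) for _ in range(workers)]
targets = [torch.randn(32) for _ in range(workers)]
replicas = sync_sgd_step(replicas, shards, targets)
assert all(torch.equal(r, replicas[0]) for r in replicas)  # replicas stay in sync
```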

Identification of Helicopter Dynamics based on Flight Data using Nature Inspired Techniques

no code implementations • 12 Nov 2014 • S. N. Omkar, Dheevatsa Mudigere, J Senthilnath, M. Vijaya Kumar

Most traditional techniques require a model structure to be defined a priori, and in the case of helicopter dynamics this is difficult due to the system's complexity and the interplay between various subsystems. To overcome this difficulty, non-parametric approaches are commonly adopted for helicopter system identification.
