no code implementations • 6 Nov 2023 • Kabir Nagrecha, Arun Kumar
In this paper, we propose Saturn, a new data system to improve the efficiency of multi-large-model training (e.g., during model selection/hyperparameter optimization).
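One core decision in multi-large-model training is apportioning a fixed GPU pool across concurrent jobs. The sketch below is a toy greedy allocator, not Saturn's actual algorithm; the job names, base runtimes, and the 10%-per-GPU overhead model are illustrative assumptions.

```python
# Toy GPU allocator for a batch of model-training jobs (illustrative only).

def estimated_runtime(base_hours, gpus):
    """Assume near-linear speedup with a 10% coordination overhead per extra GPU."""
    return base_hours / gpus * (1 + 0.1 * (gpus - 1))

def allocate(jobs, total_gpus):
    """Give every job one GPU, then hand each spare GPU to the job whose
    estimated runtime it reduces the most."""
    alloc = {name: 1 for name in jobs}
    for _ in range(total_gpus - len(jobs)):
        best = max(jobs, key=lambda n: estimated_runtime(jobs[n], alloc[n])
                                       - estimated_runtime(jobs[n], alloc[n] + 1))
        alloc[best] += 1
    return alloc

# Three hypothetical jobs (base single-GPU hours) sharing 8 GPUs:
jobs = {"gpt-small": 12.0, "gpt-medium": 30.0, "gpt-large": 60.0}
print(allocate(jobs, 8))
```

The greedy marginal-benefit rule naturally steers spare GPUs toward the longest-running jobs, which is the intuition (though not the mechanism) behind joint resource allocation for model selection workloads.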
1 code implementation • 3 Sep 2023 • Kabir Nagrecha, Arun Kumar
Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of "model parallelism" techniques and tools.
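One simple flavor of model parallelism is layer-wise sharding: partition a model's layers across devices so that no single GPU must hold all parameters. The sketch below is a naive balancing heuristic, not any specific tool's API; the layer sizes and device count are made up.

```python
# Minimal sketch of layer-wise model sharding: contiguously partition layers
# across devices so per-device parameter load is roughly balanced.
# This is a naive heuristic for illustration, not a production partitioner.

def shard_layers(layer_sizes, n_devices):
    """Return a list of per-device layer-index groups."""
    target = sum(layer_sizes) / n_devices
    shards, current, load = [], [], 0
    for i, size in enumerate(layer_sizes):
        current.append(i)
        load += size
        # close this shard once it reaches its fair share of parameters,
        # leaving room for the remaining devices
        if load >= target and len(shards) < n_devices - 1:
            shards.append(current)
            current, load = [], 0
    shards.append(current)
    return shards

# 8 layers with hypothetical parameter counts (in millions), split over 4 GPUs:
print(shard_layers([10, 20, 30, 40, 10, 20, 30, 40], 4))
```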
no code implementations • 13 Aug 2023 • Kabir Nagrecha, Lingyi Liu, Pablo Delgado, Prasanna Padmanabhan
Our studies lead us to design and build a new solution for data pipeline optimization, InTune.
no code implementations • 6 Jan 2023 • Kabir Nagrecha
Deep learning (DL) has transformed applications in a variety of domains, including computer vision, natural language processing, and tabular data analysis.
1 code implementation • 16 Oct 2021 • Kabir Nagrecha, Arun Kumar
In this paper, we present Hydra, a system designed to tackle such challenges by enabling out-of-the-box scaling for multi-large-model DL workloads on even commodity GPUs in a resource-efficient manner.
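A key idea for fitting large models on commodity GPUs is to keep most of the model parked in host memory and stage only the currently executing shard onto the device. The simulation below illustrates that spilled-execution pattern in the abstract; it is not Hydra's implementation, and the shard functions are placeholders.

```python
# Toy simulation of spilled execution: at most one model shard "resides" on
# the (simulated) GPU at a time; the rest wait in host RAM. Illustrative only.

class SpilledModel:
    def __init__(self, shard_fns):
        self.host = list(shard_fns)   # shard "parameters" parked in host RAM
        self.gpu = None               # at most one shard resident on the GPU

    def forward(self, x):
        log = []
        for i, fn in enumerate(self.host):
            self.gpu = fn             # promote shard i: host -> GPU copy
            x = fn(x)                 # run only the resident shard
            log.append(f"shard {i} on GPU")
            self.gpu = None           # evict before promoting the next shard
        return x, log

# Three placeholder shards standing in for groups of layers:
model = SpilledModel([lambda v: v + 1, lambda v: v * 2, lambda v: v - 3])
y, log = model.forward(5)
print(y, log)
```

Because only one shard is resident at a time, peak device memory is bounded by the largest shard rather than the whole model, which is what makes commodity-GPU execution of oversized models possible.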
Ranked #5 on Language Modelling on WikiText-2 (using extra training data)
no code implementations • 14 Jul 2021 • Kabir Nagrecha
As deep learning becomes more expensive, both in terms of time and compute, inefficiencies in machine learning (ML) training prevent practical usage of state-of-the-art models for most users.
no code implementations • CVPR 2021 • Pei Wang, Kabir Nagrecha, Nuno Vasconcelos
This is formulated as a problem of functional optimization where, at each teaching iteration, the teacher seeks to align the steepest descent directions of the risk of (1) the teaching set and (2) the entire example population.
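The alignment idea can be sketched concretely: score each candidate example by how closely its loss gradient points along the population risk's steepest-descent direction. The 1-D linear model, squared loss, and data below are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of gradient alignment for teaching-set selection: pick the
# example whose loss gradient best aligns (by cosine) with the mean gradient
# of the whole population. Model/data here are toy stand-ins.
import math

def grad(w, x, y):
    """Gradient of squared loss 0.5*(w.phi - y)^2 with features phi = (x, 1)."""
    err = w[0] * x + w[1] - y
    return (err * x, err)

def mean_grad(w, data):
    gs = [grad(w, x, y) for x, y in data]
    return (sum(g[0] for g in gs) / len(gs), sum(g[1] for g in gs) / len(gs))

def cosine(a, b):
    na, nb = math.hypot(*a), math.hypot(*b)
    return 0.0 if na == 0 or nb == 0 else (a[0] * b[0] + a[1] * b[1]) / (na * nb)

def pick_teaching_example(w, population):
    """Choose the example whose gradient best aligns with the population's."""
    target = mean_grad(w, population)
    return max(population, key=lambda ex: cosine(grad(w, *ex), target))

population = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
print(pick_teaching_example((0.0, 0.0), population))
```

Iterating this selection as the learner's weights move is what makes the teaching-set construction an optimization over descent directions rather than a one-shot subset choice.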