no code implementations • 29 Apr 2024 • Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, Todd Gamblin, Abhinav Bhatele
Optimizing scientific software is a difficult task because codebases are often large and complex, and performance can depend on several factors, including the algorithm, its implementation, and the underlying hardware.
no code implementations • 23 Jan 2024 • Daniel Nichols, Joshua H. Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele
Large language models are becoming an increasingly popular tool for software development.
no code implementations • 18 Oct 2023 • Siddharth Singh, Zachary Sating, Abhinav Bhatele
The primary efficiency bottleneck in such optimizers is the matrix inversion required in the preconditioning step, which is expensive to compute on GPUs.
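As a rough illustration (not this paper's method), the sketch below shows a second-order-style update in NumPy: the gradient is preconditioned by solving against a curvature estimate, and that O(n^3) solve is the step that dominates on GPUs. All names and sizes here are made up for the example.

```python
# Minimal sketch of a preconditioned update; hypothetical, not the paper's algorithm.
import numpy as np

def preconditioned_step(params, grad, curvature, lr=1e-2, damping=1e-3):
    """One update: params -= lr * (C + damping*I)^{-1} @ grad."""
    n = curvature.shape[0]
    # Solving this n x n system (equivalently, inverting C) is the
    # O(n^3) bottleneck referred to above.
    precond_grad = np.linalg.solve(curvature + damping * np.eye(n), grad)
    return params - lr * precond_grad

rng = np.random.default_rng(0)
n = 512
G = rng.standard_normal((n, n))
curvature = G @ G.T / n  # symmetric PSD curvature estimate
params = preconditioned_step(rng.standard_normal(n),
                             rng.standard_normal(n), curvature)
```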
no code implementations • 29 Jun 2023 • Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele
In this paper, we show how large language models (LLMs) can be applied to tasks specific to high performance and scientific codes.
1 code implementation • 22 May 2023 • Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, Zack Sating, Abhinav Bhatele
Large communication costs are a critical bottleneck in training state-of-the-art neural networks on distributed systems.
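To make the bottleneck concrete, here is a back-of-envelope cost model (an illustration, not the paper's analysis) for plain data parallelism, where every step all-reduces the full gradient; the model size and link bandwidth below are assumptions.

```python
# Estimate per-step gradient communication time for a ring all-reduce.
# All numbers are hypothetical.
def ring_allreduce_seconds(num_params, num_gpus, link_gbps, bytes_per_param=4):
    """A ring all-reduce moves 2 * (p - 1) / p of the buffer per GPU."""
    message_bytes = num_params * bytes_per_param
    volume = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return volume / (link_gbps * 1e9 / 8)  # Gb/s -> bytes/s

# e.g., a 20B-parameter model on 1024 GPUs with 200 Gb/s links:
t = ring_allreduce_seconds(20e9, 1024, 200)
print(f"~{t:.1f} s of gradient communication per step")  # ~6.4 s
```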
1 code implementation • 11 Mar 2023 • Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs.
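The sketch below is a minimal top-1 MoE block in PyTorch, illustrating the general idea rather than this paper's system: a router sends each token to one expert, so total parameters grow with the number of experts while per-token compute stays roughly that of a single expert. Layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TopOneMoE(nn.Module):
    """Sparsely activated MoE layer with top-1 routing (illustrative)."""
    def __init__(self, d_model, num_experts):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        weights, expert_ids = self.router(x).softmax(-1).max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i  # tokens routed to expert i
            if mask.any():
                out[mask] = weights[mask].unsqueeze(1) * expert(x[mask])
        return out

moe = TopOneMoE(d_model=64, num_experts=8)  # 8x the FFN parameters,
y = moe(torch.randn(16, 64))                # ~1 expert's compute per token
```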
no code implementations • 10 Feb 2023 • Siddharth Singh, Abhinav Bhatele
Parallel training of neural networks at scale is challenging due to significant overheads arising from communication.
no code implementations • 9 Nov 2021 • Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele
This phenomenon has spurred the development of algorithms for distributed training of neural networks across large numbers of hardware accelerators.
no code implementations • 25 Oct 2021 • Siddharth Singh, Abhinav Bhatele
This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters.
no code implementations • 7 Jul 2020 • Ian J. Costello, Abhinav Bhatele
In recent years, several HPC facilities have started continuously monitoring their systems and jobs, collecting data to understand performance and operational efficiency.
1 code implementation • 1 Jul 2020 • Suraj P. Kesavan, Harsh Bhatia, Abhinav Bhatele, Todd Gamblin, Peer-Timo Bremer, Kwan-Liu Ma
Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources.
Distributed, Parallel, and Cluster Computing • Performance