Search Results for author: Dhabaleswar K. Panda

Found 3 papers, 3 papers with code

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

1 code implementation · 16 Jan 2024 · Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation.
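The paper's key observation is that experts in adjacent MoE layers exhibit affinity: tokens routed to a given expert in one layer tend to be routed to particular experts in the next. The following is a toy sketch of that idea (not the paper's actual algorithm): count token co-routing between two layers' experts, then read off which layer-1 expert each layer-0 expert most often feeds, which is the kind of statistic one could use to co-locate high-affinity experts on the same GPU and cut all-to-all traffic. All names and the random routing data are hypothetical.

```python
import numpy as np

# Hypothetical toy illustration of inter-layer expert affinity:
# count how often a token routed to expert i in layer L is routed
# to expert j in layer L+1.
rng = np.random.default_rng(0)
num_tokens, num_experts = 1000, 4

# Simulated top-1 routing decisions for two adjacent MoE layers.
route_l0 = rng.integers(0, num_experts, num_tokens)
route_l1 = rng.integers(0, num_experts, num_tokens)

# affinity[i, j] = number of tokens that went expert i -> expert j.
affinity = np.zeros((num_experts, num_experts), dtype=int)
np.add.at(affinity, (route_l0, route_l1), 1)

# For each layer-0 expert, the layer-1 expert it most often feeds;
# placing such pairs on the same device reduces cross-GPU traffic.
best_partner = affinity.argmax(axis=1)
print(affinity.sum())  # total tokens counted == num_tokens
```

Because this only inspects routing decisions, it is consistent with the abstract's claim that the approach needs no fine-tuning of the pre-trained model.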

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

1 code implementation · 22 May 2023 · Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens.
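That temporal dependency can be sketched in a few lines: at each decoding step the full prefix of generated tokens determines the next token's distribution, so the steps form a strict sequential chain. The snippet below is a minimal illustration with a toy stand-in for the model, not the Flover framework itself; `next_logits` and all constants are hypothetical.

```python
import numpy as np

# Minimal sketch of autoregressive decoding: each step's output
# distribution is conditioned on all previously generated tokens,
# so steps cannot be reordered or trivially parallelized.
vocab, ctx = 16, [3]  # toy vocabulary size; single prompt token

def next_logits(tokens):
    # Hypothetical stand-in for a model forward pass over the prefix.
    h = sum(tokens) % vocab
    return np.eye(vocab)[h]

for _ in range(5):
    probs = np.exp(next_logits(ctx))
    probs /= probs.sum()
    ctx.append(int(probs.argmax()))  # greedy: depends on the whole prefix

print(ctx)  # [3, 3, 6, 12, 8, 0]
```

It is exactly this chain structure that makes batching concurrent inference requests at different decoding depths (the paper's "temporal fusion") nontrivial.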

Computational Efficiency

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

1 code implementation · 21 Jan 2021 · Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda

This paper presents the design and implementation of a new communication backend for Dask -- called MPI4Dask -- that is targeted for modern HPC clusters built with GPUs.
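At a high level, a Dask communication backend has to expose asynchronous send and receive of serialized Python payloads over some transport; MPI4Dask swaps the default TCP transport for MPI point-to-point operations. The sketch below illustrates only that shape of the interface with an in-process queue standing in for the MPI channel; it is not MPI4Dask code, and every name in it is hypothetical.

```python
import asyncio
import pickle

# Toy sketch of a message-passing comm backend's send/recv path
# (not MPI4Dask itself): serialize a payload, move the bytes over
# a point-to-point channel, and deserialize on the receiving side,
# with asyncio providing the non-blocking API Dask's comm layer expects.
async def main():
    channel = asyncio.Queue()  # stands in for an MPI point-to-point channel

    async def send(obj):
        await channel.put(pickle.dumps(obj))  # serialize, then ship bytes

    async def recv():
        return pickle.loads(await channel.get())  # bytes back to objects

    await send({"op": "compute", "data": list(range(4))})
    return await recv()

msg = asyncio.run(main())
print(msg["data"])  # [0, 1, 2, 3]
```

In the real system the bytes would travel via MPI (e.g. through mpi4py), which on GPU clusters can use CUDA-aware transports instead of staging data through the host.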

Blocking · Distributed Computing
