Search Results for author: Alexandros Daglis

Found 1 paper, 0 papers with code

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

no code implementations • 30 Nov 2022 • Divya Kiran Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexandros Daglis

To facilitate the design space exploration of such massive DL training clusters, we introduce COMET, a holistic cluster design methodology and workflow to jointly study the impact of parallelization strategies and key cluster resource provisioning on the performance of distributed DL training.
