Random Convolutional Coding for Robust and Straggler Resilient Distributed Matrix Computation

18 Jul 2019 · Anindya B. Das, Aditya Ramamoorthy, Namrata Vaswani

Distributed matrix computations (matrix-vector and matrix-matrix multiplications) are at the heart of several tasks within the machine learning pipeline. However, distributed clusters are well known to suffer from the problem of stragglers (slow or failed nodes). Prior work in this area has presented straggler mitigation strategies based on polynomial evaluation/interpolation. Such approaches, however, suffer from numerical problems (blow-up of round-off errors) owing to the high condition numbers of the corresponding Vandermonde matrices. In this work, we introduce a novel solution approach that relies on embedding distributed matrix computations into the structure of a convolutional code. This simple innovation allows us to develop a provably numerically robust and efficient (fast) solution for distributed matrix-vector and matrix-matrix multiplication.
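
The polynomial evaluation/interpolation idea that the abstract attributes to prior work can be illustrated with a toy sketch. This is not the convolutional-code scheme the paper proposes. The helper names (`encode`, `decode`, `recover`), the evaluation points 1, 2, 3, and the matrix sizes are all assumptions chosen for illustration. The sketch uses exact rational arithmetic so that decoding is error-free; in floating point, with many workers, the Vandermonde system solved in `decode` is the one whose conditioning the abstract flags.

```python
from fractions import Fraction
from itertools import combinations


def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def encode(blocks, z):
    """Coded block sum_j z**j * blocks[j] for a worker with evaluation point z."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(Fraction(z) ** j * blocks[j][r][c] for j in range(len(blocks)))
             for c in range(cols)] for r in range(rows)]


def solve(V, Y):
    """Gauss-Jordan elimination for V @ C = Y (rows of Y are vectors)."""
    k = len(V)
    V = [row[:] for row in V]
    Y = [row[:] for row in Y]
    for col in range(k):
        # Distinct evaluation points make V nonsingular, so a pivot exists.
        pivot = next(r for r in range(col, k) if V[r][col] != 0)
        V[col], V[pivot] = V[pivot], V[col]
        Y[col], Y[pivot] = Y[pivot], Y[col]
        inv = 1 / V[col][col]
        V[col] = [v * inv for v in V[col]]
        Y[col] = [y * inv for y in Y[col]]
        for r in range(k):
            if r != col and V[r][col] != 0:
                f = V[r][col]
                V[r] = [a - f * b for a, b in zip(V[r], V[col])]
                Y[r] = [a - f * b for a, b in zip(Y[r], Y[col])]
    return Y


def decode(points, results):
    """Interpolate: recover the uncoded products from k worker results."""
    k = len(points)
    V = [[Fraction(z) ** j for j in range(k)] for z in points]  # Vandermonde
    return solve(V, [[Fraction(v) for v in r] for r in results])


def recover(points, results):
    """Concatenate the decoded block products back into A @ x."""
    return [v for piece in decode(points, results) for v in piece]


# Toy instance (assumed values): A is split row-wise into k = 2 blocks,
# and n = 3 workers each multiply one coded block by x.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
blocks = [A[:2], A[2:]]
points = [1, 2, 3]
worker_out = {z: matvec(encode(blocks, z), x) for z in points}

# Any 2 of the 3 workers suffice, so one straggler can be ignored.
for pair in combinations(points, 2):
    assert recover(list(pair), [worker_out[z] for z in pair]) == matvec(A, x)
```

Each worker does half the work of the full product, and the master waits only for the fastest two. Swapping `Fraction` for floats and increasing the number of blocks and workers would expose the round-off sensitivity of the Vandermonde solve.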

Code

anindyabijoydas/StragglerMitigateCo…
