1 code implementation • 15 Mar 2022 • Vivek Bharadwaj, Aydın Buluç, James Demmel
We also give two communication-eliding strategies that further reduce costs for FusedMM kernels: either reusing the replication of an input dense matrix for the SDDMM and SpMM in sequence, or fusing the local SDDMM and SpMM kernels.
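The second strategy, fusing the local SDDMM and SpMM kernels, can be illustrated on a single process. The following is a minimal sketch, not the authors' implementation: it assumes the sparse matrix is stored as a dictionary of `(row, col) -> value` entries and that the SpMM multiplies by the same dense matrix `B` used in the SDDMM. The function names are hypothetical, and the distributed replication-reuse strategy is not shown.

```python
def sddmm(S, A, B):
    """Sampled dense-dense product: scale each nonzero S[i, j] by dot(A[i], B[j])."""
    return {(i, j): v * sum(a * b for a, b in zip(A[i], B[j]))
            for (i, j), v in S.items()}


def spmm(R, B, num_rows):
    """Sparse-times-dense product: out = R @ B, with R stored as {(i, j): value}."""
    width = len(B[0])
    out = [[0.0] * width for _ in range(num_rows)]
    for (i, j), v in R.items():
        for k in range(width):
            out[i][k] += v * B[j][k]
    return out


def fused_mm(S, A, B):
    """Fused variant: consume each SDDMM value immediately in the SpMM
    accumulation, so the intermediate sparse matrix is never stored."""
    width = len(B[0])
    out = [[0.0] * width for _ in range(len(A))]
    for (i, j), v in S.items():
        w = v * sum(a * b for a, b in zip(A[i], B[j]))  # SDDMM value at (i, j)
        for k in range(width):
            out[i][k] += w * B[j][k]                    # SpMM update of row i
    return out


# Small illustrative inputs (made up for this sketch).
S = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fused_mm(S, A, B))
```

Both paths perform the same arithmetic in the same order. The fused loop simply avoids writing the SDDMM output and reading it back.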
1 code implementation • 12 Feb 2020 • Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick
To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6x.
no code implementations • 14 Oct 2019 • Yu-Hang Tang, Oguz Selvitopi, Doru Popovici, Aydın Buluç
To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor product linear system on-the-fly, without storing it in memory, when performing matrix-vector products in PCG.
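The idea of applying a tensor product operator without storing it can be sketched on the CPU. The example below is an illustration under stated assumptions, not the solver from the paper: it assumes the system matrix is a plain Kronecker product of two small symmetric positive definite matrices `A` and `B`, and it uses a Jacobi (diagonal) preconditioner. The excerpt does not specify the actual system, preconditioner, or GPU kernels, and all function names here are hypothetical.

```python
def kron_matvec(A, B, x):
    """Return (A kron B) @ x without forming the Kronecker product.

    x is viewed as an na-by-nb matrix X in row-major order, and the
    result is the row-major flattening of A @ X @ B^T.
    """
    na, nb = len(A), len(B)
    X = [x[i * nb:(i + 1) * nb] for i in range(na)]
    # T = X @ B^T
    T = [[sum(B[j][l] * X[k][l] for l in range(nb)) for j in range(nb)]
         for k in range(na)]
    # Y = A @ T, flattened row by row
    return [sum(A[i][k] * T[k][j] for k in range(na))
            for i in range(na) for j in range(nb)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, B, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for (A kron B) x = b.

    Assumes A and B are symmetric positive definite, so their Kronecker
    product is too. The preconditioner is the inverse diagonal of A kron B.
    """
    na, nb = len(A), len(B)
    inv_diag = [1.0 / (A[i][i] * B[j][j]) for i in range(na) for j in range(nb)]
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - K @ x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Kp = kron_matvec(A, B, p)                # operator applied on the fly
        alpha = rz / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ki for ri, ki in zip(r, Kp)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Illustrative 2x2 factors (made up for this sketch), giving a 4x4 system.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.5], [0.5, 1.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
b = kron_matvec(A, B, x_true)
x = pcg(A, B, b)
```

The full operator would need (na·nb)² stored entries, whereas the on-the-fly product only touches the two factors and one na-by-nb temporary.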
1 code implementation • 5 Apr 2018 • Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç
Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type.
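For readers unfamiliar with hash-based sparse matrix-matrix multiplication, here is a minimal row-wise sketch. It is not the authors' optimized code: it assumes each matrix is a list of rows stored as `{column: value}` dictionaries and uses a Python dict as the per-row hash accumulator. The function name is hypothetical, and sorting the output columns is an optional choice made here for readability.

```python
def spgemm_hash(A, B):
    """Row-wise sparse product C = A @ B using a hash accumulator per row.

    A and B are lists of rows; each row is a dict mapping column -> value.
    """
    C = []
    for a_row in A:
        acc = {}  # hash table: output column -> partial sum
        for k, a_val in a_row.items():
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for j, b_val in B[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        C.append(dict(sorted(acc.items())))  # sorted columns (optional)
    return C


# Illustrative inputs (made up for this sketch): A is 2x3, B is 3x3.
A = [{0: 1.0, 2: 2.0}, {1: 3.0}]
B = [{1: 4.0}, {0: 5.0, 2: 6.0}, {1: 7.0}]
print(spgemm_hash(A, B))  # [{1: 18.0}, {0: 15.0, 2: 18.0}]
```

A heap-based variant would instead merge the scaled rows of `B` in column order, which yields sorted output without a separate sort.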
Distributed, Parallel, and Cluster Computing