9 Feb 2020 • Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, Dingwen Tao
Much work has been done on optimizing linear algebra operations on GPUs for regular-shaped inputs.
Distributed, Parallel, and Cluster Computing