2 code implementations • 20 Feb 2022 • Aravind Sankaran, Navid Akbari Alashti, Christos Psarras, Paolo Bientinesi
Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks.
1 code implementation • 9 Oct 2020 • Christos Psarras, Lars Karlsson, Rasmus Bro, Paolo Bientinesi
We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand.
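The workflow described above, fitting several decompositions with a small number of components and keeping the best, can be sketched as follows. This is an illustrative toy, assuming a basic alternating-least-squares (ALS) CP fitter; all function names and the rank range are hypothetical, not the paper's implementation.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def cp_als(T, rank, n_iter=100, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor with plain ALS."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    approx = np.einsum('ir,jr,kr->ijk', A, B, C)
    fit = 1.0 - np.linalg.norm(T - approx) / np.linalg.norm(T)
    return (A, B, C), fit

# Try several small ranks (well below 20, as in the paper's setting)
# on a synthetic rank-3 tensor, and keep the best-fitting model.
rng = np.random.default_rng(1)
T = np.einsum('ir,jr,kr->ijk',
              *(rng.standard_normal((s, 3)) for s in (6, 5, 4)))
results = {rank: cp_als(T, rank)[1] for rank in (1, 2, 3, 4)}
best_rank = max(results, key=results.get)
```

Because each candidate decomposition starts from scratch, the per-rank fits share no work; the paper's observation is precisely that this repeated-fitting pattern leaves room for reuse.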
no code implementations • 30 Dec 2019 • Henrik Barthels, Christos Psarras, Paolo Bientinesi
In order to combine the productivity of high-level languages with the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems.
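The kind of rewrite such a generator performs can be illustrated with a standard least-squares expression. The specific mapping shown here is an assumption for illustration, not Linnea's actual output: the naive version evaluates the expression as written, while a generator would map it to fewer, cheaper, and numerically safer kernel calls.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)

# Naive, as written: b = (X^T X)^{-1} X^T y with an explicit inverse
# and several large temporaries.
b_naive = np.linalg.inv(X.T @ X) @ X.T @ y

# Generator-style: recognize the symmetric positive-definite system and
# call a solver instead of forming the inverse.
b_fast = np.linalg.solve(X.T @ X, X.T @ y)
```

Both produce the same coefficients, but the second form avoids the explicit inverse, which is the sort of algebraic knowledge a code generator can apply automatically.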
Mathematical Software
1 code implementation • 5 Jul 2019 • Henrik Barthels, Christos Psarras, Paolo Bientinesi
In order to both achieve the productivity that comes with high-level languages and exploit the efficiency of low-level kernels, we are developing Linnea, a code generator for linear algebra problems.
Mathematical Software
no code implementations • 12 May 2018 • Tina Raissi, Alessandro Tibo, Paolo Bientinesi
We present a feature-engineering pipeline that constructs musical signal characteristics for use in a supervised model for musical genre identification.
4 code implementations • 1 Jul 2016 • Paul Springer, Paolo Bientinesi
We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM).
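A minimal sketch of the idea behind GETT: cast a tensor contraction as a single matrix-matrix multiplication by reshaping and permuting the operands. The contraction and index names below are illustrative; a real GETT kernel fuses the packing of sub-tensors into the GEMM itself rather than materializing full copies.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))   # indices (a, c, k)
B = rng.standard_normal((5, 6))      # indices (k, b)

# Reference result: C[a,b,c] = sum_k A[a,c,k] * B[k,b]
C_ref = np.einsum('ack,kb->abc', A, B)

# GETT-style: flatten the free indices of A into one matrix dimension,
# perform a single GEMM, then restore the desired index order.
a, c, k = A.shape
A_mat = A.reshape(a * c, k)                      # (a*c, k)
C_mat = A_mat @ B                                # (a*c, b) -- one GEMM
C = C_mat.reshape(a, c, -1).transpose(0, 2, 1)   # back to (a, b, c)
```

Casting the contraction this way lets it inherit the blocking and vectorization of a tuned GEMM instead of requiring a bespoke loop nest per contraction.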
Mathematical Software · Performance · ACM classes: G.4; D.3.4; I.1.2; I.1.3
2 code implementations • 7 Mar 2016 • Paul Springer, Jeff R. Hammond, Paolo Bientinesi
We present TTC, an open-source parallel compiler for multidimensional tensor transpositions.
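The operation TTC generates code for is the out-of-place permutation B = α · permute(A). The explicit loop nest below shows the semantics for one illustrative permutation; TTC would emit a blocked, vectorized, and parallel version of the same loops.

```python
import numpy as np

alpha = 2.0
A = np.arange(24, dtype=float).reshape(2, 3, 4)

# Explicit loop nest for the permutation (2, 0, 1): B[k,i,j] = alpha*A[i,j,k]
B_loops = np.empty((4, 2, 3))
for i in range(2):
    for j in range(3):
        for k in range(4):
            B_loops[k, i, j] = alpha * A[i, j, k]

# Library shorthand for the same operation.
B = alpha * np.transpose(A, (2, 0, 1))
```

Although the semantics are trivial, the memory-access pattern is not: one of the two tensors is necessarily traversed with a non-unit stride, which is why generated, transposition-specific code pays off.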
Mathematical Software · Distributed, Parallel, and Cluster Computing · Performance
1 code implementation • 22 Feb 2016 • Elmar Peise, Paolo Bientinesi
To exploit both memory locality and the full performance potential of highly tuned kernels, dense linear algebra libraries such as LAPACK commonly implement operations as blocked algorithms.
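The blocked-algorithm structure can be sketched with a toy blocked matrix multiplication: the computation proceeds one cache-sized tile at a time, so each tile is reused while it is hot, and each small update is itself a call to a dense kernel. Block size and matrix shapes are illustrative.

```python
import numpy as np

def blocked_matmul(A, B, bs=4):
    """Compute A @ B one (bs x bs) tile at a time."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(0, m, bs):
        for j in range(0, n, bs):
            for p in range(0, k, bs):
                # Each tile update is a small dense kernel call (a GEMM).
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((12, 6))
C = blocked_matmul(A, B)
```

LAPACK routines follow this pattern with an unblocked "panel" factorization plus blocked trailing updates, which is what lets them spend most of their time inside highly tuned BLAS-3 kernels.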
Mathematical Software · Performance