no code implementations • 3 Feb 2024 • Le Chen, Nesreen K. Ahmed, Akash Dutta, Arijit Bhattacharjee, Sixing Yu, Quazi Ishtiaque Mahmud, Waqwoya Abebe, Hung Phan, Aishwarya Sarkar, Branden Butler, Niranjan Hasabnis, Gal Oren, Vy A. Vo, Juan Pablo Munoz, Theodore L. Willke, Tim Mattson, Ali Jannesari
Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning.
no code implementations • 25 Apr 2023 • Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Eduardo Cesar, Anna Sikora, Ali Jannesari
There is, thus, a need for a general-purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks.
no code implementations • 7 Apr 2023 • Ali TehraniJamsaz, Alok Mishra, Akash Dutta, Abid M. Malik, Barbara Chapman, Ali Jannesari
However, even with OpenMP, the developer must choose from among many strategies for exploiting a GPU or a CPU.
no code implementations • 22 Feb 2023 • Akash Dutta, Jee Choi, Ali Jannesari
Our approach identifies OpenMP configurations at different power constraints that yield a geometric mean performance improvement of more than $25\%$ and $13\%$ over the default OpenMP configuration on a 32-core Skylake and a 16-core Haswell processor, respectively.
no code implementations • 1 Mar 2022 • Ali TehraniJamsaz, Mihail Popov, Akash Dutta, Emmanuelle Saillard, Ali Jannesari
This paper demonstrates how the static Intermediate Representation (IR) of the code can guide NUMA/prefetcher optimizations without the prohibitive cost of performance profiling.