no code implementations • 4 Apr 2023 • Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, David Patterson
For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100.
no code implementations • NeurIPS 2020 • Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon
Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code.