The Ultimate DataFlow for Ultimate SuperComputers-on-a-Chip, for Scientific Computing, Geo Physics, Complex Mathematics, and Information Processing
This article starts from the assumption that near future 100BTransistor SuperComputers-on-a-Chip will include N big multi-core processors, 1000N small many-core processors, a TPU-like fixed-structure systolic array accelerator for the most frequently used Machine Learning algorithms needed in bandwidth-bound applications and a flexible-structure reprogrammable accelerator for less frequently used Machine Learning algorithms needed in latency-critical applications.
PDF AbstractCategories
Distributed, Parallel, and Cluster Computing
Datasets
Add Datasets
introduced or used in this paper