no code implementations • 1 Oct 2023 • Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaugnn Richard, Yanjing Li
Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4. 8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.