no code implementations • 23 May 2024 • Madison Threadgill, Andreas Gerstlauer
In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and training stages.
no code implementations • 14 Jul 2021 • Jackson Farley, Andreas Gerstlauer
Distributed partitioning approaches can, however, also be used to run in a reduced memory footprint on a single device by subdividing the network into smaller operations.
no code implementations • 9 Dec 2020 • Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John
Cross-core communication is increasingly a bottleneck as the number of processing elements increase per system-on-chip.
Hardware Architecture