Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

9 Dec 2020  ·  Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John

Cross-core communication is increasingly a bottleneck as the number of processing elements (PEs) per system-on-chip increases. Typical hardware solutions to cross-core communication are often inflexible; software solutions are flexible, but they have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel lightweight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides a further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache-injection (stashing) between PEs on the coherence bus, reducing the latency of core-to-core communication. VL is particularly effective for fine-grain tasks on streaming data. Evaluation on a full system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61%.
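
To make the "shared state" problem concrete, the following is a minimal sketch of a software-only bounded M:N queue, written in the style of the well-known sequence-numbered ring buffer attributed to Dmitry Vyukov. It is not taken from the paper and is not necessarily one of its baselines; the class name `BoundedMpmcQueue`, the 64-byte cache-line size, and the power-of-two capacity are assumptions made for illustration. The point to notice is that every producer must update `tail_` and every consumer must update `head_`, so those two cache lines (plus the per-cell sequence numbers) are coherent shared state that bounces between cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Illustrative software MPMC queue (assumed design, not the paper's code).
template <typename T>
class BoundedMpmcQueue {
public:
    // Capacity must be a power of two so that (pos & mask_) wraps correctly.
    explicit BoundedMpmcQueue(std::size_t capacity)
        : cells_(new Cell[capacity]), mask_(capacity - 1) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
        for (std::size_t i = 0; i < capacity; ++i)
            cells_[i].seq.store(i, std::memory_order_relaxed);
    }

    // Returns false if the queue is full.
    bool try_push(const T& value) {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            auto diff = static_cast<std::intptr_t>(seq) -
                        static_cast<std::intptr_t>(pos);
            if (diff == 0) {
                // Shared state: all producers contend on tail_ here.
                if (tail_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    cell.data = value;
                    cell.seq.store(pos + 1, std::memory_order_release);
                    return true;
                }
                // On failure, pos was refreshed by the CAS; retry.
            } else if (diff < 0) {
                return false;  // slot not yet consumed: queue is full
            } else {
                pos = tail_.load(std::memory_order_relaxed);
            }
        }
    }

    // Returns false if the queue is empty.
    bool try_pop(T& out) {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Cell& cell = cells_[pos & mask_];
            std::size_t seq = cell.seq.load(std::memory_order_acquire);
            auto diff = static_cast<std::intptr_t>(seq) -
                        static_cast<std::intptr_t>(pos + 1);
            if (diff == 0) {
                // Shared state: all consumers contend on head_ here.
                if (head_.compare_exchange_weak(pos, pos + 1,
                                                std::memory_order_relaxed)) {
                    out = cell.data;
                    // Mark the slot free for the producer one lap ahead.
                    cell.seq.store(pos + mask_ + 1, std::memory_order_release);
                    return true;
                }
            } else if (diff < 0) {
                return false;  // slot not yet produced: queue is empty
            } else {
                pos = head_.load(std::memory_order_relaxed);
            }
        }
    }

private:
    struct Cell {
        std::atomic<std::size_t> seq;
        T data;
    };

    std::unique_ptr<Cell[]> cells_;
    const std::size_t mask_;
    // Separate cache lines (64 bytes assumed) to avoid false sharing
    // between producers and consumers; true sharing on each index remains.
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::atomic<std::size_t> head_{0};
};
```

Even with the two indices padded onto separate cache lines, each successful `try_push` or `try_pop` needs exclusive ownership of a line that other cores also write, which is the kind of coherent shared state the abstract says VL eliminates.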


Categories


Hardware Architecture
