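Computer science
Chip
Bandwidth (computing)
Concurrency
Cache coherence
Shared memory
Embedded system
System on a chip
Computer architecture
Modular design
Network topology
Cache
Computer hardware
CPU cache
Computer network
Distributed computing
Telecommunications
Operating system
Cache algorithms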
Authors
Andreas Kurth,Wolfgang Rönninger,Thomas Benz,Matheus de Araújo Cavalcante,Fabian Schuiki,Florian Zaruba,Luca Benini
Identifier
DOI:10.1109/tc.2021.3107726
Abstract
On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it continues to gain importance as the number of cores, the heterogeneity of components, and the on-chip and off-chip bandwidth continue to grow. Decades of research on on-chip networks enabled cache-coherent shared-memory multiprocessors. However, communication fabrics that meet the needs of heterogeneous many-cores and accelerator-rich SoCs, which are not, or only partially, coherent, are a much less mature research area. In this work, we present a modular, topology-agnostic, high-performance on-chip communication platform. The platform includes components to build and link subnetworks with customizable bandwidth and concurrency properties and adheres to a state-of-the-art, industry-standard protocol. We discuss microarchitectural trade-offs and timing/area characteristics of our modules and show that they can be composed to build high-bandwidth (e.g., 2.5 GHz and 1024 bit data width) end-to-end on-chip communication fabrics (not only network switches but also DMA engines and memory controllers) with high degrees of concurrency. We design and implement a state-of-the-art ML training accelerator, where our communication fabric scales to 1024 cores on a die, providing 32 TB/s cross-sectional bandwidth at only 24 ns round-trip latency between any two cores.