Computer science
Bandwidth (computing)
Interconnect
Parallel computing
Cache (computing)
Chip
Memory bandwidth
Partition
Shared memory
Client
Embedded system
Operating system
Computer network
Telecommunications
Mathematics
Combinatorics
Authors
Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout
Identifier
DOI:10.1145/3579371.3589078
Abstract
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for their last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local memory partition while being accessible by all chips, an SM-side LLC is private to a chip while caching data from all memory partitions. We find that some workloads prefer a memory-side LLC while others prefer an SM-side LLC, and this preference solely depends on which organization maximizes the effective LLC bandwidth. In contrast to prior work which optimizes bandwidth beyond the LLC, we make the observation that the effective bandwidth ahead of the LLC is critical to end-to-end application performance. We propose Sharing-Aware Caching (SAC) to adopt either a memory-side or SM-side LLC organization by dynamically reconfiguring the routing policies in the intra-chip interconnection network and LLC controllers. SAC is driven by a simple and lightweight analytical model that predicts the impact of data sharing across chips on the effective LLC bandwidth. SAC improves average performance by 76% and 12% (and up to 157% and 49%) compared to a memory-side and SM-side LLC, respectively. We demonstrate significant performance improvements across the design space and across workloads.
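To make the memory-side vs. SM-side tradeoff concrete, here is a back-of-the-envelope sketch, not the paper's actual analytical model, of how cross-chip sharing can throttle bandwidth *ahead of* a memory-side LLC while an SM-side LLC only pays link traffic *behind* the LLC on misses. All parameter names (`llc_bw`, `link_bw`, `remote_frac`, `miss_rate`) are illustrative assumptions, and the capacity loss from replicating shared data in SM-side LLCs is folded into the `miss_rate` input rather than modeled.

```python
def memory_side_effective_bw(llc_bw, link_bw, remote_frac):
    """Per-chip effective LLC bandwidth with a memory-side LLC.

    remote_frac: fraction of a chip's LLC accesses that target another
    chip's memory partition; these must cross the inter-chip link
    *before* they can reach that partition's LLC slice.
    """
    if remote_frac == 0:
        return llc_bw
    # At access rate r, link demand is r * remote_frac, so the link caps
    # the rate at link_bw / remote_frac; the LLC slice caps it at llc_bw.
    return min(llc_bw, link_bw / remote_frac)


def sm_side_effective_bw(llc_bw, link_bw, remote_frac, miss_rate):
    """Per-chip effective LLC bandwidth with an SM-side (chip-private) LLC.

    Hits are always served locally; only misses to remote partitions
    cross the link, and they do so *behind* the LLC. Replication of
    shared data shrinks effective capacity and raises miss_rate; that
    effect is taken as an input here, not derived.
    """
    remote_miss = miss_rate * remote_frac
    if remote_miss == 0:
        return llc_bw
    return min(llc_bw, link_bw / remote_miss)


# Example: heavy sharing (75% of accesses are remote) with an inter-chip
# link much slower than the LLC. The memory-side LLC becomes link-bound
# ahead of the cache; the SM-side LLC still runs at full LLC bandwidth.
mem = memory_side_effective_bw(llc_bw=2000, link_bw=500, remote_frac=0.75)
sm = sm_side_effective_bw(llc_bw=2000, link_bw=500, remote_frac=0.75,
                          miss_rate=0.3)
print(mem, sm)  # ~666.7 vs 2000 (GB/s, say)
```

A sharing-aware policy in the spirit of SAC would compare predictions like these per workload and pick the organization with the higher effective bandwidth; the paper's actual model and reconfiguration mechanism are more involved than this sketch.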