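Field-programmable gate array
Computer science
Coprocessor
Latency (audio)
Embedded system
Scalability
Virtex
Low latency (capital markets)
Computer network
Operating system
Telecommunications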
Authors
Roberto Sanchez Correa, Jean-Pierre David
Source
Journal: Integration
[Elsevier]
Date: 2018-05-22
Volume and pages: 63: 41-55
Citations: 17
Identifier
DOI:10.1016/j.vlsi.2018.05.005
Abstract
FPGA technology offers numerous advantages for parallel computation, supported by low-latency on-chip communication. Nevertheless, clustering FPGAs to achieve greater computing power may require external high-speed, low-latency communication channels. Because of the overhead of complex features and functionality, existing off-the-shelf IP cores for standard high-speed communication often waste valuable clock cycles and bandwidth. This paper presents the implementation of an ultra-low-latency inter-FPGA communication IP suitable for high-performance computing machines. Our IP achieved a half-round-trip end-to-end latency of 272 ns (34 clock cycles) and an aggregate bandwidth of 16 Gbps per node on a Virtex-5 FPGA. To test the proposed IP under high-performance conditions, we implemented an eight-FPGA parallel computing machine hosting 48 coprocessors interconnected through our custom-designed network. Experimental results show a global computational efficiency of 97.6%. The proposed architecture is scalable and easily portable to the most recent FPGAs, which should lower the latency and increase the bandwidth even further.
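The latency figures in the abstract imply a clock rate that the abstract does not state. The following is a minimal sketch of that arithmetic, assuming the 34 cycles are counted on a single clock domain; the function names are hypothetical, and the derived values are inferences from the quoted numbers, not results reported by the authors.

```python
# Figures quoted in the abstract.
LATENCY_NS = 272       # half-round-trip end-to-end latency, in nanoseconds
LATENCY_CYCLES = 34    # the same latency expressed in clock cycles
BANDWIDTH_GBPS = 16    # aggregate bandwidth per node, in Gbit/s


def clock_period_ns(latency_ns: float, cycles: int) -> float:
    """Clock period implied by a latency given in both ns and cycles."""
    return latency_ns / cycles


def clock_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def bits_per_cycle(bandwidth_gbps: float, period_ns: float) -> float:
    """Aggregate bits moved per clock cycle (Gbit/s times ns gives bits)."""
    return bandwidth_gbps * period_ns


# Assumption: all 34 cycles run at one clock rate.
period = clock_period_ns(LATENCY_NS, LATENCY_CYCLES)
frequency = clock_frequency_mhz(period)
bits = bits_per_cycle(BANDWIDTH_GBPS, period)

print(f"implied clock period:    {period} ns")
print(f"implied clock frequency: {frequency} MHz")
print(f"aggregate bits per cycle: {bits}")
```

Under that assumption, the quoted numbers correspond to an 8 ns clock period (125 MHz) and 128 aggregate bits per cycle per node.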