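Remote direct memory access
InfiniBand
Computer science
Computer network
Cloud computing
Network congestion
Scalability
Network packet
Packet loss
Operating system
Distributed computing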
Authors
Jung-Hwan Cha,Shinhyeok Kang,Yewon Kang,Hansaem Seo,Jungeun Lee,Jongsung Kim,Minsung Jang
Identifier
DOI:10.1109/ipccc59175.2023.10253863
Abstract
Characteristics of Remote Direct Memory Access (RDMA) such as high bandwidth, low latency, and low CPU utilization have made it the mainstream interconnect for cloud-based High-Performance Computing (HPC) services. However, existing RDMA technologies, including InfiniBand and RoCEv2, are limited in their compatibility with legacy networks, their scalability in large-scale deployments, and their cost efficiency. To address these challenges, we propose Cloud-optimized RDMA Networking (CORN). It features cloud-optimized congestion control, which considers the Bandwidth Delay Product (BDP) and the number of in-flight packets to determine the amount of traffic to transmit. This congestion control scheme significantly reduces the likelihood of packet loss caused by buffer overflow on network switches. CORN uses traditional Selective ACK (SACK) to recover from packet drops caused by network congestion or hardware faults. Consequently, CORN can support lossy RDMA networks on Ethernet. In addition, both features are designed to operate without any modification or configuration of the network switches. CORN functions as a shim layer between UDP and RDMA, operating solely within the end host, which allows it to be deployed seamlessly. An implementation in ns-3 shows that CORN is feasible and more efficient than congestion control schemes such as DCQCN, TIMELY, and HPCC.
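To make the BDP-based idea in the abstract concrete, the following is a minimal sketch, not CORN's actual algorithm. The abstract does not give a formula, so the rule used here (send budget = BDP minus bytes already in flight, floored at zero) is an assumption, and the function names `bdp_bytes` and `send_budget` are hypothetical. Integer units (bits per second, nanoseconds) are used to avoid floating-point rounding.

```python
NS_PER_SECOND = 1_000_000_000
BITS_PER_BYTE = 8


def bdp_bytes(bandwidth_bps: int, rtt_ns: int) -> int:
    """Bandwidth-delay product in bytes: link rate times round-trip time."""
    return bandwidth_bps * rtt_ns // (BITS_PER_BYTE * NS_PER_SECOND)


def send_budget(bandwidth_bps: int, rtt_ns: int, inflight_bytes: int) -> int:
    """Bytes the sender may still inject without exceeding the BDP.

    Assumed rule (not taken from the paper): budget = BDP - inflight,
    never negative. Keeping in-flight data at or below the BDP is meant
    to avoid building queues in switch buffers.
    """
    return max(0, bdp_bytes(bandwidth_bps, rtt_ns) - inflight_bytes)


if __name__ == "__main__":
    link_bps = 100_000_000_000  # 100 Gbit/s link (illustrative value)
    rtt_ns = 10_000             # 10 microsecond RTT (illustrative value)
    print(bdp_bytes(link_bps, rtt_ns))             # 125000
    print(send_budget(link_bps, rtt_ns, 100_000))  # 25000
    print(send_budget(link_bps, rtt_ns, 200_000))  # 0
```

With these illustrative numbers the pipe holds 125,000 bytes, so a sender with 100,000 bytes unacknowledged could inject 25,000 more, while one already above the BDP would wait for acknowledgements.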