Remote direct memory access
Host (biology)
Network congestion
Computer science
Computer network
Operating system
Biology
Ecology
Network packet
Authors
Zirui Wan, Jiao Zhang, Yuxiang Wang, Kefei Liu, Haoyu Pan, Yongchen Pan, Tao Huang
Identifiers
DOI: 10.1109/ton.2024.3524247
Abstract
RDMA has been widely deployed in production datacenters. Conventional wisdom holds that the intra-host network delivers stable, high performance. However, intra-host resources have advanced relatively slowly compared with the rapidly evolving RDMA NIC (RNIC). As a result, RNIC traffic may not obtain sufficient intra-host resources when it contends with CPU-to-memory traffic. A line of recent work from large-scale production datacenter operators demonstrates the emergence of intra-host congestion and the associated performance collapse, which forces us to revisit the practice of intra-host congestion control. However, the ability to efficiently control RDMA intra-host networks is far less mature than for inter-host networks, which raises challenges in congestion monitoring, intra-host resource allocation, and RNIC traffic adjustment. In this paper, we propose RDMA intra-Host Congestion Control (RHCC), which combines sub-RTT-granularity congestion avoidance for CPU-to-memory traffic with proactive RNIC traffic adjustment. RHCC ensures fast congestion avoidance and can work alongside different inter-host congestion control methods. We implement RHCC on commodity servers and RNICs and evaluate its performance experimentally. The results show that RHCC increases network throughput by up to 2× and reduces network latency by up to 1.4×.