Monitoring Large-Scale Cloud-Native Infrastructure Using One-Sided RDMA

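Keywords: Remote direct memory access; Cloud computing; Computer science; Overhead (engineering); Database; Operating system; Algorithm
Authors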
Zhuo Song,Jiejian Wu,Teng Ma,Zhe Wang,Linghe Kong,Zhenzao Wen,Jingxuan Li,Yang Lu,Yong Yang,Tao Ma,Zheng Liu,Guihai Chen
Source
Journal: IEEE/ACM Transactions on Networking [Institute of Electrical and Electronics Engineers]
Volume (issue): 32 (4), pages 3499-3514
Identifier
DOI:10.1109/tnet.2024.3394514
Abstract

Cloud services have shifted from monolithic designs to microservices running on cloud-native infrastructure, with monitoring systems in place to ensure service level agreements (SLAs). However, traditional monitoring systems no longer meet the demands of cloud-native monitoring. During Alibaba's "Double Eleven" shopping festival, it was observed that the monitor occupies resources of the monitored infrastructure and even disrupts services. In this paper, we propose a novel monitoring system for cloud-native monitoring. The system achieves zero overhead in collecting raw metrics by using one-sided remote direct memory access (RDMA) and remedies network congestion by adopting a receiver-driven flow control scheme. It also features a priority queue mechanism to meet different quality of service requirements and an efficient batch processing design to relieve CPU occupation. The system has been deployed and evaluated in four different clusters with heterogeneous RDMA NIC devices and architectures in Alibaba Cloud. Results show that it achieves no CPU occupation at the monitored host and supports $1\sim10k$ hosts with a $0.1\sim1s$ sampling interval using a single thread for network I/O. It significantly relieves the incast issue and maintains $80\sim95\%$ of bandwidth utilization in several clusters when monitoring $1k$ hosts. It also ensures that high-priority services finish collecting metrics earlier than low-priority ones by at least $400 \mu s$ when monitoring $1k$ hosts.
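
The abstract names a receiver-driven flow control scheme, a priority queue, and batch processing, but does not describe how they are built. The following is only a minimal sketch of the general idea, not the authors' implementation: a collector decides which hosts to read in each round, takes the most urgent hosts first, and caps the number of reads per round so that replies do not all arrive at once. The names `ReadRequest`, `schedule_reads`, and `window`, and the convention that a lower number means higher priority, are assumptions made for illustration. No actual RDMA read is issued here.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ReadRequest:
    """One pending metric read (hypothetical structure for this sketch)."""
    priority: int                      # assumed convention: lower value = more urgent
    seq: int                           # tie-breaker: keeps submission order within a priority
    host: str = field(compare=False)   # not used for ordering


def schedule_reads(hosts, window):
    """Group hosts into rounds of at most `window` reads, most urgent first.

    `hosts` is a list of (host_name, priority) pairs. The collector (receiver)
    chooses what to pull each round, so it bounds how many replies can be in
    flight toward it at the same time.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    heap = [ReadRequest(prio, i, name) for i, (name, prio) in enumerate(hosts)]
    heapq.heapify(heap)
    rounds = []
    while heap:
        # Batch up to `window` requests; a real collector would post them together.
        batch = [heapq.heappop(heap).host for _ in range(min(window, len(heap)))]
        rounds.append(batch)
    return rounds
```

For example, `schedule_reads([("a", 1), ("b", 0), ("c", 1), ("d", 0), ("e", 2)], window=2)` places hosts `b` and `d` in the first round because they carry the most urgent priority, then `a` and `c`, then `e`.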