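Computer science
Probabilistic logic
Fault tolerance
Node (physics)
Reliability (semiconductor)
Distributed computing
Markov process
Control reconfiguration
Probabilistic analysis of algorithms
Computer network
Engineering
Power (physics)
Statistics
Physics
Mathematics
Structural engineering
Quantum mechanics
Artificial intelligence
Embedded system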
Authors
Yuetai Li,Yixuan Fan,Lei Zhang,Jon Crowcroft
Identifier
DOI:10.1109/jiot.2023.3257402
Abstract
Centralized systems become less efficient, secure, and resilient as network size and heterogeneity increase, owing to their inherent single point of failure. Distributed consensus mechanisms, characterized by decentralization, autonomy, parallelism, and fault tolerance, can meet the increasing demands of safety and security in critical interconnected systems. This article establishes a Node and Link probabilistic failure model that accounts for both node and communication link failures in a representative crash fault-tolerant distributed consensus protocol: RAFT. Analytical results are derived for the probability density function and the mean value of consensus reliability. Two reliability performance indicators, Reliability Gain and Tolerance Gain, are proposed to capture the linear relationship between consensus reliability and two basic parameters: the joint failure rate and the maximum number of faulty nodes that can be tolerated. These indicators provide theoretical guidance for quickly deploying a RAFT system. The special case of a distributed consensus network that already has a certain number of failures is also evaluated, along with its adverse impact. The Markov probabilistic models, the definitions of Reliability Gain and Tolerance Gain, and the analysis methods proposed in this article can be extended to other consensus mechanisms.
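To make the idea of consensus reliability under node and link failures concrete, the following is a minimal sketch and not the model derived in the article. It assumes that node and link failures are independent, that the leader is healthy, and that a round succeeds when at most f = floor((n - 1) / 2) followers are unreachable. The function names and the way the joint failure rate is combined are illustrative assumptions.

```python
from math import comb


def joint_failure_rate(p_node: float, p_link: float) -> float:
    """Assumed joint failure rate: a follower is unreachable if its node
    or its link to the leader fails (failures taken as independent)."""
    return 1.0 - (1.0 - p_node) * (1.0 - p_link)


def max_tolerated_faults(n: int) -> int:
    """Largest number of faulty nodes an n-node majority quorum tolerates."""
    return (n - 1) // 2


def consensus_reliability(n: int, p_joint: float) -> float:
    """Probability that at most f of the n - 1 followers are unreachable,
    under the simplifying assumption of a healthy leader."""
    followers = n - 1
    f = max_tolerated_faults(n)
    return sum(
        comb(followers, k) * p_joint**k * (1.0 - p_joint) ** (followers - k)
        for k in range(f + 1)
    )


if __name__ == "__main__":
    p = joint_failure_rate(p_node=0.05, p_link=0.05)
    for n in (3, 5, 7):
        print(n, max_tolerated_faults(n), consensus_reliability(n, p))
```

Under these assumptions, sweeping n or the joint failure rate gives a rough feel for how reliability responds to each parameter; the closed-form results and the Reliability Gain and Tolerance Gain definitions are in the article itself.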