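Workstation
Rollback
Computer science
Computation
Work (physics)
Distributed computing
Operating system
Reliability engineering
Parallel computing
Database
Engineering
Algorithm
Database transaction
Mechanical engineering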
Authors
James S. Plank, Wael Elwasif
Identifier
DOI:10.1109/ftcs.1998.689454
Abstract
In the past twenty years, there has been a wealth of theoretical research on minimizing the expected running time of a program in the presence of failures by employing checkpointing and rollback recovery. In the same time period, there has been little experimental research to corroborate these results. We study three separate projects that monitor failure in workstation networks. Our goals are twofold. The first is to see how these results correlate with the theoretical results, and the second is to assess their impact on strategies for checkpointing long-running computations on workstations and networks of workstations. A significant result of our work is that although the base assumptions of the theoretical research do not hold, many of the results are still applicable.