Computer science
Parallel computing
Distributed computing
Live migration
Operating system
Cloud computing
Virtualization
Authors
Tokito Murata, Kenichi Kourai
Identifier
DOI:10.1016/j.future.2024.05.024
Abstract
Clouds now offer virtual machines (VMs) with large amounts of memory for big data analysis. To make such VMs easier to migrate, split migration divides the memory of a VM into several fragments and transfers them to multiple hosts. Because the migrated VM, called a split-memory VM, must exchange memory data between the hosts using remote paging, it is inherently vulnerable to host and network failures. As a countermeasure, a checkpoint/restore mechanism can periodically save the state of a VM, but the traditional mechanism is not suitable for split-memory VMs: it has to move a large amount of memory data between hosts during checkpointing, and it can only restore a normal VM on a single host. This paper proposes D-CRES, which enables efficient checkpointing and restoration of split-memory VMs. D-CRES achieves fast checkpointing by saving the memory of a split-memory VM at all hosts in parallel without remote paging. It supports consistent live checkpointing, which saves the memory of a running VM, by taking into account the remote paging caused by the VM itself during checkpointing. In addition, it can take checkpoints incrementally by taking into account the remote paging that has occurred since the last checkpoint. Upon failure, D-CRES restores a split-memory VM at multiple hosts in parallel. We have implemented D-CRES in KVM and showed that live checkpointing in D-CRES was up to 39x faster than with the traditional mechanism.
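The general idea of per-host parallel and incremental checkpointing can be illustrated with a toy model. This is a minimal sketch, not the D-CRES implementation: the `Host` class, `checkpoint_all`, and `restore_all` are hypothetical names, memory pages are plain byte strings, and the sketch assumes that no page moves between hosts between checkpoints, so it does not model the remote-paging consistency problem that the paper addresses.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host holding one memory fragment of a split-memory VM."""
    name: str
    pages: dict[int, bytes]                                # page number -> contents
    dirty: set[int] = field(default_factory=set)           # pages written since last checkpoint
    image: dict[int, bytes] = field(default_factory=dict)  # checkpoint stored on this host

    def write(self, page_no: int, data: bytes) -> None:
        """Modify a page and remember that it must be saved again."""
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def checkpoint(self, full: bool = False) -> int:
        """Save local pages into the local image; return how many were saved.

        A full checkpoint saves every page; an incremental one saves only
        the pages written since the previous checkpoint.
        """
        targets = set(self.pages) if full else set(self.dirty)
        for page_no in targets:
            self.image[page_no] = self.pages[page_no]
        self.dirty.clear()
        return len(targets)

    def restore(self) -> None:
        """Rebuild this host's fragment from its local checkpoint image."""
        self.pages = dict(self.image)
        self.dirty.clear()


def checkpoint_all(hosts: list[Host], full: bool = False) -> list[int]:
    """Checkpoint every host in parallel; each host saves only its own pages."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return list(pool.map(lambda h: h.checkpoint(full), hosts))


def restore_all(hosts: list[Host]) -> None:
    """Restore every host's fragment in parallel."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        list(pool.map(lambda h: h.restore(), hosts))
```

In this model each host touches only the pages it already holds, so no memory data crosses hosts during checkpointing or restoration; handling pages that migrate between hosts while a checkpoint is in progress would need additional bookkeeping that the sketch leaves out.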