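Computer science
Cloud computing
Virtualization
Distributed computing
Scheduling (production processes)
Container (type theory)
Virtual machine
Software deployment
Throughput
Operating system
Shared resource
Parallel computing
Mechanical engineering
Engineering
Economy
Wireless
Operations management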
Authors
Ting-An Yeh,Hung-Hsin Chen,Jerry Chou
Identifier
DOI:10.1145/3369583.3392679
Abstract
Containers have emerged as a new technology in clouds to replace virtual machines (VMs) for deploying and operating distributed applications. As a growing number of cloud-focused applications, such as deep learning and high-performance applications, have started to rely on the high computing throughput of GPUs, efficiently supporting GPUs in container clouds has become essential. While GPU virtualization has been extensively studied for VMs, limited work has been done for containers. One of the key challenges is the lack of support for GPU sharing between multiple concurrent containers. This limitation leads to low resource utilization when a GPU device cannot be fully utilized by a single application due to the burstiness of GPU workloads and limited memory bandwidth. To overcome this issue, we designed and implemented KubeShare, which extends Kubernetes to enable GPU sharing with fine-grained allocation. KubeShare is the first solution for Kubernetes to make GPU devices a first-class resource for scheduling and allocation. Using real deep learning workloads, we demonstrate that KubeShare can significantly increase GPU utilization and overall system throughput by around 2x, with less than 10% performance overhead during container initialization and execution.