Computer science
Memory footprint
CUDA
Parallel computing
Programmer
Computer architecture
Memory management
Flat memory model
Uniform memory access
Memory model
Interleaved memory
Distributed shared memory
Semiconductor memory
Shared memory
Embedded system
Operating system
Authors
Jake Choi, Heon Y. Yeom, Yoonhee Kim
Source
Venue: International Conference on Autonomic Computing
Date: 2021-09-01
Citations: 3
Identifier
DOI:10.1109/acsos-c52956.2021.00029
Abstract
Popular deep learning frameworks like PyTorch utilize GPUs heavily for training, and suffer from out-of-memory (OOM) problems if memory is not managed properly. In this paper, we propose a modification that utilizes CUDA Unified Memory (UM) to expand GPU memory into the available host memory space, so that practicality for the programmer increases and OOM errors do not occur for any workload. We also pinpoint performance issues that result from our modifications to the framework, and outline future plans such as reducing redundant memory copies, prefetching, and memory advising techniques to improve upon our design. Our implementation shows that PyTorch UM performance overheads are minimal when the data footprint is below GPU memory capacity.
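The abstract refers to the standard CUDA Unified Memory mechanism: a managed allocation can exceed GPU capacity and its pages migrate between host and device on demand, and prefetching/memory-advising calls are the usual tuning hooks. The sketch below is not the paper's PyTorch modification; it is a minimal standalone illustration of those CUDA runtime calls, with a hypothetical kernel and illustrative sizes.

```cuda
// Minimal Unified Memory sketch (assumptions: illustrative buffer size and
// a toy "scale" kernel; this is not the paper's PyTorch allocator patch).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int device = 0;
    cudaSetDevice(device);

    // A managed allocation may exceed GPU memory; pages are migrated from
    // host memory as they are touched, so an oversubscribed buffer does not
    // immediately fail with OOM the way a cudaMalloc of the same size would.
    size_t n = 1ull << 28;  // ~1 GiB of floats, illustrative only
    float* buf = nullptr;
    cudaMallocManaged(&buf, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;  // first touch on the host

    // Optional hints of the kind the abstract calls "memory advising" and
    // "prefetching": prefer GPU residency and stage the data before the kernel.
    cudaMemAdvise(buf, n * sizeof(float), cudaMemAdviseSetPreferredLocation, device);
    cudaMemPrefetchAsync(buf, n * sizeof(float), device, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(buf, n, 2.0f);
    cudaDeviceSynchronize();

    // Prefetch back to the host (cudaCpuDeviceId) before CPU-side reads.
    cudaMemPrefetchAsync(buf, n * sizeof(float), cudaCpuDeviceId, 0);
    cudaDeviceSynchronize();
    printf("buf[0] = %f\n", buf[0]);

    cudaFree(buf);
    return 0;
}
```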