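Computer science
Distributed computing
Scheduling (production processes)
Scientific discovery
Virtualization
Architecture
Symmetric multiprocessor system
Cloud computing
Operating system
Psychology
Operations management
Art
Visual arts
Economics
Cognitive science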
Authors
Tiechui Yao, Jue Wang, Meng Wan, Zhikuang Xin, Yangang Wang, Rongqiang Cao, Shigang Li, Xuebin Chi
Identifier
DOI:10.1016/j.sysarc.2022.102550
Abstract
Because machine learning platforms can provide one-stop artificial intelligence (AI) application solutions, they have been widely used in the industrial and commercial internet fields in recent years. Scientific discovery that combines large-scale computation and massive data on heterogeneous accelerator cards is a significant future trend. However, building a platform for scientific discovery remains challenging, with difficulties that include large-scale heterogeneous resource scheduling and support for massive multi-source data. To free researchers from tedious resource management and environment configuration, we propose VenusAI, a platform for large-scale computing scenarios in scientific research that is built on a heterogeneous resource scheduling framework. This paper first presents the architecture design of the VenusAI platform on supercomputers and elaborates on the virtualization and containerization of the underlying hardware resources. Next, a technical framework for heterogeneous resource aggregation and scheduling is proposed, and a unified resource interface is introduced in the application service layer. The services are modularized and decoupled around the three core parts of an AI scenario: data, model, and computing power. Furthermore, three types of experiments conducted on supercomputers show that the scheduling framework performs better on virtual clusters than on common clusters. Finally, three scientific discovery applications deployed on VenusAI, namely new energy forecasting, materials design, and unmanned aerial vehicle planning, demonstrate the advantages of the platform in solving practical scientific problems.