Keywords
Computer Science, Artificial Intelligence, Turing, Usability, Parallel Computing, Massively Parallel, Open Source, Turing Machine, Programming Language, Operating System, Software, Computing
Authors
Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He
Source
Venue: Knowledge Discovery and Data Mining
Date: 2020-08-20
Pages: 3505-3506
Citations: 402
Identifiers
DOI: 10.1145/3394486.3406703
Abstract
Explore new techniques in Microsoft's open source library called DeepSpeed, which advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train 100-billion-parameter models. DeepSpeed is compatible with PyTorch. One piece of our library, called ZeRO, is a new parallelized optimizer that greatly reduces the resources needed for model and data parallelism while massively increasing the number of parameters that can be trained. Researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), which at the time of its release was the largest publicly known language model at 17 billion parameters. We will also go over our latest transformer kernel advancements, which enabled the DeepSpeed team to achieve the world's fastest BERT pretraining record.
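The abstract describes DeepSpeed as a PyTorch-compatible library whose ZeRO optimizer partitions training state to cut the memory cost of data parallelism. As a rough illustration only, the sketch below shows what wrapping a PyTorch model with deepspeed.initialize and a ZeRO configuration can look like; the toy model, batch size, learning rate, and ZeRO stage are assumptions made for the example, not the configuration reported in the paper.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed and a ZeRO config.
# The model, batch size, ZeRO stage, and learning rate are illustrative
# assumptions, not values taken from the paper.
import torch
import deepspeed

model = torch.nn.Sequential(            # toy stand-in for a large transformer
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 32,             # assumed global batch size
    "fp16": {"enabled": True},          # mixed-precision training
    "zero_optimization": {"stage": 1},  # ZeRO stage 1: partition optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles distributed data
# parallelism, ZeRO partitioning, and mixed precision under the hood.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One training step: the engine replaces loss.backward() / optimizer.step().
x = torch.randn(4, 1024).to(model_engine.device).half()
y = model_engine(x)
loss = y.float().pow(2).mean()
model_engine.backward(loss)
model_engine.step()
```

In practice such a script is launched with the deepspeed launcher (e.g. across multiple GPUs) so the engine can set up the distributed process group that ZeRO partitions state over.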