Computer science
Deep learning
Artificial intelligence
Asynchronous communication
Stochastic gradient descent
Deep neural network
Artificial neural network
Machine learning
Proportion (ratio)
Distributed computing
Task (project management)
Feature (linguistics)
Optics (focus)
Computer network
Linguistics
Philosophy
Physics
Management
Quantum mechanics
Optics
Economics
Authors
Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, M. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul A. Tucker, Ke Yang, Quoc V. Le, Andrew Y. Ng
Source
Venue: Neural Information Processing Systems
Date: 2012-12-03
Volume: 25, Pages: 1223-1231
Citations: 3005
Abstract
Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and achieve state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
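The central mechanism the abstract attributes to Downpour SGD is that many model replicas train asynchronously against shared parameters: each replica pulls the current parameters, computes a gradient on its own shard of the data, and pushes the update back without waiting for the other replicas. The sketch below is a minimal toy illustration of that pattern in Python, not the DistBelief implementation; the ParameterServer class, the replica function, the least-squares objective, and the thread-based setup are all illustrative assumptions (the real system additionally partitions each model across many machines and runs replicas as separate jobs on a cluster).

# Minimal sketch of Downpour-style asynchronous SGD (a toy under stated
# assumptions, not the paper's DistBelief system): several model replicas
# pull shared parameters, compute gradients on their own data shard, and
# push updates back without synchronizing with one another.
import threading
import numpy as np

class ParameterServer:
    # Holds the global parameters and applies gradient updates as they arrive.
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad

def replica(server, X, y, steps):
    # One model replica: repeatedly fetch (possibly stale) parameters,
    # compute a least-squares gradient on one example, and send the update.
    rng = np.random.default_rng()
    for _ in range(steps):
        w = server.pull()
        i = rng.integers(len(X))
        grad = (X[i] @ w - y[i]) * X[i]
        server.push(grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0, 0.5])            # hypothetical target weights
    X = rng.normal(size=(3000, 3))
    y = X @ true_w
    server = ParameterServer(dim=3)
    shards = np.array_split(np.arange(len(X)), 4)  # one data shard per replica
    threads = [threading.Thread(target=replica, args=(server, X[s], y[s], 2000))
               for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("learned parameters:", np.round(server.pull(), 3))

In this toy run the learned parameters typically land close to the true weights even though every replica works from slightly stale parameters; tolerating that staleness in exchange for much higher update throughput is the trade-off the asynchronous approach described in the abstract accepts.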