DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Authors
Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Péter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally
Source
Journal: Cornell University - arXiv · Cited by: 135
Identifier
DOI: 10.48550/arxiv.1607.04381
Abstract

Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network under the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initializing the pruned parameters to zero, and retraining the whole dense network. Experiments show that DSD training can improve the performance of a wide range of CNNs, RNNs, and LSTMs on the tasks of image classification, caption generation, and speech recognition. On ImageNet, DSD improved the Top-1 accuracy of GoogLeNet by 1.1%, VGG-16 by 4.3%, ResNet-18 by 1.2%, and ResNet-50 by 1.1%. On the WSJ'93 dataset, DSD improved the WER of DeepSpeech and DeepSpeech2 by 2.0% and 1.1%, respectively. On the Flickr-8K dataset, DSD improved the NeuralTalk BLEU score by over 1.7. DSD is easy to use in practice: at training time, it incurs only one extra hyper-parameter, the sparsity ratio in the S step; at testing time, it does not change the network architecture or add any inference overhead. The consistent and significant performance gains in the DSD experiments show that current training methods are inadequate for finding the best local optimum, while DSD effectively achieves superior optimization performance and finds better solutions. DSD models are available to download at https://songhan.github.io/DSD.
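
The three D-S-D steps above amount to a short training recipe: train dense, prune by weight magnitude and retrain under the mask, then drop the mask and retrain. Below is a minimal sketch of that recipe in PyTorch, not the authors' released code: the train_epochs callback and its post_step hook are hypothetical stand-ins for an ordinary training loop, and the layer-wise magnitude pruning is one plausible reading of "pruning the unimportant connections with small weights".

import torch
import torch.nn as nn

def magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    # Binary masks keeping the largest-magnitude weights in each layer.
    # `sparsity` (the fraction of weights pruned) is the single extra
    # hyper-parameter the S step introduces.
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:              # skip biases and norm parameters
            continue
        k = int(p.numel() * sparsity)
        if k < 1:
            continue
        thresh = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > thresh).float()
    return masks

def apply_masks(model: nn.Module, masks: dict) -> None:
    # Zero out the pruned weights in place.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def dsd_train(model, train_epochs, sparsity=0.5):  # default sparsity is illustrative
    # D: train the dense network to learn weights and their importance.
    train_epochs(model)
    # S: prune small-magnitude weights, then retrain under the sparsity
    # constraint; re-applying the masks after each update keeps the
    # pruned weights at zero.
    masks = magnitude_masks(model, sparsity)
    apply_masks(model, masks)
    train_epochs(model, post_step=lambda: apply_masks(model, masks))
    # re-D: remove the constraint; the pruned weights restart from zero
    # and the whole dense network is retrained.
    train_epochs(model)
    return model

Re-applying the mask after every optimizer step is what keeps the S phase genuinely sparse; simply omitting that hook in the final call restores full dense training, which is why DSD leaves the deployed network and its inference cost unchanged.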
