Keywords
Treebank, perplexity, computer science, language model, recurrent neural network, regularization (linguistics), machine translation, artificial intelligence, monotonic function, word (group theory), deep neural network, sequence (biology), natural language processing, artificial neural network, mathematics, mathematical analysis, geometry, dependency (UML), biology, genetics
Authors
Stephen Merity, Nitish Shirish Keskar, Richard Socher
Source
Journal: Cornell University - arXiv
Date: 2017-08-07
Citations: 299
Abstract
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
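The two techniques named in the abstract can be illustrated with a short sketch. The following PyTorch-style code is a minimal illustration, not the authors' implementation: all class and function names, sizes, and the default n=5 are assumptions made here for clarity. The first class applies DropConnect to the hidden-to-hidden weight matrix of an LSTM that is unrolled in Python, sampling one dropout mask per forward pass and reusing it across timesteps. The helper function afterwards shows one common formulation of the non-monotonic trigger that decides when NT-ASGD switches from plain SGD to averaging.

import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightDropLSTM(nn.Module):
    """Single-layer LSTM with DropConnect on the hidden-to-hidden weights.

    Illustrative sketch: the cell is unrolled in Python (no cuDNN kernel),
    so the recurrent weight matrix can simply be masked once per forward
    pass and the same mask is shared across all timesteps.
    """

    def __init__(self, input_size, hidden_size, weight_dropout=0.5):
        super().__init__()
        self.hidden_size = hidden_size
        self.weight_dropout = weight_dropout
        self.weight_ih = nn.Parameter(0.1 * torch.randn(4 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(0.1 * torch.randn(4 * hidden_size, hidden_size))
        self.bias = nn.Parameter(torch.zeros(4 * hidden_size))

    def forward(self, x, state=None):
        # x: (batch, seq_len, input_size)
        batch, seq_len, _ = x.shape
        if state is None:
            h = x.new_zeros(batch, self.hidden_size)
            c = x.new_zeros(batch, self.hidden_size)
        else:
            h, c = state
        # DropConnect: zero entries of the recurrent weight matrix itself
        # (not the hidden activations), only while training.
        w_hh = F.dropout(self.weight_hh, p=self.weight_dropout, training=self.training)
        outputs = []
        for t in range(seq_len):
            gates = x[:, t] @ self.weight_ih.t() + h @ w_hh.t() + self.bias
            i, f, g, o = gates.chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)


def should_start_averaging(current_loss, previous_losses, n=5):
    """Non-monotonic trigger for NT-ASGD (one common formulation, assumed
    here): start averaging once the current validation loss is worse than
    the best loss recorded more than n evaluations ago, i.e. the metric
    has failed to improve for several consecutive checks."""
    return len(previous_losses) > n and current_loss > min(previous_losses[:-n])


# Example usage (shapes are illustrative, not tuned values):
#   lstm = WeightDropLSTM(input_size=400, hidden_size=1150, weight_dropout=0.5)
#   out, (h, c) = lstm(torch.randn(20, 35, 400))

Note that masking the weights once per forward pass, rather than per timestep, is what lets DropConnect regularize the recurrent connections without disrupting the recurrence itself; the trigger function is only the decision rule, and the averaging of iterates would be handled by the optimizer once it fires.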