Keywords
Overfitting
Computer science
Regularization (mathematics)
Generalizability theory
Language model
Artificial intelligence
Machine learning
Early stopping
Stability (learning theory)
Noise (signal processing)
Gaussian distribution
Artificial neural network
Mathematics
Image (mathematics)
Physics
Quantum mechanics
Statistics
Authors
Hang Hua,Xingjian Li,Dejing Dou,Cheng‐Zhong Xu,Jiebo Luo
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2023-01-01
Pages: 1-15
Citations: 4
Identifier
DOI:10.1109/tnnls.2023.3330926
Abstract
The advent of large-scale pretrained language models (PLMs) has contributed greatly to the progress in natural language processing (NLP). Despite their recent success and wide adoption, fine-tuning a PLM often suffers from overfitting, which leads to poor generalizability due to the extremely high complexity of the model and the limited training samples from downstream tasks. To address this problem, we propose a novel and effective fine-tuning framework, named layerwise noise stability regularization (LNSR). Specifically, our method perturbs the input of neural networks with standard Gaussian or in-manifold noise in the representation space and regularizes each layer's output of the language model. We provide theoretical and experimental analyses to prove the effectiveness of our method. The empirical results show that our proposed method outperforms several state-of-the-art algorithms, such as L2 norm and start point (L2-SP), Mixout, FreeLB, and smoothness-inducing adversarial regularization and Bregman proximal point optimization (SMART). In addition to evaluating the proposed method on relatively simple text classification tasks, similar to the prior works, we further evaluate the effectiveness of our method on more challenging question-answering (QA) tasks. These tasks present a higher level of difficulty, and they provide a larger amount of training examples for tuning a well-generalized model. Furthermore, the empirical results indicate that our proposed method can improve the domain generalization ability of language models.
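The regularizer described in the abstract can be sketched in a few lines: perturb the input with Gaussian noise and penalize the change in every layer's output, not just the final one. The sketch below is a minimal NumPy illustration under assumed names (a toy two-layer network stands in for the language model; `lnsr_penalty` and the weights are hypothetical, not from the paper), meant only to convey the layerwise-stability idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network standing in for a pretrained encoder (illustrative only).
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x):
    """Return every layer's output, not just the last, so each can be regularized."""
    h1 = np.tanh(x @ W1)
    h2 = np.tanh(h1 @ W2)
    return [h1, h2]

def lnsr_penalty(x, sigma=0.1):
    """Layerwise noise stability: perturb the input with standard Gaussian noise
    (scaled by sigma) and penalize the squared change in each layer's output."""
    clean = forward(x)
    noisy = forward(x + sigma * rng.normal(size=x.shape))
    return sum(np.mean((c - n) ** 2) for c, n in zip(clean, noisy))

x = rng.normal(size=(2, 8))   # a batch of 2 input representations
reg = lnsr_penalty(x)         # would be added, weighted, to the task loss
```

In actual fine-tuning this penalty would be computed per mini-batch and added to the task loss with a tunable coefficient; the in-manifold variant mentioned in the abstract would replace the isotropic Gaussian perturbation with noise drawn in the representation space.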